Compare commits

1960 Commits

Author SHA1 Message Date
cb90e36684 Update onnx.rst (#40605)
Correcting the link (current 404s)
2020-06-30 06:46:40 -07:00
54a63e0420 Update header names and add in C++ docs (#27172)
1. Update "Package Reference" to "Python API"
2. Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
3. Add "Other Languages" section and add in C++ docs
2019-10-01 21:52:53 -04:00
8554416a19 delete C_CONTIGUOUS assertions to be compatible with particular builds of numpy 2019-08-08 05:54:09 -07:00
64069120e4 fix install_requires properly 2019-08-08 05:16:06 -07:00
11d7e8d85b Fix regression in triangular_solve when number of batches = 1 for CUDA (#23997)
Changelog:
- When number of batches = 1, dispatch to trsm instead of trsm_batched in MAGMA
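The dispatch rule described in the changelog can be sketched in plain Python (an illustration only; the real dispatch happens in the C++/MAGMA bindings):

```python
def triangular_solve_backend(num_batches):
    # Dispatch to the single-matrix MAGMA routine when there is only one
    # batch; otherwise use the batched variant.
    return "trsm" if num_batches == 1 else "trsm_batched"
```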

Test Plan:
- All triangular_solve tests should pass to ensure that the change is valid
2019-08-07 20:36:21 -07:00
4d1d843d18 Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. 2019-08-07 16:01:08 -07:00
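The hotpatch above amounts to a one-line environment default; a plain-Python sketch (variable names taken from the commit message, exact behavior assumed):

```python
def hotpatch_cxxflags(env):
    # If CXXFLAGS is not set, default it to the value of CFLAGS.
    if "CXXFLAGS" not in env and "CFLAGS" in env:
        env = dict(env, CXXFLAGS=env["CFLAGS"])
    return env
```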
3e8e88b7f4 Use prerendered KaTeX in docs. (#23376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23376

This uses master version of sphinxcontrib-katex as it only
recently got prerender support.

Fixes #20984

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16582064

Pulled By: ezyang

fbshipit-source-id: 9ef24c5788c19572515ded2db2e8ebfb7a5ed44d
2019-08-07 15:58:36 -07:00
214dc0244b Adds torch.random to docs/toc 2019-08-07 15:57:29 -07:00
db37022260 [jit] prefix module qualified names with __module__ (#23633)
This is temporary, won't be needed with the new serialization format.
But for now, since the main module gets its name from the archive name,
we need this for safety; otherwise something like
`torch.jit.save("torch.pt")` will break things.

ghstack-source-id: f36febe1025ff04e7f79617e548819d4876dc7fa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23630
2019-08-07 18:55:48 -04:00
a0e9dd3190 [jit] don't try to set training after ScriptModule has been initialized. (#23681)
Now when initializing a ScriptModule during the torch.jit.load()
process, there is already a cpp module backing the thing. That means
that setting training will overwrite whatever the initialized
ScriptModule had.

This PR splits apart the common "set up internal state" part of the
Module __init__ and calls that from ScriptModule.__init__ and
Module.__init__, leaving the "nn.Module-specific" part (setting
`self.training`) for the nn.Module __init__
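The split described above can be sketched as follows (attribute names other than `training` are assumptions for illustration, not the actual nn.Module internals):

```python
class Module:
    def __init__(self):
        self._setup_internal_state()
        # nn.Module-specific part stays here
        self.training = True

    def _setup_internal_state(self):
        # shared "set up internal state" step
        self._parameters = {}
        self._buffers = {}
        self._modules = {}

class ScriptModule(Module):
    def __init__(self):
        # only the shared setup: `training` already lives on the
        # backing cpp module and must not be overwritten
        self._setup_internal_state()
```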

ghstack-source-id: 9b2ba8a15c43cf230363e4cd10ba4ad3ac4931f7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23680
2019-08-07 18:54:45 -04:00
5e36beca0f [jit] Support nn.GRU and Make nn.LSTM accept PackedSequence (#23700)
* [jit] Support nn.GRU and Make nn.LSTM accept PackedSequence


* fix

* add link to comments
2019-08-07 18:54:08 -04:00
0d7d1d080d Fix argmax docstring (#23855) 2019-08-07 18:52:54 -04:00
e6f9422aca Fix expansion of stride argument (#23969)
In max_pool2d, max_pool3d, avg_pool2d, avg_pool3d.

There is only one substantive change: when stride.size() == 1,
we expand it to size 2.  However, I also took the opportunity
to give a better error message.
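A plain-Python sketch of the expansion rule (the real logic lives in the C++ pooling ops; the error text here is illustrative):

```python
def expand_pool_arg(name, values, expected_dim=2):
    # A single value is broadcast to all spatial dims; anything else of
    # the wrong length gets a clear error instead of a cryptic one.
    if len(values) == 1:
        return list(values) * expected_dim
    if len(values) != expected_dim:
        raise ValueError(
            f"{name} must have 1 or {expected_dim} elements, got {len(values)}")
    return list(values)
```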

Testing here is the bare minimum, because I'm in a hurry.  Just make
sure the C++ API with all size-1 inputs works.

This is a squash of four commits.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2019-08-07 16:18:52 -04:00
805556cbb6 [v1.2] [RETRY] Fixed Bool in IsIntegralType bug (plus review comments) (#23955)
* Fixed Bool in IsIntegralType bug

* Added deprecation message

* Resolved PR comments

* Update for review comments.

* Get rid of tab.
2019-08-07 16:10:56 -04:00
7bdc5b6a63 Fix unused imports in torch/onnx/symbolic_opset8.py (#23678)
Summary:
Which causes lint errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23678

Differential Revision: D16622458

Pulled By: mrshenli

fbshipit-source-id: 145ad30dfb452dd556573c1b3d4cdd9cd7852752
2019-08-07 13:09:35 -07:00
5ad90d39ab No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (#23806)
Summary:
Simplifying https://github.com/pytorch/pytorch/issues/23793: The dependency relationship between
{INSTALL,BUILD}_TEST is already properly handled in CMakeLists.txt. All
we need to do is to pass down INSTALL_TEST.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23806

Differential Revision: D16691833

Pulled By: soumith

fbshipit-source-id: 7607492b2d82db3f79b174373a92e2810a854a61
2019-08-07 13:03:41 -07:00
8893c32cd8 Upload OS X binaries to pytorch-nightly please...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2019-08-07 15:56:40 -04:00
cdb032efea [v1.2] Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (#23833) (#23952)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.

I only verified that the schedule reaches the restart at the expected step as specified in the issue; it would be good to have someone else verify correctness here.

Script:
```
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833

Differential Revision: D16657251

Pulled By: gchanan

fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
2019-08-07 13:39:24 -04:00
2ad6f427e8 [v1.2] fix torch.frac documentation (#23830) (#23951)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13968 .

Following the math formula in wiki: https://en.wikipedia.org/wiki/Fractional_part
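The fractional-part convention from the linked formula, sketched in plain Python rather than with tensors:

```python
import math

def frac(x):
    # frac(x) = x - trunc(x): the fractional part keeps the sign of x
    return x - math.trunc(x)
```

So frac(1.5) is 0.5 and frac(-1.5) is -0.5, the sign-preserving convention the corrected docs describe.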
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23830

Differential Revision: D16656871

Pulled By: ailzhang

fbshipit-source-id: a71467870cf9566e0c7b1a045f72607dada81e1f
2019-08-07 13:39:12 -04:00
060f67723c Updated docs and added deprecation warnings to acknowledge a bool tensor (#22261) (#23811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22261
ghimport-source-id: 1611d62d056a04c0ad15ef662e594a3d206a78e2

Test Plan: Imported from OSS

Differential Revision: D16005990

Pulled By: izdeby

fbshipit-source-id: 2413824aa75a0755719e4df11acd21e6607e5a85
2019-08-07 13:00:03 -04:00
fb50b52949 allow INSTALL_TEST to pass through from env to cmake (#23793)
Summary:
This allows `INSTALL_*` to pass through to cmake.
An additional fix: if `INSTALL_TEST` is specified, it won't use `BUILD_TEST` as the default value for `INSTALL_TEST`.
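The defaulting behavior described above can be sketched as follows (a hedged illustration of the intended rule, not the actual setup.py code):

```python
def resolve_test_options(env):
    # INSTALL_TEST follows BUILD_TEST unless the user sets it explicitly.
    build_test = env.get("BUILD_TEST", "ON")
    install_test = env.get("INSTALL_TEST", build_test)
    return build_test, install_test
```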
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23793

Differential Revision: D16648668

Pulled By: soumith

fbshipit-source-id: 52c2a0d8033bc556355b87a6731a577940de9859
2019-08-05 13:09:06 -04:00
3314b57c33 Changed tensor comparison return type from uint8 to bool (#21113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21113
ghimport-source-id: 9c4ba63457a72bfc41894387e0b01be3fd9a9baf

Test Plan: Imported from OSS

Differential Revision: D15552204

Pulled By: izdeby

fbshipit-source-id: a608213668649d058e22b510d7755cb99e7d0037
2019-08-05 07:43:49 -07:00
1ac19ea8ba Fix dataloader._shutdown_workers if not all workers are started (#23762) 2019-08-04 23:28:47 -04:00
93a8dda495 cpu binary builds are built with cu100 docker image now instead of cu80 2019-08-04 21:08:06 -04:00
de276b1d14 add appropriate install_requires 2019-08-04 20:00:30 -04:00
47828fd9fc Document empty_strided (#23740)
Changelog:
- Add doc string for torch.empty_strided
- Remove empty file named `python` in test/
2019-08-03 01:09:33 -04:00
f0a65e660c Allowing batching for det/logdet/slogdet operations (#22909) (#23634)
Summary:
Changelog:
- Add batching for det / logdet / slogdet operations
- Update derivative computation to support batched inputs (and consequently batched outputs)
- Update docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909

Test Plan:
- Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py`
- Add autograd tests for batched inputs

Differential Revision: D16580988

Pulled By: ezyang

fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a
2019-08-02 00:15:52 -04:00
7e327f259b [1.2.0] fix align_corners doc (#23709)
* fix align_corners doc

* Update torch/nn/functional.py

Co-Authored-By: bnehoran <bnehoran@users.noreply.github.com>

* Update torch/nn/functional.py

Co-Authored-By: bnehoran <bnehoran@users.noreply.github.com>
2019-08-02 00:14:50 -04:00
ccc7b5c225 Use dst dir for temp file (#23629) (#23713)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23629

Differential Revision: D16594223

Pulled By: soumith

fbshipit-source-id: db0275415111f08fc13ab6be00b76737a20f92df
2019-08-02 00:14:32 -04:00
6f5fb78e9e Fix CTC loss for zero-length targets on GPU (#23298) (#23715)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18215 at last!

Also sprinkle tests...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
2019-08-02 00:14:11 -04:00
4f5211691f [jit] Recursive compilation error hot fixes (#23686)
* [jit] Recursive compilation error hot fixes

This is a combination of #23454 and #23682, which are needed for
error reporting on recursively compiled code

* #23682
2019-08-01 18:40:15 -04:00
7a7cfcbf0f add setup metadata to help PyPI flesh out content on pypi package page 2019-08-01 14:36:17 -04:00
129939132b [v1.2.0] Slightly improve dataloader docs on when auto-batching is disabled (#23672)
* Slightly improve dataloader docs on when auto-batching is disabled

* fix typo
2019-08-01 14:26:33 -04:00
bb4ff00f33 fix pin_memory_thread not exiting quickly (#23647) 2019-08-01 11:08:20 -04:00
49e32ffc9f Enable stable docs push to pytorch.github.io:site-v1.2.0 2019-08-01 06:22:00 -07:00
d6df9575f9 Fix regression in torch.qr (#23591) (#23606)
Summary:
Changelog:
- Use narrow instead of narrow_copy while returning
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23591

Test Plan:
- All tests should pass to ensure that the change is correct

Fixes https://github.com/pytorch/pytorch/issues/23580

Differential Revision: D16581174

Pulled By: ezyang

fbshipit-source-id: 1b6bf7d338ddd138ea4c6aa6901834dd202ec79c
2019-07-31 21:04:02 -04:00
c3bad0de3e at::view (#23452) (#23604)
Summary:
`at::view` accidentally calls clone, but what we want is to create an empty tensor and set its storage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23452
ghstack-source-id: 87438096

Differential Revision: D16442756

fbshipit-source-id: 6d5663f82c9bd4e9de8fc846c52992477843af6a
2019-07-31 20:25:35 -04:00
18cbf11329 Prepare the stable docs build for v1.2.0
This sets up the docs build in dry-run mode. If everything looks okay I
will enable it.
2019-07-31 13:25:47 -07:00
a9dc2a15b7 Refactor the pytorch_doc_push_script to take a branch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23556

Test Plan:
- Run ci

Imported from OSS

Differential Revision: D16563747

Pulled By: zou3519

fbshipit-source-id: 104371b3712c00b073a82e5145090e7bd6fd2d53
2019-07-31 13:18:55 -07:00
e7abff0778 Delete re_worker_requirements 2019-07-30 13:02:20 -04:00
b3a9a7a9b9 Rename gels to lstsq (#23460)
Summary:
Changelog:
- Rename `gels` to `lstsq`
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to not promote usage.
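The tentative alias-with-deprecation pattern from the changelog, sketched with stand-in functions (the real routines live in ATen; the stand-in return value is an assumption):

```python
import warnings

def lstsq(a, b):
    # stand-in for the renamed solver
    return ("lstsq", a, b)

def gels(a, b):
    # tentative alias kept for backward compatibility, warning so that
    # new code is steered toward lstsq
    warnings.warn("torch.gels is deprecated; use torch.lstsq instead",
                  DeprecationWarning)
    return lstsq(a, b)
```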
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460

Test Plan: - All tests should pass to confirm that the patch is correct

Differential Revision: D16547834

Pulled By: colesbury

fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496
2019-07-30 09:56:04 -07:00
cfe9400996 Remove preprocessing of CFLAGS, CPPFLAGS, and LDFLAGS in Python scripts. (#23528)
Summary:
After https://github.com/pytorch/pytorch/issues/23455, there is no need of this preprocessing in Python scripts.
They will be automatically processed in CMake (plus CPPFLAGS here
probably meant to be CXXFLAGS).

Reference:

- https://cmake.org/cmake/help/v3.15/envvar/CFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/CXXFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/LDFLAGS.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23528

Differential Revision: D16561561

Pulled By: ezyang

fbshipit-source-id: 962a27a2b0a18db0f95477ad067a2611e4128187
2019-07-30 08:07:36 -07:00
fd61cc9ebc Moved at::assert_no_internal_overlap to TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22917

Differential Revision: D16521429

Pulled By: pbelevich

fbshipit-source-id: 80ae583c6486d6948431b79e1452902bdf2cfbc3
2019-07-30 07:47:33 -07:00
4b78ce1ba4 Clean cmake infrastructure up (#23527)
Summary:
Only check for cmake dependencies we directly depend on (e.g., hipsparse but not rocsparse)

Use cmake targets for ROCm where possible.

While there, update the docker CI build infrastructure to only pull in packages by name we directly depend on (anticipating the demise of, e.g., miopengemm). I do not anticipate a docker rebuild to be necessary at this stage as the changes are somewhat cosmetic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23527

Differential Revision: D16561010

Pulled By: ezyang

fbshipit-source-id: 87cd9d8a15a74caf9baca85a3e840e9d19ad5d9f
2019-07-30 07:26:48 -07:00
437a8b3eed Named inference rule for copy_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23229

Test Plan: Imported from OSS

Differential Revision: D16494413

Pulled By: zou3519

fbshipit-source-id: 4acb85e5a4ad09bf5f7cbb84cc8d4ceac0cd9967
2019-07-30 07:17:34 -07:00
16da355b30 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
LARGE: 66
MEDIUM: 649
XLARGE: 1

Updated actions:
From LARGE to MEDIUM: 18
From LARGE to XLARGE: 2
From MEDIUM to LARGE: 20
From XLARGE to LARGE: 1

Differential Revision: D16559356

fbshipit-source-id: a51ef034265649314661ab0e283089a069a20437
2019-07-30 02:53:11 -07:00
4e59055c4d optimize matmul memory usage for certain cases (#23433)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23433

Differential Revision: D16524135

Pulled By: ailzhang

fbshipit-source-id: e7684fec60c9b9db9a09f8ac157b13c8dde1bdd2
2019-07-29 22:35:45 -07:00
7b081e5d1e Improve error message for changing tensor metadata after .data or .detach() (#23504)
Summary:
When a user tries to change the metadata of a tensor created from `.data` or `.detach()`, we currently show the error message "<function_name> is not allowed on Tensor created from .data or .detach()". However, this error message doesn't suggest what the right fix should look like. This PR improves the error message.

Closes https://github.com/pytorch/pytorch/issues/23393.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23504

Differential Revision: D16547415

Pulled By: yf225

fbshipit-source-id: 37f4a0385442e2b0966386fb14d3d938ecf4230c
2019-07-29 22:25:14 -07:00
db1e9b1d6c Fix a few clang warnings.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23524

Differential Revision: D16549562

fbshipit-source-id: 58351fc2858d495b135023626116f6f565c8e9b1
2019-07-29 22:08:50 -07:00
30bc19d751 dictKeys and dictItems ops on typed dicts return typed lists (#23270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23270
ghstack-source-id: 87389530

Differential Revision: D16448942

fbshipit-source-id: e6b578f0e97776112259d7ea38e143e4716ec273
2019-07-29 20:00:34 -07:00
c8817f9436 fix default value for script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23542

Test Plan: Imported from OSS

Differential Revision: D16557122

Pulled By: suo

fbshipit-source-id: c86578aa2c55f44ed5d573d33874a82244df3d09
2019-07-29 19:51:26 -07:00
6314af6e57 Revert D16526027: [jit] Include recursive class compilations in error call stack
Differential Revision:
D16526027

Original commit changeset: 109f2968430d

fbshipit-source-id: c27252540ec6b7da60739eb7dcc8b1650672c226
2019-07-29 19:02:39 -07:00
6574f6167c Revert D16554694: [jit] add a test for inline tracing
Differential Revision:
D16554694

Original commit changeset: 0fae4458f18c

fbshipit-source-id: 08aa0c292fa5b2dbdd0d1f0e59f531416edef760
2019-07-29 18:57:06 -07:00
265b498de2 add a test for inline tracing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23535

Test Plan: Imported from OSS

Differential Revision: D16554694

Pulled By: suo

fbshipit-source-id: 0fae4458f18c06ffbd484905ad7836dce9ce69cc
2019-07-29 18:05:15 -07:00
52b95fd4be Include recursive class compilations in error call stack (#23454)
Summary:
Previously these were left out which would lead to confusing messages,
now it looks something like:

```
torch.jit.frontend.UnsupportedNodeError: import statements aren't
supported
:
at ../test.py:13:9
    def bad_fn(self):
        import pdb
        ~~~~~~ <--- HERE
'__torch__.X' is being compiled since it was called from 'fn'
at ../test.py:16:12
def fn(x):
    return X(10)
           ~~~~ <--- HERE
```

Fixes #23453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23454

Pulled By: driazati

Differential Revision: D16526027

fbshipit-source-id: 109f2968430dbf51ee91b1b3409badfd557d19a4
2019-07-29 18:00:05 -07:00
696642ae8d Change docs to use recursive script API (#21612)
Summary:
Use the recursive script API in the existing docs

TODO:
* Migration guide for 1.1 -> 1.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21612

Pulled By: driazati

Differential Revision: D16553734

fbshipit-source-id: fb6be81a950224390bd5d19b9b3de2d97b3dc515
2019-07-29 17:51:22 -07:00
bfee46f8e2 Update argument list for non-fbgemm path for qconv_prepack (#23521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23521

The non-fbgemm path should have the same arguments as the fbgemm path.

Reviewed By: jianyuh

Differential Revision: D16547637

fbshipit-source-id: bb00d725fb968cbee32defb8facd2799a7e79bb4
2019-07-29 17:41:11 -07:00
65a89472c4 Put all modules in the global Python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23154

Test Plan: Imported from OSS

Differential Revision: D16441913

Pulled By: suo

fbshipit-source-id: a79f2c3e06a33cbd79b2e3333f16c069f356f451
2019-07-29 16:38:20 -07:00
e366af7d87 Add TORCH_CHECK to disable sub for bool tensors (#23519)
Summary:
This resolves two issues in one shot:

- sub shouldn't be available for bool type.
- When sub is applied to an unsupported type, the current error message
  shows "add_cpu/add_cuda is not implemented for [type]". It should be
  "sub_cpu/sub_cuda" instead.
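A minimal sketch of the described check (the actual guard is a C++ TORCH_CHECK; the message shape and function name here are assumptions):

```python
def check_sub_dtype(dtype, backend="cpu"):
    # Reject bool, and name `sub` (not `add`) in the error message.
    if dtype == "bool":
        raise TypeError(f"sub_{backend} is not implemented for bool tensors")
```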
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23519

Differential Revision: D16548770

Pulled By: izdeby

fbshipit-source-id: fe404a2a97b8d11bd180ec41364bf8e68414fb15
2019-07-29 16:28:35 -07:00
3c986dff77 introduce auto_set to simplify benchmarking the backward path of operators (#23276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276

This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example:

```
...
self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set())
self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set())
...
```

In this way, the benchmark will generate three different test cases.
1. input_one requires grad
2. input_two requires grad
3. both inputs require grad

Here is a sample output:
```
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwdall
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 863.744

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd1
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 727.915

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd2
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 687.626
```

Reviewed By: zheng-xq

Differential Revision: D16450355

fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305
2019-07-29 15:58:41 -07:00
41dfe7204b Threading and CPU Inference note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23417

Test Plan:
cd docs; make html

Imported from OSS

Differential Revision: D16523781

Pulled By: ilia-cher

fbshipit-source-id: d6c09e8a85d39e6185bbdc4b312fea44fcdfff06
2019-07-29 15:45:49 -07:00
f4eb93f7bc Support pack_padded_sequence and pad_packed_sequence
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23249

Test Plan: Imported from OSS

Differential Revision: D16466587

Pulled By: wanchaol

fbshipit-source-id: a721da01b2da0ef90cac80b77f1285102e3b1118
2019-07-29 15:36:47 -07:00
c384fbf4c8 support torch._C._get_tracing_state in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23248

Test Plan: Imported from OSS

Differential Revision: D16466588

Pulled By: wanchaol

fbshipit-source-id: 3c3d5dec2cea2f9cb080eadaef457cc62ac3fbe0
2019-07-29 15:05:50 -07:00
e1f8985973 Specify onnxruntime version to install for CI tests (#23517)
Summary:
No real change on the CI, since the current default (latest) is 0.4.0. houseroad bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23517

Differential Revision: D16550375

Pulled By: bddppq

fbshipit-source-id: a669b8af678c79c4d6909300b28458fe6b7cd30c
2019-07-29 14:58:15 -07:00
c779eff579 support torch.as_tensor in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23247

Test Plan: Imported from OSS

Differential Revision: D16466590

Pulled By: wanchaol

fbshipit-source-id: cf52721eacd177d9040564790382db13a9fcc2fe
2019-07-29 14:38:22 -07:00
3a568c9a2b CI: install clang-tidy (#23518)
Summary:
Install clang-tidy (from LLVM 8) for the `clang_tidy` job.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23518

Differential Revision: D16549621

Pulled By: ezyang

fbshipit-source-id: b1d20641380cdfdb0589249770b98163528fa69f
2019-07-29 14:28:26 -07:00
a8edc2b5d2 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22926

Differential Revision: D16546369

Pulled By: colesbury

fbshipit-source-id: 56f7ef4476e586dee19366fdb720085d1c2f2027
2019-07-29 13:47:05 -07:00
9219a37c12 Avoid including the same header file twice (#23418)
Summary:
Avoid including the same header file twice
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23418

Differential Revision: D16546422

Pulled By: colesbury

fbshipit-source-id: 5cd868cce73d9199ced9b6f2f6f57bf42e5a5d5b
2019-07-29 13:34:11 -07:00
dec4eacae4 fix fbcode weak ordering (#23511)
Summary:
There is an internal fbcode assert that fails if I do not add these checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23511

Differential Revision: D16545606

Pulled By: eellison

fbshipit-source-id: cd3a799850bae8f052f9d81c1e4a2678fda19317
2019-07-29 13:14:39 -07:00
0c9979dd7d Fix TestCuda.test_events_wait (#23520)
Summary:
PyTorch test sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called,
which in turn calls CudaMemoryLeakCheck, which in turn calls
initialize_cuda_context_rng, which executes torch.randn on each
device, launching a kernel on each device.

Since the kernel may not finish on device 1, the assertion
self.assertTrue(s1.query()) fails.

The fix is to insert

        torch.cuda.synchronize(d0)
        torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish before the real
test begins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23520

Differential Revision: D16547701

Pulled By: soumith

fbshipit-source-id: 42ad369f909d534e15555493d08e9bb99dd64b6a
2019-07-29 13:09:41 -07:00
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
2019-07-29 12:58:40 -07:00
56664c2c65 Untap caskroom/homebrew-cask in attempt to unbreak OS X builds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23514

Test Plan: Imported from OSS

Differential Revision: D16546841

Pulled By: ezyang

fbshipit-source-id: 96d2e988cb0dddfeec0174875761dfa26f25a8c1
2019-07-29 12:45:01 -07:00
31f1928096 add sorting policy to ChunkDataset (#23053)
Summary:
Add a sorting policy to ChunkDataset.

This is considered an advanced parameter for developers who want to apply a 'sorting policy' to the chunk data before it is sampled into minibatches.

Different from the collate method, this policy is applied at the chunk level instead of the minibatch level. When a chunk of data is loaded (multiple chunks if cross_chunk_shuffle_count_ is greater than 1), this policy targets the full loaded data. It is useful if developers want to perform some pre-processing (like bucketing) on the chunk data before the example sampler samples the data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23053

Differential Revision: D16537692

Pulled By: colesbury

fbshipit-source-id: cd21ed40ab787a18b8c6dd304e5b806a7a45e6ba
2019-07-29 12:34:02 -07:00
a356276d79 add note to Contribution Guide around recently released research (#23513)
Summary:
Thanks to adefazio for the feedback; adding a note to the Contribution Guide so that folks don't start working on code without checking with the maintainers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23513

Differential Revision: D16546685

Pulled By: soumith

fbshipit-source-id: 1ee8ade963703c88374aedecb8c9e5ed39d7722d
2019-07-29 12:24:59 -07:00
06762b4721 Fix distributions.Categorical.sample bug from .view() (#23328)
Summary:
This modernizes the distributions code by replacing a few uses of `.contiguous().view()` with `.reshape()`, fixing a sampling bug in the `Categorical` distribution.

The bug is exercised by the following test:
```py
batch_shape = (1, 2, 1, 3, 1)
sample_shape = (4,)
cardinality = 2
logits = torch.randn(batch_shape + (cardinality,))
dist.Categorical(logits=logits).sample(sample_shape)
# RuntimeError: invalid argument 2: view size is not compatible with
#   input tensor's size and stride (at least one dimension spans across
#   two contiguous subspaces). Call .contiguous() before .view().
#   at ../aten/src/TH/generic/THTensor.cpp:203
```
I have verified this works locally, but I have not added this as a regression test because it is unlikely to regress (the code is now simpler).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23328

Differential Revision: D16510678

Pulled By: colesbury

fbshipit-source-id: c125c1a37d21d185132e8e8b65241c86ad8ad04b
2019-07-29 12:09:50 -07:00
be644d822b fixes #20178 (#23297)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20178
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23297

Differential Revision: D16497552

Pulled By: VitalyFedyunin

fbshipit-source-id: 386933b15c27d02351f042be71b153bc9439004d
2019-07-29 12:04:44 -07:00
236149edc5 Make randperm works properly on non-contiguous tensors. (#23043)
Summary:
Close https://github.com/pytorch/pytorch/issues/22710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23043

Differential Revision: D16446340

Pulled By: VitalyFedyunin

fbshipit-source-id: 1760af310fee71b369e1aaaf96546277058611c9
2019-07-29 11:59:04 -07:00
d6ff78fd00 fix an over-indented return in trace_module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23358

Differential Revision: D16519010

Pulled By: Krovatkin

fbshipit-source-id: a7e4225b70e915d91c74874e3eca9bcb87baf84c
2019-07-29 11:15:55 -07:00
505fa83b2f Implement named inference rule for mul
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23193

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23193 gh/zou3519/75/head

Imported from OSS

Differential Revision: D16494401

Pulled By: zou3519

fbshipit-source-id: 0e2395d7de39158ec51feed5da0389715ec52600
2019-07-29 09:58:18 -07:00
d3fcb4ccd3 Try another version of apt/dpkg killing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23499

Test Plan: Imported from OSS

Differential Revision: D16542875

Pulled By: ezyang

fbshipit-source-id: 05aa97f2d61e4fc00a819768448944f85701cab8
2019-07-29 09:13:24 -07:00
8ada7c9920 Remove two CMAKE_ build options from additional_options. (#23451)
Summary:
Following up 915261c8bef85e3b845a0384d3fb55e55707b609
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23451

Differential Revision: D16542303

Pulled By: ezyang

fbshipit-source-id: 1406c311c198eb237f85d6d8f1f0d58626be8257
2019-07-29 08:13:59 -07:00
09ba4df031 Whether MKLDNN should be built under native arch should respect USE_NATIVE_ARCH (#23445)
Summary:
Currently there is no way to build MKLDNN with optimizations beyond SSE4. This commit makes the MKLDNN build respect USE_NATIVE_ARCH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23445

Differential Revision: D16542275

Pulled By: ezyang

fbshipit-source-id: 550976531d6a52db9128c0e3d4589a33715feee2
2019-07-29 08:13:56 -07:00
b335f3910f Remove redundant MSVC_Z7_OVERRIDE processing and combine "/EHa" flag setup (#23455)
Summary:
- MSVC_Z7_OVERRIDE has already been handled in CMakeLists.txt. No need to process it once more in the Python scripts.
- Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used.
- Move the setting of "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares the removal of redundant cflags setup in Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455

Differential Revision: D16542274

Pulled By: ezyang

fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac
2019-07-29 08:08:47 -07:00
81e46d4f78 Fix build issue. CUDA may be installed in $CUDA_HOME/lib on macOS. (#23491)
Summary:
Closes gh-16955.
Closes https://github.com/pytorch/vision/issues/977

On Linux both `lib64` and `lib` may be present (symlinked). The reports
seem to all be about macOS, but it seems like this is also possibly more
robust on Linux and can't hurt. So not treating platforms differently.

Note that Eigen has a similar check in its CMake:

```
if(CUDA_64_BIT_DEVICE_CODE AND (EXISTS "${CUDA_TOOLKIT_ROOT_DIR}/lib64"))
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib64")
else()
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib")
endif()
 ```

There may be other issues for building from source on macOS, can't test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23491

Differential Revision: D16538973

Pulled By: soumith

fbshipit-source-id: cc309347b7d16e718e06878d3824d0a6e40b1019
2019-07-29 08:08:43 -07:00
97f129bf0a Let set_rng_state and get_rng_state accept string parameter (#23448)
Summary:
Currently, set_rng_state and get_rng_state do not accept strings as their parameters. This commit lets them accept strings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23448

Differential Revision: D16527172

Pulled By: soumith

fbshipit-source-id: 8f9a2129979706e16877cc110f104770fbbe952c
2019-07-29 08:08:39 -07:00
7a82066282 Update PyTorch Docker image to 323. (#23389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23389

Test Plan: Imported from OSS

Differential Revision: D16541971

Pulled By: ezyang

fbshipit-source-id: 2b7e483f4d6eedef7f5c140ffc0fac21fecd179b
2019-07-29 07:29:54 -07:00
f546a3b8d8 fixing documentation, issue 22697 (#23268)
Summary:
As fmassa commented:

> Agree, it should probably be weight, start, end
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23268

Differential Revision: D16493403

Pulled By: zou3519

fbshipit-source-id: 51ed07f6f7abdbd41dc323570aed41d804fa9c1b
2019-07-29 07:24:49 -07:00
19858f7cc6 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 981
LARGE: 56

Updated actions:
From MEDIUM to LARGE: 10
From LARGE to MEDIUM: 3
From LARGE to XLARGE: 1

Differential Revision: D16532427

fbshipit-source-id: c58bf59e6c571627b3994f8cdfa79758fb85892b
2019-07-29 04:37:23 -07:00
91d28026f8 Remove unused cuBLAS driver functions for getrs (#23375)
Summary:
Changelog:
- Remove getrs driver functions from THCBlas{.h/.cpp}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23375

Test Plan: - Build to pass to confirm no callsites were missed.

Differential Revision: D16539079

Pulled By: soumith

fbshipit-source-id: b5c285a2d36714ddf3393337eec7d85b1eaf3f51
2019-07-28 21:29:18 -07:00
54c280863c Add some compiler flags for building cpp extensions on Windows (#23472)
Summary:
(1) Add `COMMON_MSVC_FLAGS` to the flags in the ninja codepath
(2) Add `/EHsc` to `COMMON_MSVC_FLAG`
(3) Remove `-fPIC` and `-std=c++11` from the flags in the windows codepath
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23472

Differential Revision: D16532993

Pulled By: soumith

fbshipit-source-id: bc2d983f5f8b4eae9c7385bf170f155679e92e87
2019-07-28 20:33:18 -07:00
ef6356133e Revert D16428208: [pytorch][PR] only scatter in forward if multi-device per process
Differential Revision:
D16428208

Original commit changeset: eaa3876b2b95

fbshipit-source-id: 9db3bc86bf419dd06fdaaff434f72b92ecb5a427
2019-07-27 22:41:20 -07:00
64e4152064 Clarify that torch.device without an index will always represent the current device (#23468)
Summary:
Per discussion in https://github.com/pytorch/pytorch/issues/23448
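For illustration (assuming a standard torch install), a device constructed without an index carries `index=None`, which is what makes it track whatever the current device is at call time:

```python
import torch

d = torch.device('cuda')             # no index: always means the *current* device
print(d.index)                       # None
print(torch.device('cuda:1').index)  # 1
```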
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23468

Differential Revision: D16532950

Pulled By: soumith

fbshipit-source-id: 48c97060aaf55f1d7589afab42c6cd623d71a9a7
2019-07-27 06:49:52 -07:00
ffef0e03b7 Enabling GPU device runs for operators (#23461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23461

Enabling GPU device runs for production operator shapes.

Reviewed By: xw285cornell, mingzhe09088

Differential Revision: D16526928

fbshipit-source-id: 46657963f4b0bc43d14205ccf1b63d588657e388
2019-07-26 18:53:40 -07:00
23e526e6ff Fix SourceRange comparison
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23341

Test Plan: Imported from OSS

Differential Revision: D16505398

Pulled By: jamesr66a

fbshipit-source-id: 0bf6a1a054c7749c0a3334654d5746dd9f5dee96
2019-07-26 18:08:43 -07:00
3497891c14 add sorted keyword for lists and dicts (#23274)
Summary:
Add `sorted` keyword to JIT for lists and dicts. This desugars to a list copy and a call to `list.sort()`. Since we don't have interfaces yet, I implement it in terms of `list.sort()`. When we do, we can revisit implementing this op in a different manner.

The test fails bc of a fix to specialized lists which is landing here: https://github.com/pytorch/pytorch/pull/23267

Ignore the first commit because it is formatting, plz use clang_format ppl :'(
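The desugaring described above — copy the list, then call `list.sort()` — can be sketched in plain Python (a sketch of the semantics, not the JIT implementation):

```python
def jit_sorted(xs):
    # sorted(xs) in the JIT desugars to a copy followed by an in-place
    # sort, so the input list is left untouched.
    out = list(xs)
    out.sort()
    return out

data = [3, 1, 2]
print(jit_sorted(data))  # [1, 2, 3]
print(data)              # [3, 1, 2] -- original unchanged
```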
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23274

Differential Revision: D16527323

Pulled By: eellison

fbshipit-source-id: aed8faef23cb790b9af036cd6c1b9b1d7066345d
2019-07-26 17:44:15 -07:00
f0ebf769de allow accepting empty input to the benchmark (#23462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23462

as title

Reviewed By: hl475

Differential Revision: D16527176

fbshipit-source-id: 7a8ff4f3c6122ae7b3205e0b446fec06fd95eedc
2019-07-26 17:30:42 -07:00
522cca5040 Support IntList in Dict's shallowEquals (#23205)
Summary:
Required when comparing IntList type of default values

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23205
ghstack-source-id: 87208341

Reviewed By: zrphercule

Differential Revision: D16433809

fbshipit-source-id: 3f60d67d708129be31198161423d819108468077
2019-07-26 17:30:38 -07:00
d6d7a5f075 only scatter in forward if multi-device per process (#22384)
Summary:
Scatter is unnecessary if only using one device, and it breaks on some custom data structures like namedtuple, so we'd like to avoid it :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22384

Differential Revision: D16428208

Pulled By: soumith

fbshipit-source-id: eaa3876b2b95c1006ccaaacdb62f54c5280e730c
2019-07-26 17:30:34 -07:00
e1ae3a75c8 gate module::save logic on mobile (#23415)
Summary:
This is part of the effort to shrink OSS libtorch mobile build size.
We shouldn't need Module::save function on mobile - it depends on
csrc/jit/export.cpp which then depends on ONNX. By gating these two
methods we can avoid these dependencies for libtorch mobile.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23415
ghstack-source-id: 87288228

Reviewed By: dreiss

Differential Revision: D16511143

fbshipit-source-id: fd031f91fcf9b7be54cbe1436506965af94ab537
2019-07-26 17:23:38 -07:00
23f963e4a8 Update distributed.rst (#23289)
Summary:
Different backend is supported since https://github.com/pytorch/pytorch/pull/18595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23289

Differential Revision: D16528229

Pulled By: soumith

fbshipit-source-id: 57753e84c015817661ba30835278ee3a899aa2d0
2019-07-26 16:55:52 -07:00
ca76c82ce3 Add early returns to JIT (#19179)
Summary:
Add early returns to JIT with minimal changes to compiler.cpp and an IR->IR pass that will transform the graph so that there is only one return value.

In compiler.cpp, record when a block will exit so that in the following example will work:
```
if cond:
    a = torch.zeros([2])
else:
    return 2
a += 2
...
```
To match block outputs with values that will not be used, like in the above example with `a`, I add a Bottom Type that subtypes everything else. This allows shape propagation to continue to work, and makes it so that we don't need many extra nodes filling up the graph.

The IR transform currently doesn't work on Loops, I didn't add that to this PR to avoid too much complexity, but will add it as a stack (and it should be very little extra code). the IR  transform is commented at the top of the file.
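A rough Python model of what the IR->IR pass produces — every early return sets a "has returned" flag, blocks fill their unused outputs with a bottom placeholder, and the code after the `if` is guarded by the flag. The names here are illustrative, not the pass's actual ones:

```python
_BOTTOM = object()  # stands in for the Bottom type: a value that is never used

def early_return(cond):
    # Source form, as in the example above.
    if cond:
        a = 10
    else:
        return 2
    a += 2
    return a

def single_exit(cond):
    # Equivalent graph after the transform: exactly one return at the end.
    returned, ret = False, _BOTTOM
    if cond:
        a = 10
    else:
        returned, ret = True, 2
        a = _BOTTOM  # block output matched with a value that won't be used
    if not returned:
        a += 2
        ret = a
    return ret
```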
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19179

Differential Revision: D16519819

Pulled By: eellison

fbshipit-source-id: 322a27f69966d1fd074ebe723c3e948b458b0e68
2019-07-26 16:42:43 -07:00
9223fa1c46 Add support to serialize qtensor in JIT. (#23356)
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428

Differential Revision: D16473237

fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
2019-07-26 15:52:15 -07:00
9dad13e1f0 Revert "Add fbgemm_qlinear_dynamic op (#23104)"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23449

Test Plan: Imported from OSS

Differential Revision: D16524768

Pulled By: ezyang

fbshipit-source-id: 9eb01b021011d1172317b5adb774c10c42ac2b86
2019-07-26 15:02:33 -07:00
953459f29e Dont allow conversions with QInt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22931

Test Plan: Imported from OSS

Differential Revision: D16467985

Pulled By: gchanan

fbshipit-source-id: 3925fc96a641e66b92fa65c542a2a23190c915a5
2019-07-26 14:45:14 -07:00
190d255d2e Add FLOAT_MODULE to quantized conv (#23414)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23414
ghstack-source-id: 87225586

Differential Revision: D16511055

fbshipit-source-id: c617733f60cfe38f4791e35e57e9551f2b5d8c09
2019-07-26 14:02:20 -07:00
83d6c6be07 ONNX export for index_select (#21866)
Summary:
ONNX export for index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21866

Reviewed By: zrphercule

Differential Revision: D16471345

Pulled By: houseroad

fbshipit-source-id: 745c23ba8a3223b5ec59b924df7358a36a92518c
2019-07-26 13:56:15 -07:00
74f8094ea5 Rename threading build options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23407

Test Plan:
USE_CUDA=0 ATEN_THREADING=TBB USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB
BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python
setup.py develop install --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D16522538

Pulled By: ilia-cher

fbshipit-source-id: 75c4761d93a7f5936f28e4c5eedcd27d8490d0c5
2019-07-26 13:09:14 -07:00
aae48748f2 Avoid unnecessary tensor clone in Cloneable (#20995)
Summary:
As pointed out by SsnL in https://github.com/pytorch/pytorch/issues/20910, when the clone destination differs from the module's device,
`Cloneable` currently calls `clone()` and then `to()` on every parameter and buffer, where the first clone is unnecessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20995

Differential Revision: D15517353

Pulled By: mrshenli

fbshipit-source-id: 6b6dc01560540a63845663f863dea0a948021fa5
2019-07-26 12:46:42 -07:00
53182e53f0 fix observer name in the benchmark output (#23443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23443

as title

Reviewed By: hl475

Differential Revision: D16520962

fbshipit-source-id: 7a0ccbece487837c204f242d2a5c6f69b32cbc8c
2019-07-26 12:20:41 -07:00
828c08b4c7 allow passing a list of operators to benchmark (#23442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442

Rename the argument from `operator` to `operators`, which can take a list of operators to test.

Reviewed By: hl475

Differential Revision: D16520779

fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
2019-07-26 12:20:36 -07:00
0bc90194fb Catch and print exception traceback in parallel_apply() workers (#18055)
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.

This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.

Before:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
    raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

After:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
    ''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
  ...
  File "../models/foo.py", line 319, in bar
    baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
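The scheme above can be sketched in plain Python: workers capture `sys.exc_info()`, and the main thread re-raises an exception of the original type with the worker's traceback embedded in the message. This is a simplified sketch, not the real `parallel_apply()` (for one thing, a worker that legitimately returns a `(type, exc, tb)` tuple would be misclassified here):

```python
import sys
import threading
import traceback

def parallel_apply_sketch(fns):
    results = {}

    def worker(i, fn):
        try:
            results[i] = fn()
        except Exception:
            # Save the full exc_info so the traceback survives the thread.
            results[i] = sys.exc_info()

    threads = [threading.Thread(target=worker, args=(i, fn))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    outputs = []
    for i in range(len(fns)):
        out = results[i]
        if isinstance(out, tuple) and len(out) == 3 and isinstance(out[1], BaseException):
            # Re-raise with the original type, but embed the worker traceback.
            raise out[0]("Caught exception in replica %d. "
                         "Original traceback and message:\n%s"
                         % (i, ''.join(traceback.format_exception(*out))))
        outputs.append(out)
    return outputs
```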
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055

Differential Revision: D16444972

Pulled By: zhangguanheng66

fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
2019-07-26 11:41:22 -07:00
7499fe72e9 remove c2 tests from benchmark_all_test (#23437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23437

as title

Reviewed By: hl475

Differential Revision: D16519770

fbshipit-source-id: 63fc269e18c264d399e25f44b03f81fc3ae01113
2019-07-26 11:12:53 -07:00
e5e2face8f Change handling of DataParallel in ONNX exporter (#23365)
Summary:
Don't automatically unwrap top-layer DataParallel for users. Instead, we provide useful error information and tell users what action to take.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23365

Reviewed By: zrphercule

Differential Revision: D16514273

Pulled By: houseroad

fbshipit-source-id: f552de5c53fb44807e9d9ad62126c98873ed106e
2019-07-26 11:12:49 -07:00
c8c5e11fba Support variadic returns in Schema's operator<< (#23204)
Summary:
old: prim::PythonOp(...) ->
new: prim::PythonOp(...) -> ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23204
ghstack-source-id: 87208343

Reviewed By: zrphercule

Differential Revision: D16433592

fbshipit-source-id: 36cbb329188f112e09c3b1708a8090781b830dfe
2019-07-26 10:58:00 -07:00
34f53564b4 Don't warn when using conda compilers with utils.cpp_extension (#23396)
Summary:
The conda compiler are gcc/c++ 7.3.0, but have custom version strings
for clarity:

    x86_64-conda_cos6-linux-gnu-cc
    x86_64-conda_cos6-linux-gnu-c++

Using these compilers to build a C++ or CUDA extension now gives this warning (unnecessarily):

```
                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (/home/rgommers/anaconda3/envs/pytorch-nightly/bin/x86_64-conda_cos6-linux-gnu-c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux.
...
```
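A hedged sketch of the relaxed check (the actual logic lives in `torch.utils.cpp_extension` and may differ): accept any compiler whose basename is, or ends in, a known GCC-family name, so conda's prefixed binaries pass while unrelated compilers still do not:

```python
def looks_like_gcc(compiler_path):
    # Accept 'gcc'/'g++'-style names at the end of the basename, so
    # 'x86_64-conda_cos6-linux-gnu-c++' is treated like plain 'g++'.
    name = compiler_path.rsplit('/', 1)[-1]
    for suffix in ('gcc', 'g++', 'cc', 'c++'):
        if name == suffix or name.endswith('-' + suffix):
            return True
    return False

print(looks_like_gcc('x86_64-conda_cos6-linux-gnu-c++'))  # True
print(looks_like_gcc('/usr/bin/g++'))                     # True
print(looks_like_gcc('clang++'))                          # False
```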
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23396

Differential Revision: D16500637

Pulled By: soumith

fbshipit-source-id: 5b2fc3593e22e9a7d07dc2c0456dbb4934ffddb2
2019-07-26 10:17:14 -07:00
47a54295ee Add fbgemm_qlinear_dynamic op (#23104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23104

ghstack-source-id: 87247148

As suggested in https://github.com/pytorch/pytorch/pull/22891, we will add an overload for `torch.fbgemm_linear_int8_weight` (dynamic quantized version of the linear function) that takes PackedLinearWeight as input and is pretty much the same in signature as regular aten::linear.

Differential Revision: D16381552

fbshipit-source-id: 1ccc4174fd02c546eee328940ac4b0da48fc85e8
2019-07-26 10:11:56 -07:00
b7984fa8a7 Remove cases of AT_FORALL_SCALAR_TYPES_EXCEPT_HALF.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22890

Test Plan: Imported from OSS

Differential Revision: D16467980

Pulled By: gchanan

fbshipit-source-id: 93ddbd041b7f65cafe8520b095289f14ad6d667f
2019-07-26 09:58:35 -07:00
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
c8a50a26d2 Named inference rule for torch.prod
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23106

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D16419175

Pulled By: zou3519

fbshipit-source-id: beb9ef838525c1ea7d7839cb9b8d68028fb4917f
2019-07-26 08:50:45 -07:00
9817d7e16b Implement named inference rule for torch.sum
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23081

Test Plan:
- New tests [namedtensor ci]

Imported from OSS

Differential Revision: D16419174

Pulled By: zou3519

fbshipit-source-id: 8679f77f121664d0398d7f062a53c0fa37482481
2019-07-26 08:50:40 -07:00
4104e80eae qconv+relu and qlinear+relu modules (#23410)
Summary:
adding qconv+relu and qlinear+relu modules in nn/_intrinsic/quantized
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23410

Test Plan:
Extended tests to test these new modules as well

buck test mode/dev caffe2/test:quantized -- 'test_linear_api'  --print-passing-details
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
      ✓ caffe2/test:quantized - test_linear_api (test_nn_quantized.ModuleAPITest) 4.055 1/1 (passed)
Test output:
> test_linear_api (test_nn_quantized.ModuleAPITest)
> test API functionality for nn.quantized.linear and nn._intrinsic.quantized.linear_relu ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 4.056s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
Summary (total time 10.66s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantized -- 'test_conv_api'  --print-passing-details
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.QuantizedConvTest) 5.195 1/2 (passed)
Test output:
> test_conv_api (test_quantized_conv.QuantizedConvTest)
> Tests the correctness of the conv functional. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.195s
>
> OK
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 10.616 2/2 (passed)
Test output:
> test_conv_api (test_nn_quantized.ModuleAPITest)
> Tests the correctness of the conv module. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 10.616s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
Summary (total time 17.31s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16505333

Pulled By: dskhudia

fbshipit-source-id: 04f45cd0e76dc55f4694d558b913ab2958b7d727
2019-07-26 08:50:36 -07:00
fb8725fdbd Update doc about subsequent builds: options can be changed in build/CMakeCache.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23331

Test Plan: Imported from OSS

Differential Revision: D16517622

Pulled By: ezyang

fbshipit-source-id: d2d15b8bb2599b3b9abb7a1aa6bc91bfc0d8e5d0
2019-07-26 08:50:32 -07:00
0b4c0b95e9 For second-time build, let build_type be inferred from CMakeCache.txt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23323

Test Plan: Imported from OSS

Differential Revision: D16517621

Pulled By: ezyang

fbshipit-source-id: 22984df214d01246a7868980e148936698940ea8
2019-07-26 08:50:28 -07:00
beb109f6f1 Enable cpp api test in multigpu-test.sh (#23380)
Summary:
blocking https://github.com/pytorch/pytorch/issues/20995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23380

Differential Revision: D16517013

Pulled By: mrshenli

fbshipit-source-id: 3f44ecf0e8d1e235165f2ce4396795ca38e2d837
2019-07-26 07:44:07 -07:00
46224ef89e Update ONNX docs (#23185)
Summary:
This is still a work in progress.

There are several more items to add to complete this doc, including

- [x] LHS indexing, index assignments.
- [x] Tensor List.
- [x] ~Shape/Type propagation.~
- [x] FAQs

Please review and share your thoughts, feel free to add anything that you think should be included as well. houseroad spandantiwari lara-hdr neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23185

Differential Revision: D16459647

Pulled By: houseroad

fbshipit-source-id: b401c005f848d957541ba3b00e00c93ac2f4609b
2019-07-26 00:14:54 -07:00
7a0ae0079f export sort to onnx
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21913

Differential Revision: D15982801

Pulled By: houseroad

fbshipit-source-id: 96dbd738c557478fffd48000db7263ae1f9754f5
2019-07-26 00:02:20 -07:00
1c00e0fc3f Added a flatten module (#22245)
Summary:
https://github.com/pytorch/pytorch/issues/2118

I'm not sure I'm doing it correctly, so I'll add tests if we decide that it's roughly correct.
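For reference, what a flatten module does to a shape can be sketched independently of this PR's code (defaults assumed here: keep the batch dim, collapse everything after it):

```python
from functools import reduce
import operator

def flatten_shape(shape, start_dim=1):
    # Collapse dims [start_dim:] into a single dim; earlier dims are kept.
    tail = reduce(operator.mul, shape[start_dim:], 1)
    return tuple(shape[:start_dim]) + (tail,)

print(flatten_shape((32, 3, 28, 28)))  # (32, 2352)
```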
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22245

Differential Revision: D16508957

Pulled By: Chillee

fbshipit-source-id: a8dc7af999ba698c921006889f71cb1bc5a59d50
2019-07-25 22:48:52 -07:00
5b0484d977 Fix forwarding of arguments into kernel function (#23412)
Summary:
They should be forwarded by their actual type, not their rvalue reference.
This looked like perfect forwarding but actually wasn't.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23412
ghstack-source-id: 87214575

Reviewed By: dzhulgakov

Differential Revision: D16507872

fbshipit-source-id: 2b20a37df83067dd53e917fe87407ad687bb147c
2019-07-25 22:00:40 -07:00
3516f3c235 handle exit from init method (#21211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211

There are cases where the `init` method used to create inputs can exit with an error. When this happens, that specific input should be skipped.

Reviewed By: zheng-xq

Differential Revision: D15466410

fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7
2019-07-25 21:41:06 -07:00
dd79d45c5a Added torch.bitwise_not docstr (#23397)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/23311
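A quick illustration of the documented semantics (assuming a standard torch install): on signed integers `bitwise_not` computes the elementwise `~x`, i.e. `-(x + 1)`:

```python
import torch

x = torch.tensor([0, 1, -2])
print(torch.bitwise_not(x))  # tensor([-1, -2,  1])
```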
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23397

Differential Revision: D16505107

Pulled By: pbelevich

fbshipit-source-id: 8d515fc27e253469393941c8da23d8e0510e64df
2019-07-25 18:32:58 -07:00
58a3e4f71f Automatic update of fbcode/onnx to 28ca699b69b5a31892619defca2391044a9a6052 (#23404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23404

Previous import was 707064980b9825b8705b9d1c9aad34d8b022d5dd

Included changes:
- **[28ca699b](https://github.com/onnx/onnx/commit/28ca699b)**: Member Company logo guidelines (#2196) <Prasanth Pulavarthi>
- **[47acb06a](https://github.com/onnx/onnx/commit/47acb06a)**: remove link to outdated issue for contributions wanted (#2186) <Prasanth Pulavarthi>
- **[168519f6](https://github.com/onnx/onnx/commit/168519f6)**: Create sigs.md (#2103) <Prasanth Pulavarthi>
- **[b9320746](https://github.com/onnx/onnx/commit/b9320746)**: mintor format update (#2180) <Prasanth Pulavarthi>
- **[65b8e0f9](https://github.com/onnx/onnx/commit/65b8e0f9)**: add more types support for Equal op (#2176) <Ke Zhang>
- **[dc5e62a9](https://github.com/onnx/onnx/commit/dc5e62a9)**: Update AddNewOP document. (#2172) <Emad Barsoum>
- **[bae8b530](https://github.com/onnx/onnx/commit/bae8b530)**: Add missing space (#2150) <Takeshi Watanabe>
- **[5952b7f5](https://github.com/onnx/onnx/commit/5952b7f5)**: python api example typo fix (#2155) <LeicongLi>
- **[904cb842](https://github.com/onnx/onnx/commit/904cb842)**: Fix errors in RoiAlign shape inference code (#2167) <G. Ramalingam>

Reviewed By: zrphercule

Differential Revision: D16502373

fbshipit-source-id: 81f1e8f0db6828fd551089ae2e0be65153739532
2019-07-25 18:26:04 -07:00
bd54608bd2 fused qconv2d+relu kernel (#23353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23353

Adding support for fused qconv2d + relu

Reviewed By: jianyuh

Differential Revision: D16473318

fbshipit-source-id: cd3c3476a21ffe946dbd9812e833b957c0fd206c
2019-07-25 17:55:47 -07:00
6a8c2758d5 Add better performing versions for groupwise and depthwise convolutions (#22869)
Summary:
Groupwise and depthwise convolutions become faster with this diff
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22869

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qconv'  --print-passing-details

```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/562950091484224
      ✓ caffe2/test:quantized - test_qconv (test_quantized.TestQuantizedConv) 2.731 1/2 (passed)
Test output:
> test_qconv (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 2.732s
>
> OK
      ✓ caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQuantizedConv) 5.187 2/2 (passed)
Test output:
> test_qconv_unpack (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.188s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/562950091484224
Summary (total time 15.66s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

```

buck test mode/dev caffe2/test:quantized -- 'test_conv_api'
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649676010406
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 0.040 1/2 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 5.402 2/2 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649676010406
Summary (total time 11.83s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16264144

Pulled By: dskhudia

fbshipit-source-id: 32fa43e5c3d97c8aaa6e0858327a2ac0aef8df5c
2019-07-25 17:55:43 -07:00
2409e6a475 C++ at::Tensor, torch::tensor constructors should not accept QInts.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22889

Test Plan: Imported from OSS

Differential Revision: D16467984

Pulled By: gchanan

fbshipit-source-id: 6e2b1bf2a6a8dbc60138cd437b9cf814a0fc297d
2019-07-25 16:30:25 -07:00
0e3a359a38 Align the operator<< for Argument with FunctionSchema parser (#23203)
Summary:
Align the Argument's operator<< with parser,
additional support:
1) List size
2) real default value
3) Alias information

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23203
ghstack-source-id: 87118985

Reviewed By: zrphercule

Differential Revision: D16433188

fbshipit-source-id: aea5711f93feacd94d1732e2f0d61218a31a0c5c
2019-07-25 15:28:17 -07:00
83b0fbc38d Remove TensorIterator::Builder (#23329)
Summary:
The builder pattern doesn't seem to work well with return-value optimization.
This saves ~100 ns in the construction of TensorIterator::binary_op.

```
import torch
x = torch.rand(1)
y = torch.rand(1)
z = torch.rand(1)
%timeit torch.add(x, y, out=z)  # ~1.76 us vs ~1.88 us on my machine
```

cc resistor zheng-xq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23329

Differential Revision: D16495070

Pulled By: VitalyFedyunin

fbshipit-source-id: 8ce116075fa4c7149dabfcdfa25885c1187c8e2f
2019-07-25 15:15:49 -07:00
2cfe949d45 DEPRECATE_MESSAGE doesn't actually get expanded; inline it at both sites.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23379

Test Plan: Imported from OSS

Differential Revision: D16495278

Pulled By: ezyang

fbshipit-source-id: 596438fbf3285d6eee7b5d27a014f87b6c261cf1
2019-07-25 14:26:00 -07:00
b755bc1e31 fix importing for module defs that are named "foo.bar"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23367

Test Plan: Imported from OSS

Differential Revision: D16478637

Pulled By: suo

fbshipit-source-id: 30c6e7bfe377ef35d8c39e2d31615075ca0a6a19
2019-07-25 14:07:56 -07:00
b22adeb007 Fix error message for a wrong fork CUDA (#23322)
Summary:
Re-land https://github.com/pytorch/pytorch/pull/23030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23322

Differential Revision: D16469442

Pulled By: zhangguanheng66

fbshipit-source-id: 70b63ab6265efa3f289111ef0ce46bb3c0d353bc
2019-07-25 12:58:14 -07:00
d18529eb93 Fix upload path for macos binaries (#23386)
Summary:
peterjc123  will this affect windows too?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23386

Differential Revision: D16499443

Pulled By: pjh5

fbshipit-source-id: a3bec32d16f2cd06a097082deae0b020dd8bc5ac
2019-07-25 12:48:04 -07:00
7ee62d3d91 Fix the iOS build (#23293)
Summary:
The legacy iOS build script (`build_ios.sh`) still works, but the output is Caffe2, not PyTorch. To enable the PyTorch iOS build, we can set `BUILD_CAFFE2_MOBILE` to `NO` and turn on another cmake arg, `INTERN_BUILD_MOBILE`, which ljk53 created for Android.

There is a trivial issue in `used_kernel.cpp` that causes a compile error when running `build_ios.sh`, as it uses a `system` API that has been deprecated since iOS 11. The fix below bypasses this file since it's not needed on mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23293

Test Plan:
The `build_ios.sh` completed successfully, and all the generated static libraries can be compiled and linked successfully on iOS devices.

### Build script

```shell
./scripts/build_ios.sh \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

Differential Revision: D16456100

Pulled By: xta0

fbshipit-source-id: 38c73e1e3a0c219a38ddc28b31acc181690f34e8
2019-07-25 12:41:20 -07:00
071536f895 Fix the operator== for Argument (#23202)
Summary:
type() returns a shared pointer; we should compare its contents, not the pointer itself
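The same pitfall rendered in Python terms — comparing shared pointers is an identity check (like `is`), while the fix compares the pointed-to contents (like `==`):

```python
a = [1, 2, 3]
b = [1, 2, 3]
print(a is b)  # False: distinct objects ("pointer" comparison)
print(a == b)  # True: equal contents, which is what operator== should test
```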

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23202
ghstack-source-id: 87125582

Reviewed By: zrphercule

Differential Revision: D16432634

fbshipit-source-id: 639e730dcdc1cec02f280efeea53019b36d9ae37
2019-07-25 11:59:28 -07:00
bbc53bffef AliasAnalysisKind::CONSERVATIVE/FROM_SCHEMA (#22175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22175

- Rename AliasAnalysisKind::DEFAULT to AliasAnalysisKind::CONSERVATIVE
- Introduce AliasAnalysisKind::FROM_SCHEMA that means the alias annotations of the schema should be honored
- Introduce AliasAnalysisKind::INTERNAL_SPECIAL_CASE to be able to run assertions that internal special cased ops are treated correctly

- aten:: and prim:: ops are not treated as special cases anymore, but just use AliasAnalysisKind::FROM_SCHEMA
- There's a set of assertions to ensure that aten:: and prim:: ops are all correctly set up to use AliasAnalysisKind::FROM_SCHEMA. Once this PR lands and passes all tests, we will remove those assertions and open up for the possibility of different AliasAnalysisKind settings for aten:: and prim:: ops

Differential Revision: D15929595

fbshipit-source-id: 7c6a9d4d29e13b8c9a856062cd6fb3f8a46a2e0d
2019-07-25 11:53:51 -07:00
b9202d459a Polish torch::Dict (#23344)
Summary:
torch::List recently received some polishing that now also is done for Dict. This should be done before the PyTorch 1.2 release because of backwards compatibility.

- Dict is just a reference type, so "const Dict" should have the same capabilities as "Dict"; constness is not guaranteed in any way.
- DictIterator gets comparison operators <, <=, >, >=

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23344
ghstack-source-id: 87170304

Differential Revision: D16468800

fbshipit-source-id: 2978c3b9cdcfb2cfb3f26516b15bd455d9a48ba9
2019-07-25 11:35:36 -07:00
722f80a07d Align String str() with the format in FunctionSchema (#23201)
Summary:
old: string
new: str

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23201
ghstack-source-id: 87125580

Reviewed By: zrphercule

Differential Revision: D16432340

fbshipit-source-id: 56bc7e8efbc2276315f464958cf38704f75dd06e
2019-07-25 11:31:00 -07:00
7c383ba4a0 Remove superfluous check (#23370)
Summary:
This check is not needed. Even if it were, the assignment is clobbered anyway.

Closes #23300.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23370
ghstack-source-id: 87157671

Differential Revision: D16485329

fbshipit-source-id: 8ccac79e81f5e0d0d20099d550411c161f58c233
2019-07-25 11:26:16 -07:00
39de49c7ec Fix a tiny bug and refactor (#22808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22808

- Use ```size_to_dim_```.
- ```mod``` is not in the scope. Should be ```module```.

Reviewed By: mingzhe09088

Differential Revision: D16225799

fbshipit-source-id: 9a263227d2d508eefdfddfee15fd0822819de946
2019-07-25 11:26:12 -07:00
39fd264799 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23381

Differential Revision: D16496327

Pulled By: pjh5

fbshipit-source-id: 529029544a5f8c8106bcb7cebdc71aee33e3b86c
2019-07-25 10:39:37 -07:00
ed316c0ca0 Align Dict str() with the format in FunctionSchema (#23200)
Summary:
Old: Dict[int, str]
New: Dict(int, str)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23200
ghstack-source-id: 87125581

Reviewed By: zrphercule

Differential Revision: D16432159

fbshipit-source-id: a3dc5fa397697a53e78290d25e19589f757c1eb8
2019-07-25 10:27:41 -07:00
f477cab2dc Add type checks in _intrinsics.modules.fused (#23361)
Summary:
recreated

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23361
ghstack-source-id: 87142339

Reviewed By: zafartahirov

Differential Revision: D16455500

fbshipit-source-id: ab2c9d10c7c025ae77f5b80f28e6bd261620a5f7
2019-07-25 09:49:01 -07:00
25e06618c8 Support parsing variadic return schema (#23199)
Summary:
All cases should be prim ops, but let's support it anyway. It will expect the variadic return schema to be prim::PythonOp(...) -> ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23199
ghstack-source-id: 87113845

Differential Revision: D16431635

fbshipit-source-id: 798b6957ce5d800f7fcf981c86fdcb009cd77a78
2019-07-25 09:39:49 -07:00
cf50249bde Disable fusion of grad_sum_to_size (#23372)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833

grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.

Chillee actually did most of the work tracking this down to the fusion of grad_sum_to_size, pinging me when he had found the cause. Thank you!

About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the cases where fusing grad_sum_to_size is actually beneficial are much rarer than when initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think that it is a case we could handle with refined logic.
- Keeping it would add complexity in checking when to merge fusion groups to the complexities that this PR removes.
- The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372

Differential Revision: D16489930

Pulled By: soumith

fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
2019-07-25 08:55:33 -07:00
82545ecc71 Specify build dir as a global variable in BUILD_DIR in the build system.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23318

Test Plan: Imported from OSS

Differential Revision: D16493987

Pulled By: ezyang

fbshipit-source-id: 497e9dd924280f61dde095b4f2b50f5402d9da97
2019-07-25 07:19:47 -07:00
915261c8be Let users pass CMake-specific options starting with CMAKE_ to CMake. (#22776)
Summary:
This should make it more convenient to follow https://github.com/pytorch/pytorch/issues/8433's suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22776

Differential Revision: D16493553

Pulled By: ezyang

fbshipit-source-id: 852f4779e70f84a4c9f7bab4c2ae4927248ffc93
2019-07-25 07:19:44 -07:00
f91b19c2aa Do not explicitly set USE_FBGEMM in tools/setup_helpers/cmake.py (#23314)
Summary:
Instead, defer its default value to CMakeLists.txt

NO_FBGEMM has already been handled in tools/setup_helpers/env.py
(although deprecated)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23314

Differential Revision: D16493580

Pulled By: ezyang

fbshipit-source-id: 7255eb1df5e8a6dd0362507d68da0986a9ed46e2
2019-07-25 07:11:52 -07:00
ba6f65cf33 Add document of functions nn.init.ones_/zeros_ (#23145)
Summary:
Functions `nn.init.ones_` and `nn.init.zeros_` were not documented. As mentioned in https://github.com/pytorch/pytorch/issues/9886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23145

Differential Revision: D16427108

Pulled By: soumith

fbshipit-source-id: 4fac31e79717a436411ef5e107a829b403e576c9
2019-07-25 06:09:50 -07:00
252710262f (#22775)
Summary:
Pass FusionCallback and Symbol to recursive GraphFuser calls. This ensures
consistent fusion in nested Blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22775

Differential Revision: D16439979

Pulled By: soumith

fbshipit-source-id: 18d4b13f52b03708b8580c73f75450adbb672ac1
2019-07-25 05:54:03 -07:00
0c79753c0d Improve documentation for torch.enable_grad , torch.no_grad and torch.set_grad_enabled (#23310)
Summary:
Modified documentation for ` torch.enable_grad` , ` torch.no_grad` and `torch.set_grad_enabled`.

Fixes https://github.com/pytorch/pytorch/issues/19189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23310

Differential Revision: D16489626

Pulled By: soumith

fbshipit-source-id: f0926e4f51ffd97521e67bee3a16ad954458247a
2019-07-25 05:48:33 -07:00
2938299de1 Fix lint failure introduced in #23346
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23371

Differential Revision: D16489985

Pulled By: pietern

fbshipit-source-id: 914048563bbe7bf5ab897c6f12f4a1bb4bff30e1
2019-07-25 05:17:15 -07:00
0842624d50 Fix upload issue with linux libtorch nightlies (#23368)
Summary:
This is a small fix on top of gh-23348, which fixed the libtorch
nightly build timeouts.

For the latest nighly build (25 July), see
https://circleci.com/workflow-run/33d0a24a-b77c-4a8f-9ecd-5646146ce684
The only failures are these uploads, which is because `aws s3 cp`
can only deal with one file at a time. The only way to make it do
multiple files at once is:
```
aws s3 cp . "$s3_dir" --exclude "*"  --include "libtorch-*.zip" --recursive --acl public-read
```
which is much more verbose. Executing one `cp` per file should be fine,
and this is also what's done in `binary_macos_upload.sh`

Closes gh-23039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23368

Differential Revision: D16488853

Pulled By: soumith

fbshipit-source-id: 6dc04b4de2f6cd2de5ae9ad57a6e980f56896498
2019-07-25 04:52:43 -07:00
95e822622b Enhance interpretation of GLOO_SOCKET_IFNAME (#22978)
Summary:
With this change you can now list multiple interfaces separated by
comma. ProcessGroupGloo creates a single Gloo context for every device
in the list (a context represents a connection to every other
rank). For every collective that is called, it will select the context
in a round robin fashion. The number of worker threads responsible for
executing the collectives is set to be twice the number of devices.

If you have a single physical interface, and wish to employ increased
parallelism, you can also specify
`GLOO_SOCKET_IFNAME=eth0,eth0,eth0,eth0`.  This makes ProcessGroupGloo
use 4 connections per rank, 4 I/O threads, and 8 worker threads
responsible for executing the collectives.
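The round-robin selection across contexts can be sketched in a few lines (a minimal illustration; the class and names here are hypothetical, not ProcessGroupGloo's actual C++ implementation):

```python
class RoundRobinContexts:
    """Illustrative sketch of round-robin selection over per-device
    Gloo contexts, as described above (names are hypothetical)."""

    def __init__(self, contexts):
        self.contexts = contexts
        self.next_index = 0

    def pick(self):
        # Each collective call takes the next context in circular order.
        ctx = self.contexts[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.contexts)
        return ctx


rr = RoundRobinContexts(["eth0#0", "eth0#1"])
picks = [rr.pick() for _ in range(4)]  # alternates between the two connections
```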

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22978
ghstack-source-id: 87006270

Differential Revision: D16339962

fbshipit-source-id: 9aa1dc93d8e131c1714db349b0cbe57e9e7266f1
2019-07-25 04:52:38 -07:00
1dd4d55565 Improve FindMKLDNN.cmake to avoid binary compatibility issue in MKL-DNN (#23292)
Summary:
An illegal instruction is encountered in the pre-built MKL-DNN package. https://github.com/pytorch/pytorch/issues/23231
To avoid such binary compatibility issue, the HostOpts option in MKL-DNN is disabled in order to build MKL-DNN for generic arch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23292

Differential Revision: D16488773

Pulled By: soumith

fbshipit-source-id: 9e13c76fb9cb9338103cb767d7463c10891d294a
2019-07-25 04:42:26 -07:00
e8ad167211 Remove usage of FOLDER constant in test_distributed.py (#23223)
Summary:
This is step 1 in trying to get rid of constants that are set prior to
executing the test runner. All setup logic should be concentrated in
the setupClass() function of the TestCase subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23223
ghstack-source-id: 87005260

Reviewed By: zhaojuanmao

Differential Revision: D16439147

fbshipit-source-id: 7a929ad4b1c8e368e33d1165becbd4d91220882c
2019-07-25 02:55:30 -07:00
711be82951 Make optimize a thread_local flag
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23170

Test Plan: Imported from OSS

Differential Revision: D16441912

Pulled By: suo

fbshipit-source-id: a33485178a329d54e41e364c4f14950f88481c55
2019-07-24 23:09:21 -07:00
b3980f46a2 Replace uint8 with int8 in Linear and LSTM quantization path (#23347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23347

This diff replaces uint8 with int8 to match the underlying kernel implementation.  When we do int8 quantization,  we are computing with uint8 (input activation) * int8 (weight) -> uint8 (output activation). The weight is quantized into int8.

Reviewed By: jianyuh

Differential Revision: D16469435

fbshipit-source-id: a697655b0e97833fc601e5980970aec4dba53c39
2019-07-24 22:25:12 -07:00
fba325be34 Try kill -9ing apt-get (#23354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23354

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16474254

Pulled By: ezyang

fbshipit-source-id: 0dd7ce02e1aa1a42a24d2af066ebd0ac5206c9a0
2019-07-24 19:27:10 -07:00
ff3cc795c8 Build binaries with local version string specifying CUDA version (#23325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23325

Fixes #19990

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16473826

Pulled By: ezyang

fbshipit-source-id: 466db2c22fabd7b574f0a08aec67a18318ddb431
2019-07-24 18:32:32 -07:00
cf0f3556f6 Enabled cumsum and cumprod for bool tensors (#23346)
Summary:
```
a = torch.tensor([[True, False, True],
                  [False, False, False],
                  [True, True, True]])

>>> torch.cumsum(a, 0)
tensor([[1, 0, 1],
        [1, 0, 1],
        [2, 1, 2]])

>>> torch.cumsum(a, 1)
tensor([[1, 1, 2],
        [0, 0, 0],
        [1, 2, 3]])
```

Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23346

Differential Revision: D16469393

Pulled By: izdeby

fbshipit-source-id: b55f3ca0588f9961a771def40f6ef58932021e1a
2019-07-24 18:16:01 -07:00
c9312e1a8b Open source 3D depthwise conv (#23164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23164

for open source CSN model

Reviewed By: weiyaowang

Differential Revision: D16412312

fbshipit-source-id: bb4e7748e697271563f974ca05878f8832d83653
2019-07-24 17:56:56 -07:00
73dee44ec5 Specifying libtorch variant in libtorch builds (#23348)
Summary:
This should fix all libtorch timeout issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23348

Differential Revision: D16472782

Pulled By: pjh5

fbshipit-source-id: b1f3a783d044eb881f6e8755e0c891093e93c93e
2019-07-24 17:31:43 -07:00
3297d8e203 Switch keys to be sequential and stable in pickle serialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23280

Test Plan: Imported from OSS

Differential Revision: D16452816

Pulled By: zdevito

fbshipit-source-id: e143780b8e834298a575ac76d49576df94fbe27b
2019-07-24 17:13:51 -07:00
93da1030df Fix pickler bug where it would not load if no tensors were saved
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23263

Test Plan: Imported from OSS

Differential Revision: D16446928

Pulled By: zdevito

fbshipit-source-id: f70f86b28c3901a97b65b4d7654e39dc6e1aab6a
2019-07-24 17:13:46 -07:00
7922b5057d Memoize storages in pickler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23262

Test Plan: Imported from OSS

Differential Revision: D16446927

Pulled By: zdevito

fbshipit-source-id: 92d26f64ff6269b1deef821edae31745158b5137
2019-07-24 17:13:42 -07:00
71a047c3e3 Unwrap DataParallel automatically (#23334)
Summary:
Handle DataParallel for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23334

Differential Revision: D16467844

Pulled By: houseroad

fbshipit-source-id: 696aeada437c6c0612ac4ef9c4d51e3386625de0
2019-07-24 16:29:48 -07:00
c23ba35009 Skip QNNpack tests on ppc64le (where support is not enabled) (#23343)
Summary:
Proposed PR for
https://github.com/pytorch/pytorch/issues/23342

Disables execution of QNNpack tests if IS_PPC.
This parallels the skipping of these tests for IS_WINDOWS, which is already present.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23343

Differential Revision: D16469218

Pulled By: soumith

fbshipit-source-id: 80b651d00e5d413e359cf418f79e20d74cd9c8e1
2019-07-24 15:24:00 -07:00
22c169fb9c Improve the error message for ONNX export (#23317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23317

Print out the kind type when fail to export

Reviewed By: zrphercule

Differential Revision: D16462641

fbshipit-source-id: 27157c0bd597362f90ac8cfb33e1808bac0ec48b
2019-07-24 15:03:05 -07:00
87d3f66506 max_pool_with_indices: return valid indices if all input elements are -inf (#23161)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23161

Differential Revision: D16442672

Pulled By: ezyang

fbshipit-source-id: 8c2ee13acd73954c7307720c01c732f460266a63
2019-07-24 14:51:39 -07:00
b7d90332ea add notes about overshoot in bicubic mode (#23321)
Summary:
fix https://github.com/pytorch/pytorch/issues/21044

Bicubic interpolation can cause overshoot.

Opencv keeps results dtype aligned with input dtype:
- If input is uint8, the result is clamped [0, 255]
- If input is float, the result is unclamped.

In the PyTorch case, we only accept float input, so we'll keep the result unclamped, and add some notes so that users can explicitly call `torch.clamp()` when necessary.
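For example (a minimal sketch; the [0, 1] value range is just an assumption for illustration):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 4, 4)  # float input, here in [0, 1]
y = F.interpolate(x, scale_factor=2, mode="bicubic", align_corners=False)
# Bicubic interpolation can overshoot the input range, so clamp explicitly.
y_clamped = y.clamp(0.0, 1.0)
```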
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23321

Differential Revision: D16464796

Pulled By: ailzhang

fbshipit-source-id: 177915e525d1f54c2209e277cf73e40699ed1acd
2019-07-24 14:46:37 -07:00
d522b3ca58 BlackBoxPredictor OSS part N: ThreadLocalPtr, InferenceGraph (#23257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23257

Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to make any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.

Reviewed By: zrphercule

Differential Revision: D16428124

fbshipit-source-id: b35deada5c015cd97b91ae12a7ea4aac53bd14b8
2019-07-24 14:35:30 -07:00
2915d53096 Move OptionalType wrapping out of constants.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23234

Pulled By: driazati

Differential Revision: D16460880

fbshipit-source-id: d4e6b747615dbfe73a92ce571d3b2aaae7179f1b
2019-07-24 14:35:26 -07:00
48ca64dbf7 Better error for compiling a module type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23312

Pulled By: driazati

Differential Revision: D16461299

fbshipit-source-id: 11e56c44d561c3fbf70a96c22c5fd494eea0cf19
2019-07-24 14:24:50 -07:00
d6dcec37b6 Add docs about prod ecosystem features (#23010)
Summary:
Covering fleet-wide profiling, api logging, etc.

It's my first time writing rst, so suggestions are definitely welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23010

Differential Revision: D16456721

Pulled By: dzhulgakov

fbshipit-source-id: 3d3018f41499d04db0dca865bb3a9652d8cdf90a
2019-07-24 14:15:33 -07:00
mal
87482bb15a Remove torch::autograd::Node::get_shared_ptr()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23307

Test Plan: Imported from OSS

Differential Revision: D16460972

fbshipit-source-id: 0678627e05dd4c69c4dafa6b717db5cd1a531f56
2019-07-24 13:50:47 -07:00
8fdbe1e10b Support LSTM with FP16 weight (#23291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23291

This diff implements LSTM with FP16 weights based on FBGEMM.

At a high level, here are the steps:
1. Quantize and pack weight in every layer of LSTM
2. Pass weights from step 1 to the ATen `quantized_lstm` function which does matrix multiplication with FP16 weight. The following code shows the dtype of each variable used in MM:
Y  =   X * W + B
(fp32, fp32, fp16, fp32)
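That dtype mix can be illustrated with a plain tensor sketch (this is not the FBGEMM kernel, just the numerics: the fp16 weight is upcast and the product accumulated in fp32):

```python
import torch

X = torch.randn(4, 8)         # fp32 input activation
W = torch.randn(8, 3).half()  # fp16 weight, as stored after quantize-and-pack
B = torch.randn(3)            # fp32 bias
Y = X @ W.float() + B         # matmul accumulated in fp32
```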

Reviewed By: jianyuh

Differential Revision: D16389595

fbshipit-source-id: c26ae4e153c667a941f4af64e9d07fc251403cee
2019-07-24 12:40:11 -07:00
1f608d09cf Revert D16440000: [pytorch][PR] Re-land "Fix error message for a wrong fork CUDA"
Differential Revision:
D16440000

Original commit changeset: e05683275522

fbshipit-source-id: b688f24c1e6d3d8f63c2d415262a3f0ab1b85914
2019-07-24 12:05:36 -07:00
1a9edfcd36 Prevent xla-build from clobbering pytorch_linux_trusty_py3_6_gcc5_4_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23304

Test Plan: Imported from OSS

Differential Revision: D16459166

Pulled By: zou3519

fbshipit-source-id: 8f5c35ebf1fe6e86705e7fb4fff4c720bcd8f97b
2019-07-24 11:58:44 -07:00
0c0ffccbb6 deepCopy also copies type information of lists (#23271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23271
ghstack-source-id: 87088503

Differential Revision: D16449220

fbshipit-source-id: 551b7cef8f6d0d2d5a56b24ddbe2e0bb2c0c3dbe
2019-07-24 11:53:01 -07:00
895e79adf1 Revert D16441000: Switch from KaTeX to imgmath for documentation rendering.
Differential Revision:
D16441000

Original commit changeset: c1ab557cb816

fbshipit-source-id: cbfec2ca648b614b291debd6b3e215db9fbeb57b
2019-07-24 11:43:17 -07:00
94711d7471 Quantized conv avoid functional usage (#22733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22733

This refactor changes the conv module to avoid the usage of the functional ops.

Reviewed By: jerryzh168

Differential Revision: D15835572

fbshipit-source-id: f2294cd708fbe8372eb3a15cc60d83777d4f7029
2019-07-24 11:43:12 -07:00
67179d71f7 Reduce number of processes spawned for gloo_test.TestCase.test_forked_cw (#23221)
Summary:
It used to be run with comm_size=8, which caused flaky results in a
stress run. The flakiness was caused by too many listening sockets
being created by Gloo context initialization (8 processes times 7
sockets times 20-way concurrency, plus TIME_WAIT).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23221
ghstack-source-id: 86995596

Reviewed By: d4l3k

Differential Revision: D16437834

fbshipit-source-id: 998d0e2b087c0ab15eca64e308059c35e1b51e7b
2019-07-24 11:31:20 -07:00
3ed79f4b6c Fix argument names in torch doc (#22973)
Summary:
I manually went through all functions in `torch.*` and corrected any mismatch between the arguments mentioned in doc and the ones actually taken by the function. This fixes https://github.com/pytorch/pytorch/issues/8698.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22973

Differential Revision: D16419602

Pulled By: yf225

fbshipit-source-id: 5562c9b0b95a0759abee41f967c45efacf2267c2
2019-07-24 11:22:45 -07:00
eb51131fb4 Revert D16423217: [pytorch][PR] Update sleef to master, fixes #20535
Differential Revision:
D16423217

Original commit changeset: 587de3f10e83

fbshipit-source-id: 466e56eab73ce669cc179d08b7f39d2c8b0ffb34
2019-07-24 11:10:15 -07:00
803af9988c Fix the problem in parseOctal and throw exception if use \xhh to specify hex value (#23198)
Summary:
follow the rules:
1) https://docs.python.org/2.0/ref/strings.html
2) https://en.cppreference.com/w/cpp/language/escape
Didn't find anything about the format \h.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23198
ghstack-source-id: 86986691

Reviewed By: zrphercule

Differential Revision: D16431215

fbshipit-source-id: 7b342708d1984e08b3cbbcf6d487623f1dc63a14
2019-07-24 10:41:59 -07:00
b9a7fc529a Suppress warnings in tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23264

Pulled By: driazati

Differential Revision: D16449965

fbshipit-source-id: ff7d6ddf2275e5f44883a19b6a24c6beaa42ccf4
2019-07-24 10:36:46 -07:00
200cb836f1 Enabled 'add_cuda' for bool and fixed alpha scalar bug (#23044)
Summary:
Enabled 'add_cuda' for bool
Tested via unit tests

Fixed https://github.com/pytorch/pytorch/issues/22431 #22430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23044

Differential Revision: D16368095

Pulled By: izdeby

fbshipit-source-id: 033d28095ff1c5df4078905c52782cf4cf9ed6b0
2019-07-24 10:31:34 -07:00
fbf28b5458 Change TensorIterator to be stack allocated, using named return value optimization to elide copies.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22519

Differential Revision: D16451460

fbshipit-source-id: 6ca6ae2fdf1af5a2f792b42e55279413971b3c46
2019-07-24 10:22:19 -07:00
7203612f85 Update sleef to master, fixes #20535 (#23168)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20535

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23168

Differential Revision: D16423217

Pulled By: ezyang

fbshipit-source-id: 587de3f10e839b94f51f673741b5fda8849e32f6
2019-07-24 08:18:14 -07:00
fd1d06e317 Let Python build scripts accept both CMAKE_BUILD_TYPE and the oldschool DEBUG and REL_WITH_DEB_INFO variables. (#22875)
Summary:
Currently the build type is decided by the environment variables DEBUG
and REL_WITH_DEB_INFO. This commit also lets CMAKE_BUILD_TYPE be
effective. This makes the interface more consistent with CMake. This
also prepares https://github.com/pytorch/pytorch/issues/22776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22875

Differential Revision: D16281663

Pulled By: ezyang

fbshipit-source-id: 952f92aad85ff59f1c7abe8256eca8a4a0936026
2019-07-24 08:07:47 -07:00
aa660b8eb7 Re-land "Fix error message for a wrong fork CUDA" (#23209)
Summary:
Re-land https://github.com/pytorch/pytorch/pull/23030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23209

Differential Revision: D16440000

Pulled By: zhangguanheng66

fbshipit-source-id: e05683275522835a33d5a7e6d76b7e94774e4d98
2019-07-24 07:01:04 -07:00
4cd726c7b3 Update ROCm CI to python3.6 (#23088)
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322 .

Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Compared to #22322, added a pattern-match skip for the Python find step in the PyTorch build that applies to anything but the ROCm CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088

Differential Revision: D16448261

Pulled By: bddppq

fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
2019-07-23 23:07:45 -07:00
91bef6c168 format sugared_value & compiler.cpp (#23283)
Summary:
There are a lot of formatting changes, which make other diffs to these PRs noisy & hard to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23283

Differential Revision: D16453590

Pulled By: eellison

fbshipit-source-id: 97b4bf1dbbbfb09c44c57402f61ea27287060044
2019-07-23 22:29:22 -07:00
bc15a20db9 Minor refactor: propagating messages in TestCase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23146

Test Plan: Imported from OSS

Differential Revision: D16413801

Pulled By: zafartahirov

fbshipit-source-id: 8009a7afe77e127ddd220fb71c6c272b0d44c733
2019-07-23 21:18:44 -07:00
8a77098247 Make Module::register_module / register_parameter / register_buffer public (#23196)
Summary:
In Python, the `register_module` / `register_parameter` / `register_buffer` methods in `nn.Module` are public. This PR makes those APIs public for the C++ `nn::Module` as well. Closes https://github.com/pytorch/pytorch/issues/23140.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23196

Differential Revision: D16440239

Pulled By: yf225

fbshipit-source-id: e0eff6e1db592961fba891ec417dc74fa765e968
2019-07-23 21:18:41 -07:00
24601daa12 Adding check for a single batch in adaptive_avg_pool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23137

Test Plan: Imported from OSS

Differential Revision: D16403804

Pulled By: zafartahirov

fbshipit-source-id: df79a8c768ffabeceb4c0044c967a623c5885484
2019-07-23 21:11:06 -07:00
mal
e7a9b0d62f Rename torch::autograd::Function to torch::autograd::Node
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23269

Test Plan: Imported from OSS

Differential Revision: D16454878

fbshipit-source-id: b1e840fc2d3901955280d141e5ad6efd5e9d66af
2019-07-23 20:52:22 -07:00
0ab19d66ee Port lu_solve to ATen (#22379)
Summary:
Changelog:
- Port TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Remove TH/THC implementations
- Update doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22379

Test Plan: - Added new tests in test_torch.py (port to test_cuda.py exists)

Differential Revision: D16089645

Pulled By: zou3519

fbshipit-source-id: dc8561aadacacb23e80c375b4fec687df2b6bbc8
2019-07-23 19:11:35 -07:00
2197bee3da add cudnn.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23265

Differential Revision: D16453611

Pulled By: bddppq

fbshipit-source-id: b49e01b6d019781097ec5072cdc6d37a2988bfbe
2019-07-23 18:09:23 -07:00
a936a90391 caffe2/caffe2/fb/operators/cc_amrc: drop SIMD OpenMP vectorization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23235

Reviewed By: ajtulloch

Differential Revision: D16384612

Pulled By: luciang

fbshipit-source-id: a4c8257c6d3e151ba99167a152ad824b0dde7671
2019-07-23 17:25:00 -07:00
7ed9622fdf Read number of workspaces from argument in recurrent_network_op (#23272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23272

We see significant performance improvements by limiting concurrency
at the caffe2 level on mobile. This diff enables setting the number of caffe2
workspaces used during RNN inference.

Reviewed By: akyrola

Differential Revision: D16448611

fbshipit-source-id: 28abaddb4ea60bacb084ceb28cb7a4d1e67ccc17
2019-07-23 17:19:40 -07:00
a35136dd73 Add support for onnx tensor index export (#21716)
Summary:
Support exporting
* Standard tensor indexing like
```
x = torch.ones(4, 5)
ind = torch.tensor([0, 1])

return x[ind]
```
* [Advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing) like
```
x = torch.ones(4,5,6,7,8)
ind1 = torch.tensor([0, 1])
ind2 = torch.tensor([[3], [2]])
ind3 = torch.tensor([[2, 2], [4, 5]])

return x[2:4, ind1, None, ind2, ind3, :]
```
It would be ideal if ONNX can natively support indexing in future opsets, but for opset <= 10 it will always need this kind of workarounds.

There are still various limitations, such as not supporting advanced indexing with negative indices, not supporting mask indices of rank > 1, etc. My feeling is that these are less common cases that requires great effort to support using current opset, and it's better to not make the index export more cumbersome than it already is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21716

Reviewed By: zrphercule

Differential Revision: D15902199

Pulled By: houseroad

fbshipit-source-id: 5f1cc687fc9f97da18732f6a2c9dfe8f6fdb34a6
2019-07-23 17:11:28 -07:00
1de44a6f54 fix specialized list from dict keys (#23267)
Summary:
Previously we weren't specializing the list returned from `dict.keys()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23267

Differential Revision: D16448512

Pulled By: eellison

fbshipit-source-id: fcd2a37ac680bdf90219b099a94aa36a80f4067c
2019-07-23 17:02:19 -07:00
a6ccd62a81 BlackBoxPredictor OSS part 5: glow transforms
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to make any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.

Reviewed By: bertmaher

Differential Revision: D16367134

fbshipit-source-id: fc6bacc1be3ff6336beb57cdad58168d3a2b8c28
2019-07-23 16:39:23 -07:00
bdb1e1305d exclude some caffe2 modules from libtorch mobile build (#20000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20000
ghimport-source-id: f47773ef1c6849cd0c0e65080400416c6b370d39

Test Plan:
- verified libtorch mobile library builds and links successfully;

Imported from OSS

Differential Revision: D15169024

Pulled By: ljk53

fbshipit-source-id: 20ac89c6e7053239c93e51f00c5c5dc3595bea74
2019-07-23 16:20:27 -07:00
1c0309a9a9 make OMP_NUM_THREADS default in launch.py (#22501)
Summary:
Per https://github.com/pytorch/pytorch/issues/22260, by default the number of OpenMP threads spawned equals the number of cores available; for multi-processing data parallel cases, too many threads may be spawned, which can overload the CPU and cause a performance regression.

So set OMP_NUM_THREADS = number of CPU processors / number of processes by default, to neither overload nor waste CPU threads.
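A minimal sketch of that default (the function name is made up; it mirrors the max(1, num_cpus / num_processes) heuristic):

```python
import multiprocessing
import os

def default_omp_threads(nproc_per_node):
    # Split the available cores evenly across the spawned processes,
    # keeping at least one OpenMP thread per process.
    num_cpus = multiprocessing.cpu_count()
    return max(1, num_cpus // nproc_per_node)

# Set before the worker processes are spawned, if not already set:
os.environ.setdefault("OMP_NUM_THREADS", str(default_omp_threads(2)))
```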
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22501

Test Plan:
1. without and with this change, example codes result in same result
      python ~/local/fbsource-fbcode/fbcode/caffe2/torch/distributed/launch.py --nproc_per_node=2 pytorch/examples/yanlizhao/distributed_launch_example.py

  Setting OMP_NUM_THREADS environment variable for each process to be: 24, which
  is max(1, num_cpus / num_processes), you can further tune the variable for optimal performance in your application if needed.
  final loss =  tensor(0.5211, device='cuda:0', grad_fn=<MseLossBackward>)

Differential Revision: D16092225

Pulled By: zhaojuanmao

fbshipit-source-id: b792a4c27a7ffae40e4a59e96669209c6a85e27f
2019-07-23 16:14:24 -07:00
058645acb1 Fusion and _intrinsic modules (#23003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003

torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu

Fusion function to combine specific modules: (conv, bn) and (conv, bn, relu).
In all cases, modules are replaced in place: the first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Supports both training and eval. For training, the modules are "fused" with a sequential container, to allow further module swaps for quantization-aware training.
Also add: torch.nn._intrinsic for ConvReLU and LinearReLU.
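The in-place replacement pattern can be sketched with stand-in classes (every name below is a hypothetical placeholder for the real nn modules, not the actual torch.quantization code):

```python
class Identity:
    """Stand-in for nn.Identity."""
    def __call__(self, x):
        return x

class Conv:
    """Stand-in for nn.Conv2d."""

class BatchNorm:
    """Stand-in for nn.BatchNorm2d."""

class FusedConvBn:
    """Stand-in for the _intrinsic fused module."""
    def __init__(self, conv, bn):
        self.conv, self.bn = conv, bn

def fuse_modules(modules, names):
    """Replace modules[names[0]] with the fused module and every
    remaining named module with Identity, in place."""
    fused = FusedConvBn(*(modules[n] for n in names))
    modules[names[0]] = fused
    for name in names[1:]:
        modules[name] = Identity()
    return modules
```

The point of keeping Identity in the later slots is that the parent container's structure (and any references to child names) stays valid after fusion.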

TODO: Add tests for _intrinsic modules.

Conv BN fusion code is based on DsKhudia's implementation

Differential Revision: D16199720

fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
2019-07-23 14:54:19 -07:00
7b229342ca Renamed CosineAnnealingLr to CosineAnnealingLR (#23242)
Summary:
fixing https://github.com/pytorch/pytorch/issues/23160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23242

Differential Revision: D16443348

Pulled By: pbelevich

fbshipit-source-id: af0edf4e841e04a8016c98bfee72696581f3f070
2019-07-23 14:54:15 -07:00
8d4956fd02 hook up dropout sparse with replacement operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23183

Reviewed By: ffjiang

Differential Revision: D16428262

fbshipit-source-id: 0d6e17d15c898629bbd2826441f2c9701a78b0bd
2019-07-23 14:34:25 -07:00
6f01d13728 Implement dropout with replacement for id list features. (#22880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22880

Implement sparse dropout with replacement value.

Reviewed By: xianjiec

Differential Revision: D16267012

fbshipit-source-id: 8c4878230f61bb3ac333291e2c6aaf2fbdc5f9ce
2019-07-23 14:34:21 -07:00
e0f632c58b pickler.cpp: respect __getstate__/__setstate__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23190

Test Plan: Imported from OSS

Differential Revision: D16431553

Pulled By: zdevito

fbshipit-source-id: 680ea1507c12727fd17aedb3067f522cf490e306
2019-07-23 14:27:51 -07:00
bae10db522 Incorporating arguments to pull production operators and adding device type. (#23197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23197

Incorporating arguments to pull production operators and adding device type.

Reviewed By: mingzhe09088

Differential Revision: D16387263

fbshipit-source-id: e20ed82225eb1e4b7ab1756ec157967b055d85bf
2019-07-23 13:43:26 -07:00
d8220b0599 add simple inheritance support to AST
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23109

Test Plan: Imported from OSS

Differential Revision: D16441914

Pulled By: suo

fbshipit-source-id: 18a57762d376759b98c18bc160eacbcc99f78ee9
2019-07-23 12:21:27 -07:00
017870a633 kill module_lookup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23097

Test Plan: Imported from OSS

Differential Revision: D16383329

Pulled By: suo

fbshipit-source-id: 282f8bac2245d584b66139daf4e5ea7b2b317295
2019-07-23 12:21:23 -07:00
2a37740a86 make RHS of assignment optional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23033

Test Plan: Imported from OSS

Differential Revision: D16383330

Pulled By: suo

fbshipit-source-id: 63c55fae06f0cd534eb5053f91a773431ad052d4
2019-07-23 12:21:19 -07:00
3be0a2b4be Parse all stmts in class defs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23031

Test Plan: Imported from OSS

Differential Revision: D16383327

Pulled By: suo

fbshipit-source-id: 6485109a66e653b7f26d30b91a97af8d71594e22
2019-07-23 12:21:15 -07:00
0dabaad819 Add Module::replace_module to C++ api (#22546)
Summary:
This adds a replace_module method to the C++ API, so that existing submodules can be swapped out.

The primary use case I am aware of is to enable finetuning of models.
Given that finetuning is fairly popular these days, I think it would be good to facilitate this in the C++ api as well.

This has been reported by Jean-Christophe Lombardo on the [forums](https://discuss.pytorch.org/t/finetuning-a-model-on-multiple-gpu-in-c/49195).
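In Python terms, the pattern being enabled is roughly the following (a sketch with a stand-in module container, not the actual C++ `torch::nn::Module` API):

```python
class Module:
    """Minimal stand-in for a module container with named children."""
    def __init__(self):
        self._modules = {}

    def register_module(self, name, module):
        self._modules[name] = module
        return module

    def replace_module(self, name, module):
        """Swap out a registered submodule by name, e.g. to replace
        the classifier head of a pretrained model for finetuning."""
        if name not in self._modules:
            raise KeyError(f"no submodule named {name!r}")
        self._modules[name] = module
        return module
```

For finetuning, one would load a pretrained model and then replace only the final layer before training on the new task.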
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22546

Differential Revision: D16440289

Pulled By: yf225

fbshipit-source-id: c136f914b8fc5c0f1975d877ea817fda5c851cda
2019-07-23 11:50:06 -07:00
f112c522af LinearReLU module (#23022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23022

will be tested in later diffs.

Added LinearReLU module for qat; allows conversion from torch.nn._intrinsic.LinearReLU to torch.nn._intrinsic.qat.LinearReLU

Reviewed By: zafartahirov

Differential Revision: D16286800

fbshipit-source-id: 84cce3551d46e649781b9b6107d4076e10e51018
2019-07-23 11:17:25 -07:00
192dd8faf1 Set correct list type in pybind_utils (#23188)
Summary:
-

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23188
ghstack-source-id: 87008828

Differential Revision: D16430911

fbshipit-source-id: 9d9d29bf42402e0fff323dfd0ed65fcfd5564fd3
2019-07-23 10:52:38 -07:00
19be7ece15 Fix erase_number_types test (#23181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23181

We can't run dead code elimination after erasing number types because dce relies on graph invariants that erase_number_types breaks.

Reviewed By: houseroad

Differential Revision: D16427819

fbshipit-source-id: d1b98a74d2558b14d4be692219691149689a93d8
2019-07-23 10:23:10 -07:00
e56f11b750 Fix onnx export (#23180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23180

This pass needs to be run later because it breaks jit graph invariants and the lower_all_tuples pass still needs a valid jit graph.

Reviewed By: houseroad

Differential Revision: D16427680

fbshipit-source-id: 427c7e74c59a3d7d62f2855ed626cf6258107509
2019-07-23 10:23:06 -07:00
60afcabc6f DictConstruct sets correct types (#23171)
Summary:
-

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23171
ghstack-source-id: 87009037

Differential Revision: D16423640

fbshipit-source-id: 0f4f9b12759b8a9defaae775e33e2b0af9bb7791
2019-07-23 10:23:01 -07:00
67aede98c3 Exclude unused onnx targets (#23195)
Summary:
e.g. onnxifi_dummy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23195

Differential Revision: D16441493

Pulled By: bddppq

fbshipit-source-id: 76816e7a7c73f60f3c7abea10fbdbf086cea0476
2019-07-23 10:22:57 -07:00
9d03133c14 ListConstruct sets correct element type (#23189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23189
ghstack-source-id: 86971099

Differential Revision: D16430987

fbshipit-source-id: 9af255075b670e6f811e1a9d104f2738a38e9515
2019-07-23 10:14:35 -07:00
2073cc73f8 Use concrete types in jit test for generic lists (#23192)
Summary:
Creating an untyped generic list is deprecated; we always want type information to be present.

This fixes test cases and removes one that used lists with ambiguous types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23192
ghstack-source-id: 86972891

Differential Revision: D16431482

fbshipit-source-id: 4ca5cd142118a3f0a4dcb8cd77383127c54abb29
2019-07-23 10:04:12 -07:00
21f52ce0d4 Remove trailing semicolon from TORCH_CHECK macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22339

Test Plan: Imported from OSS

Differential Revision: D16182743

Pulled By: ezyang

fbshipit-source-id: 3c4ac0abe49ce83901bd5b07279a135857035f80
2019-07-23 09:58:50 -07:00
174f7a586f Switch from KaTeX to imgmath for documentation rendering.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23025

Test Plan: Imported from OSS

Differential Revision: D16441000

Pulled By: ezyang

fbshipit-source-id: c1ab557cb8163e9c69585c32d237c076582a6d73
2019-07-23 09:44:37 -07:00
792d527746 Fix typos in comments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23130

Differential Revision: D16402755

Pulled By: ezyang

fbshipit-source-id: 8bf9767c0012aed8ad91289bbaf2d979f130d728
2019-07-23 09:44:33 -07:00
60c46dd4df Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930)
Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changed the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code, which searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu`, but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`; see https://unix.stackexchange.com/a/226180/38242 ).
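The prefix-based search that `find_path` performs can be approximated in a few lines (a hedged sketch of the search order only, not CMake's actual implementation; the `exists` parameter is injectable purely for illustration):

```python
import os

def find_nccl_include(prefixes, exists=os.path.exists):
    """Return the first <prefix>/include containing nccl.h, roughly
    mimicking find_path over CMAKE_PREFIX_PATH followed by
    CMAKE_SYSTEM_PREFIX_PATH."""
    for prefix in prefixes:
        include_dir = os.path.join(prefix, "include")
        if exists(os.path.join(include_dir, "nccl.h")):
            return include_dir
    return None
```

`find_library` follows the same scheme over `lib`, `lib64`, and `lib/<arch>` subdirectories, which is what makes it handle the edge cases above correctly.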

 ---

Regarding for relevant issues:

- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not change NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 Versioned library detection is added, but the order is reversed: the unversioned library is now preferred. This is because unversioned libraries are normally symlinked to versioned libraries and preferred by users, and local installations by users are often unversioned. As the documentation of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
2019-07-23 08:45:51 -07:00
e4b75c6580 Fix typo in dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23132

Differential Revision: D16402759

Pulled By: ezyang

fbshipit-source-id: 9500570f6b7492a67a2af853bfb63a5667e6b7b5
2019-07-23 08:45:47 -07:00
45d3f495ef Add document of function torch.as_strided (#22842)
Summary:
Documentation of `torch.as_strided` and `Tensor.as_strided` is missing. As mentioned in https://github.com/pytorch/pytorch/issues/9886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22842

Differential Revision: D16254106

Pulled By: soumith

fbshipit-source-id: dee142483fb9ef7bea84bd44a970b6eccdcdc471
2019-07-23 06:06:00 -07:00
c9e62f6988 Update nccl to 2.4.8-1 (#23186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23186

Differential Revision: D16438723

Pulled By: soumith

fbshipit-source-id: ff4f5b9c7383b92e5cf2053a87caf2ac11be7aeb
2019-07-23 05:35:32 -07:00
9a6ae5c0b1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: aab8ded966d718befb664a6e968eedc6bbe7cb5e
2019-07-22 22:47:52 -07:00
d7448c7812 quantized conv module (#23178)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23178
ghstack-source-id: 86973164

Differential Revision: D16426871

fbshipit-source-id: a2ebb38997acfeb61b7dfd6b11dd8ee9b3a7a8ed
2019-07-22 20:47:40 -07:00
f3a37278cc ConvReLU2d module (#23008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23008

Added ConvReLU2d module to convert from nn._intrinsic.ConvReLU2d to nn._intrinsic.qat.ConvReLU2d

Differential Revision: D16286670

fbshipit-source-id: 2903d825175911c0095497369f313bf2a2eb3833
2019-07-22 20:47:36 -07:00
eb5137a5d1 Export torch.arange to ONNX (#22601)
Summary:
Some overlap with https://github.com/pytorch/pytorch/pull/21716 regarding caffe2 nonzero. Will rebase the other one accordingly whichever gets merged first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22601

Reviewed By: zrphercule

Differential Revision: D16224660

Pulled By: houseroad

fbshipit-source-id: dbfd1b8776cb626601e0bf83b3fcca291806e653
2019-07-22 20:30:39 -07:00
06d11f0434 Revert D16368004: [pytorch][PR] Fix error message for a wrong fork CUDA
Differential Revision:
D16368004

Original commit changeset: 44b6977790ce

fbshipit-source-id: c81a232bd52219e56a19c64650c4b6dedeb167cb
2019-07-22 18:46:48 -07:00
3861520603 Verify flatten works for quantized Tensor (#23121)
Summary:
Added a test in `test_torch.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23121
ghstack-source-id: 86983227

Differential Revision: D16391409

fbshipit-source-id: 04e72b2f753a0a6ddbf58d55b794e443b18a2156
2019-07-22 18:34:25 -07:00
a24f6c13a3 Fix broken indexing when using None and ellipses indexing together (#22905)
Summary:
https://github.com/pytorch/pytorch/issues/20153

I believe you need 2 passes for this. Take this example
```python
@torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None].shape
```
which results in `[10, 9, 8, 7, 6, 1, 1]`
vs
```
torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None, :].shape
```
which results in `[10, 9, 8, 7, 1, 1, 6]`
After only processing `x[..., None, None` we don't know whether we should be creating a new dimension at the end of the dimension list or somewhere in the middle. What we do depends on the elements to the right of it.

Thus, I do 2 passes - one to collect all the dimensions that the index operations operate on, and another that executes the index operations.

This still doesn't work for an ellipsis index followed by a tensor index, but it wasn't working previously either.
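The two-pass shape computation described above can be modeled in plain Python (a simplified sketch handling only `...`, `None`, and full slices — not the actual JIT frontend pass):

```python
def indexed_shape(shape, idx):
    """Compute the result shape of basic indexing with ``...``,
    ``None`` (new axis), and ``:`` (full slice). The ellipsis can
    only be expanded once we know how many real dimensions the
    other indices consume, hence the two passes."""
    # Pass 1: count indices that consume an input dimension.
    consumed = sum(1 for i in idx if i == slice(None))
    expanded = []
    for i in idx:
        if i is Ellipsis:
            expanded.extend([slice(None)] * (len(shape) - consumed))
        else:
            expanded.append(i)
    # Pass 2: walk the expanded indices, emitting output dims.
    out, d = [], 0
    for i in expanded:
        if i is None:
            out.append(1)
        else:  # full slice: keep the next input dimension
            out.append(shape[d])
            d += 1
    out.extend(shape[d:])
    return out
```

Running it on the two examples from above reproduces the `[10, 9, 8, 7, 6, 1, 1]` versus `[10, 9, 8, 7, 1, 1, 6]` distinction.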
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22905

Differential Revision: D16433558

Pulled By: Chillee

fbshipit-source-id: c1b303cb97b1af8b6e405bad33495ef3b4c27c4a
2019-07-22 18:11:23 -07:00
648f10be16 Fix load op to return the shape info as before when loading multiple blobs (#23182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23182

This fixes the issue seen in D16390551

Changing the load op to take in a shapes vector would require changes in lots of places (almost all usages of the load op).
Instead, this is a small and safe change: the behavior is unchanged when loading multiple blobs and when loading a single blob without shape information.

If you are loading just one blob and the shape information is provided, then this returns the right shape info back.

For all other cases, behavior is unchanged as before we introduced the issue.

This fixes the issue reported by Andrey in D16229465

Reviewed By: boryiingsu

Differential Revision: D16428140

fbshipit-source-id: 8ef6705ab2efb346819489e1f166e23269f7ef8a
2019-07-22 15:53:40 -07:00
1c574458b0 nn_quantized test (#23169)
Summary:
- scale/zero_point in quantized modules should be Tensor
- fix conv module permutation API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23169
ghstack-source-id: 86956383

Reviewed By: zafartahirov

Differential Revision: D16423570

fbshipit-source-id: d29498e07bdd8f71a33b4e16e089f80847bbca6d
2019-07-22 15:53:36 -07:00
e08f8f45ff Turning on fbgemm for nightlies (#22784)
Summary:
fbgemm requires AVX512, which requires a more recent compiler, so this also switches all the nightlies from devtoolset3 to devtoolset7. Since CUDA 9.0 doesn't support devtoolset7, we also switch from CUDA 9.0 to CUDA 9.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22784

Differential Revision: D16428165

Pulled By: pjh5

fbshipit-source-id: c1af3729d8edce88a96fa9069d4c5a1808c25f99
2019-07-22 15:09:11 -07:00
a6e45a69a8 Fix error message for a wrong fork CUDA (#23030)
Summary:
Fix https://github.com/pytorch/pytorch/issues/17357
Unblock 1.2 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23030

Differential Revision: D16368004

Pulled By: zhangguanheng66

fbshipit-source-id: 44b6977790ce768efa4777bae41d4b26dae5f288
2019-07-22 15:04:32 -07:00
3ca7c0ffdb Add get_accessed_features function to ModelLayer class (#23036)
Summary:
We need a way to get a complete list of features that are used in training a model.  One way to do this is to make it possible to get the list of features used in each Model Layer.  Then once the model is complete we can go through the layers and aggregate the features.

I've introduced a function to expose that information here, get_accessed_features, and implemented it in the FeatureSparseToDense layer to start with.

I've tried to include the minimum amount of information to make this useful, while making it easy to integrate into the variety of model layers.  This is, for example, why AccessedFeatures does not contain feature_names, which are not always present in a model layer.  I debated whether or not to include feature_type, but I think it's useful enough, and easy enough to figure out in a model layer, that it's worth including.
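The aggregation pattern being enabled can be sketched as follows (field names here are hypothetical; the real `AccessedFeatures` definition lives in the layers code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessedFeatures:
    """Deliberately minimal; feature_names is omitted because it is
    not always present in a model layer."""
    field: str
    feature_type: str
    feature_ids: frozenset

def aggregate_accessed_features(layers):
    """Once the model is complete, union the features reported by
    each layer's get_accessed_features()."""
    accessed = set()
    for layer in layers:
        accessed.update(layer.get_accessed_features())
    return accessed
```

Because the record is a frozen (hashable) dataclass, duplicate reports of the same feature across layers collapse automatically in the set union.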
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23036

Test Plan:
Added a unit test to verify the behavior of get_accessed_features in FeatureSparseToDense.

aml_dper2-fblearner-flow-integration-tests failed due to a known issue D16355865
aml_dper3-fblearner-flow-integration-tests failed due to a known issue T47197113

I verified no tests in the integration tests failed to issues other than those known ones.

DPER2 canaries: https://fburl.com/fblearner/1217voga

Reviewed By: volkhin

Differential Revision: D16365380

Pulled By: kevinwilfong

fbshipit-source-id: 2dbb4d832628180336533f29f7d917cbad171950
2019-07-22 15:04:28 -07:00
ff23a02ac4 Pin numba to 0.44.0 to fix Windows CI.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23176

Test Plan: Imported from OSS

Differential Revision: D16426873

Pulled By: ezyang

fbshipit-source-id: 10d800db78416137504c396711dc45109f6f5ca4
2019-07-22 14:59:15 -07:00
b6d06d5496 Remove empty THCThreadLocal{.h/.cpp} (#23157)
Summary:
These files were removed from the build process and cleaned in https://github.com/pytorch/pytorch/pull/9735.

Closes https://github.com/pytorch/pytorch/issues/22572
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23157

Differential Revision: D16426819

Pulled By: soumith

fbshipit-source-id: aa01aec9fe0e3af456ba8b75ae85d0b1df2a8ed9
2019-07-22 14:59:11 -07:00
fdfc676eb6 Invert ownership between PyFunction and THPFunction.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22983

Test Plan: Imported from OSS

Differential Revision: D16422209

Pulled By: ezyang

fbshipit-source-id: d6e41a1606484fbbd7a95a547b83a4199151be68
2019-07-22 14:13:14 -07:00
ae5b52086e Support converting Python number to IValue in pybind_utils.h (#22817)
Summary:
I ran into the following error when trying to pass a Python int as an arg to `torch::jit::createStackForSchema`, and I think it is due to the missing support for `NumberType` in [toIValue](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/pybind_utils.h#L448).

> RuntimeError: Missing cases in toIValue for type: Scalar! File a bug report. (toIValue at ../torch/csrc/jit/pybind_utils.h:449)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22817

Differential Revision: D16276006

Pulled By: mrshenli

fbshipit-source-id: 7f63519bb37219445e836ec1f51ca4f98bf52c44
2019-07-22 14:01:30 -07:00
2becbd3faa BlackBoxPredictor OSS part 4: Open-source other transforms (#23099)
Summary:
Overal context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (thread safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison
This specific diff:
There should be no harm in moving transformation code to
OSS. On the advantages side we will be able to compare production
Caffe2 setup with PyTorch in the most fair way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to build any other significant investments into
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion that the whole thing should be moved to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23099

Test Plan:
salex@devvm4218:caffe2 { (fcdaf96|HISTEDIT)}$ submit_canary --q tw_adindexer_canary_on_canary_tier && submit_canary --q tw_adfinder_canary_on_canary_tier && submit_canary prospector_repl
ay_canary
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851419/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717789681292057
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GBYe_ANnNNBnbWsDAAAAAABJPvJBbjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851536/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717806884923980
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GArl_QPncP7tc30IAAAAAACfza93bjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851661/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717823090263325
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GNcyAwRrfFd0MIUIAAAAAABLOINibjEQAAAz

Differential Revision: D16288332

Pulled By: salexspb

fbshipit-source-id: 95899dede6b11a2ae14703b9aaea8e1a677f0aaa
2019-07-22 13:53:43 -07:00
27031dccb2 Updating producer_version in exported ONNX models to pytorch 1.2. (#23120)
Summary:
Bumping up the producer_version in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23120

Reviewed By: zrphercule

Differential Revision: D16420917

Pulled By: houseroad

fbshipit-source-id: 6686b10523c102e924ecaf96fd3231240b4219a9
2019-07-22 13:45:39 -07:00
7e31c02afe Fixed deprecated use of yaml.load
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22985

Test Plan: Imported from OSS

Differential Revision: D16425112

Pulled By: Chillee

fbshipit-source-id: ef0c764c3fd2518b9284d9a20e84d677ebd8f277
2019-07-22 13:25:27 -07:00
76291829ba Refactor named inference rule for reductions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23075

Test Plan: Imported from OSS

Differential Revision: D16419173

Pulled By: zou3519

fbshipit-source-id: 187639b563336f935e5f06351dd0b680de1aadfd
2019-07-22 13:12:03 -07:00
b4b51ed5ec Implement tensor.size(Dimname), tensor.stride(Dimname)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22989

Test Plan: Imported from OSS

Differential Revision: D16364437

Pulled By: zou3519

fbshipit-source-id: 393a93fecac27b5d3b1a7f7692590d8fd5e95a5d
2019-07-22 13:11:59 -07:00
965b97f5f0 Bidirectional GRU and LSTM C++ API forward fix (#22850)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/17998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22850

Differential Revision: D16420854

Pulled By: pbelevich

fbshipit-source-id: 76f38be40d8479fb9cafba92939cea61d81fd336
2019-07-22 12:59:47 -07:00
e5797e9350 Revert D16390551: Fix load op to return the shape info as before when loading multiple blobs
Differential Revision:
D16390551

Original commit changeset: 1055b481a7a9

fbshipit-source-id: ea50a71e3d446a74bd04d9945710cc4ccee63c87
2019-07-22 12:48:14 -07:00
fcdfc35d1c Support get/setstate with no args (#23119)
Summary:
`pickle` supports this, and a lot of the quantized use cases for get/set
state follow this pattern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23119

Pulled By: driazati

Differential Revision: D16391234

fbshipit-source-id: 9f63e0a1679daa61b17aa64b5995e2be23b07b50
2019-07-22 12:32:29 -07:00
858d4a6a04 Cleanup API and remove 'experimental' warning (#23000)
Summary:
This fixes ASAN test issues with https://github.com/pytorch/pytorch/pull/21786 seen at https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2212325/output/105/0?file=true and lands it again.

This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the  user) and removes the "experimental" warning in prep for our 1.2 release.

We also don't need the additional PyTorch version checks now that we are in the codebase itself.

cc yf225, lanpa, natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23000

Reviewed By: sanekmelnikov

Differential Revision: D16349734

Pulled By: orionr

fbshipit-source-id: 604a9cad56868a55e08b509a0c6f42b84f68de95
2019-07-22 12:10:05 -07:00
fad3031b5c Fix type hints for None constants (#23029)
Summary:
The type hint was being ignored when emitting `None` constants, this also de-dups some testing code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23029

Pulled By: driazati

Differential Revision: D16364572

fbshipit-source-id: 64f3abd3e37ee49c209480a85ed4f1b8802e5d93
2019-07-22 11:55:05 -07:00
2891784a72 Resolve with closed over variables instead of stack frame (#22270)
Summary:
Previously we looked at the stack frame of the function that called
`script` to resolve variables. This doesn't work if someone calls script
with a function defined somewhere else that references captured
variables. We already have a mechanism to look at the closed over
variables for a function, so this changes the `rcb` to use that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22270

Pulled By: driazati

Differential Revision: D16391346

fbshipit-source-id: ad9b314ae86c249251b106079e76a5d7cf6c04c2
2019-07-22 11:44:36 -07:00
fd90b967b2 Fix load op to return the shape info as before when loading multiple blobs (#23166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23166

Changing the load op to take in a shapes vector would require changes in lots of places (almost all usages of the load op).
Instead, this is a small and safe change: the behavior is unchanged when loading multiple blobs and when loading a single blob without shape information.

If you are loading just one blob and the shape information is provided, then this returns the right shape info back.

For all other cases, behavior is unchanged as before we introduced the issue.

This fixes the issue reported by Andrey in D16229465

Reviewed By: boryiingsu

Differential Revision: D16390551

fbshipit-source-id: 1055b481a7a9e83021209e59f38a7cc0b49003cf
2019-07-22 11:27:59 -07:00
82db5dceb6 Added running via throughput benchmark options. (#23077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23077

The difference between running from Python and via the benchmark is small if the
forward method's loop is long enough (like 1000 iterations in this case).

Reviewed By: mingzhe09088

Differential Revision: D16122343

fbshipit-source-id: 5c1d1b98ae82c996baf9d42bcd04995e2ba60c78
2019-07-22 11:27:55 -07:00
2ba516d5b6 Added add op framework overhead benchmark for C2 (#23078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23078

C2 benchmark.

Reviewed By: mingzhe09088

Differential Revision: D16122337

fbshipit-source-id: bf56e60c6e60eda2be2938d9f613708a4bc1669a
2019-07-22 11:27:50 -07:00
0621068cdc Add simple add op based framework overhead benchmark. (#23076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23076

Added both tracing-based and non-tracing-based variants

Reviewed By: mingzhe09088

Differential Revision: D16097280

fbshipit-source-id: 3a137092f7ccc3dd2d29d95e10178ec89d3ce892
2019-07-22 11:27:45 -07:00
4223e2f9e9 fix qat tests (#23124)
Summary:
missed instantiating observers in linear

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23124
ghstack-source-id: 86886705

Reviewed By: raghuramank100

Differential Revision: D16401066

fbshipit-source-id: f9f0f359caeca855c62192d13261a33eef57715a
2019-07-22 10:28:35 -07:00
8bc28cc898 Remove cuda free mutex (#23040)
Summary:
Revision of https://github.com/pytorch/pytorch/issues/22173 to address CI failure after merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23040

Differential Revision: D16366872

Pulled By: mrshenli

fbshipit-source-id: 747b6ecf2dc195c25f82b8f732ae9ff52cd3a394
2019-07-22 07:58:29 -07:00
22f7c9e31b (#23105)
Summary:
Fixed a [bug](https://github.com/pytorch/pytorch/issues/22992) where passing a result tensor into masked_select won't work with a bool mask.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23105

Differential Revision: D16386676

Pulled By: izdeby

fbshipit-source-id: 93a1e9bfbc916c8a8eaa149a70a5553f3711f53e
2019-07-22 07:49:30 -07:00
aeee49d51d Revert "Temporarily skip mypy-0.720 to unbreak master type checks"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23095

Test Plan: Imported from OSS

Differential Revision: D16383149

Pulled By: zou3519

fbshipit-source-id: ca6bdfe0f51f6bdbd4d95142a880f3902f60676d
2019-07-22 06:54:22 -07:00
b8c8977be7 Update ScatterWeightedSum Op (#23087)
Summary:
Update the ScatterWeightedSum op for the case where only one weighted X updates a slice of Y, which is usually the case when the op is used for gradient updates. The change removes the copy overhead and yields significant operator performance improvements:

- 25-50% improvement on CUDA, depending on input configuration

- ~50% improvement on ROCm
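What the op computes in the single-X case can be sketched in pure Python (names are illustrative, not the Caffe2 signature): each indexed slice of Y is updated in place with a weighted X slice, with no intermediate copy.

```python
def scatter_weighted_sum(y, indices, weight, x):
    # y[idx] += weight * x[j] for each j-th index idx, updating Y in place.
    for j, idx in enumerate(indices):
        y[idx] = y[idx] + weight * x[j]
    return y
```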
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23087

Differential Revision: D16385194

Pulled By: bddppq

fbshipit-source-id: 3189e892940fb9c26305269eb0d47479b9b71af0
2019-07-21 22:21:40 -07:00
ff8cb9f622 hipify: do not overwrite files that stay the same (#23112)
Summary:
This is a small patch that avoids overwriting unchanged files, to help a bit with incremental builds.
It is not as incremental as one might like, given that one has to pass `--out-of-place-only` to avoid the patching step.
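The idea can be sketched as follows (an illustrative helper, not the actual hipify code): only write the output file when its content would actually change, so unchanged files keep their mtime and don't invalidate the build.

```python
import os

def write_if_changed(path, new_text):
    # Skip the write when the on-disk content already matches, so the
    # file's mtime is preserved and incremental builds stay incremental.
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == new_text:
                return False
    with open(path, "w") as f:
        f.write(new_text)
    return True
```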
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23112

Differential Revision: D16402623

Pulled By: bddppq

fbshipit-source-id: 531ce0078bc716ae31bd92c5248080ef02a065b9
2019-07-21 22:00:53 -07:00
2ac9abf759 Fix memory leak in Adam, Adagrad, RMSProp (#23125)
Summary:
As reported in LaurentMazare/tch-rs#76, memory grows when weight_decay is present when using Adam. This applies the same fix as https://github.com/pytorch/pytorch/issues/23007 to Adam, Adagrad and RMSProp.
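The arithmetic of the update can be sketched in plain Python (illustrative only; the actual fix is in the C++ optimizers and concerns not retaining autograd history for the decay term):

```python
def sgd_step(params, grads, lr, weight_decay=0.0):
    # Weight decay folds wd * p into the gradient before the update.
    # (In the C++ optimizers this accumulation must not record autograd
    # history, or the graph leaks memory across steps.)
    new_params = []
    for p, g in zip(params, grads):
        if weight_decay != 0.0:
            g = g + weight_decay * p
        new_params.append(p - lr * g)
    return new_params
```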
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23125

Differential Revision: D16402421

Pulled By: soumith

fbshipit-source-id: 59eb4bd81b8bd9e1a5f7c068ed841f70a4c38a80
2019-07-21 10:06:18 -07:00
96b6797fc0 improve enforce in cross_entropy_op (#23062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23062

as title

Reviewed By: xianjiec, BIT-silence

Differential Revision: D16374601

fbshipit-source-id: 62219c6abde311ebc8a0e6a03cfb517d80bb52b5
2019-07-21 00:07:58 -07:00
3e66385002 Lint fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23135

Test Plan: Imported from OSS

Differential Revision: D16403272

Pulled By: zafartahirov

fbshipit-source-id: 31f9eb11216c494a8327bcb5dc37e47a77611e2b
2019-07-20 21:46:18 -07:00
963707c5ea MaxPool2d in the torch (#22765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22765

the pooling signature is the same as the non-quantized one. Adding it to the native_functions.yaml

Reviewed By: jerryzh168

Differential Revision: D16102608

fbshipit-source-id: 7627ad8f02a231f488b74d1a245b853f89d9c419
2019-07-20 21:41:09 -07:00
cf3e6478ad Concat with out (#22408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22408

Quantized Concatenation with out argument

Reviewed By: jianyuh

Differential Revision: D16061526

fbshipit-source-id: 61487cf87763665df19feb8e678da72fd66e8740
2019-07-20 16:13:14 -07:00
05f088ec22 make jit logging visible, so it can be used in a TVM compiler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23041

Differential Revision: D16402934

Pulled By: Krovatkin

fbshipit-source-id: 715f9821809527e94bd7f01f1680db046c888e6c
2019-07-20 14:37:49 -07:00
bb9119f67d Use set -x to help investigate doc push errors (#23111)
Summary:
I couldn't find any verbosity options in the [`docker pull` command docs](https://docs.docker.com/engine/reference/commandline/pull/), but
`docker pull` [got a `--quiet` option](https://github.com/docker/cli/pull/882) in a recent version (not sure if we're using that version), and `--quiet` for `docker push` [is forthcoming](https://github.com/docker/cli/pull/1221).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23111

Differential Revision: D16402993

Pulled By: kostmo

fbshipit-source-id: 52f77b11b839d28f8cf1ecb58518ca69632d7fbe
2019-07-20 12:36:05 -07:00
a62c687445 Remove unused atomics detection code. (#23089)
Summary:
USE_{C11,MSC,GCC}_ATOMICS are not used in PyTorch or submodules. Now we remove their underlying detection code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23089

Differential Revision: D16402750

Pulled By: ezyang

fbshipit-source-id: fde84b958eb0b5b4d3f0406acefa92ab30ea43be
2019-07-20 10:52:53 -07:00
4e5f70089f fix indexing for more than 65535 elems in non-indexed first dim (#23123)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22843, also adds test from https://github.com/pytorch/pytorch/issues/23102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23123

Differential Revision: D16402422

Pulled By: soumith

fbshipit-source-id: aa7a79159ed947be03ce3725ec8abcf5246a60bf
2019-07-20 06:17:43 -07:00
6791f395f9 support at::view and at::reshape for quantized tensor (#23046)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23046
ghstack-source-id: 86840501

Differential Revision: D16368897

fbshipit-source-id: 9da232c11f21af5f850cd9545e56996a81791d00
2019-07-19 23:34:04 -07:00
a03205ed66 Move THTensor_compute_stride to ATen (#23045)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23045
ghstack-source-id: 86842517

Differential Revision: D16368860

fbshipit-source-id: 8970a73758afadbc9a6a3e263cdcfe5e2fd9cc0d
2019-07-19 23:14:11 -07:00
0d8324b18a Add fused modules in nn._intrinsic (#23085)
Summary:
Using nn.Sequential to represent fused modules

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23085
ghstack-source-id: 86883096

Differential Revision: D16379521

fbshipit-source-id: 57d67cb947de8665bd758848595a4a000366153a
2019-07-19 23:04:25 -07:00
47af41fe72 Quantized concatenation (+fused relu). (#21749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21749

This is the first version without "requantization"

Reviewed By: jerryzh168

Differential Revision: D15807940

fbshipit-source-id: 19bb0482abed8ed9d1521a3fa1f15bda8e6a6a7c
2019-07-19 22:23:41 -07:00
9f4df63c2c Moving np function to test area
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23118

Test Plan: Imported from OSS

Differential Revision: D16400634

Pulled By: zafartahirov

fbshipit-source-id: 44872fdf64b20a6b67e5176042fe58c8c2359738
2019-07-19 22:11:21 -07:00
77353636de Conv module (#23084)
Summary:
Added Conv module for qat

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23084
ghstack-source-id: 86862445

Differential Revision: D16379417

fbshipit-source-id: 742cc8b8e0f132070ca4943a1c2e3db60c2b5bdc
2019-07-19 18:49:52 -07:00
b964bdb53a Fbgemm fp16 tensor support (#23101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23101

Support for
- Shape inference
- Tensor info extraction

Reviewed By: zrphercule

Differential Revision: D16345251

fbshipit-source-id: 53ef674b5b1581e6267e6d2070e34355280dae79
2019-07-19 17:08:03 -07:00
2a8d5a132c Fix workspace destruction ordering (#23096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096

nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destroyed first.

Reviewed By: ajyu

Differential Revision: D16382987

fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
2019-07-19 16:49:50 -07:00
79c4f83fbe Include module names in recursive error stacks (#22921)
Summary:
Following on to #22280, this adds module names so they're included in
the call stacks of an error message (e.g. so it appears as `M.forward`
instead of `forward`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22921

Pulled By: driazati

Differential Revision: D16287925

fbshipit-source-id: 6f31d72caa87ba2dc527805d36f7d62eb94c0808
2019-07-19 16:09:14 -07:00
7cc029cb75 Quantization aware training in eager mode (#23082)
Summary:
Add support for quantization aware training in eager mode

Modifications to the post-training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weights with corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
    * Previously we were thinking about modifying the weight in a forward_pre hook and changing it back in a forward hook:

        def forward_pre_hook(self, input):
            self.float_weight = self.weight
            self.weight = self.fake_quantize(self.float_weight)

        def forward_hook(self, input):
            self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can’t change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it’s better to just swap the module.
* So we swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* qat modules have fake_quant for the output and weights inserted in their forward function.

## Convert
* The flow is identical to ptq, but the swapping dictionary is slightly different since the modules were changed in the prepare step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650

Differential Revision: D16379374

fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
2019-07-19 14:57:25 -07:00
c09e92255c Add initial support for serializing classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22953

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D16340214

Pulled By: zdevito

fbshipit-source-id: 70fb1968eca34e14492e0d2be52e28b27813f821
2019-07-19 14:51:59 -07:00
6334edc2d8 BlackBoxPredictor OSS: open-source NQL and custom transforms (#22877)
Summary:
Overal context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (thread safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison

This specific diff:

There should be no harm in moving the transformation code to
OSS. On the plus side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries; building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments in
transformation logic beyond the existing ones (like TVM and Glow),
which also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22877

Test Plan:
did a bunch of unit tests locally and now

waitforsandcastle

AdFinder canary:
https://our.intern.facebook.com/intern/ads/canary/419623727275650390

adindexer:
https://our.intern.facebook.com/intern/ads/canary/419623750891549182

prospector:
https://our.intern.facebook.com/intern/ads/canary/419644899887610977
https://our.intern.facebook.com/intern/ads/canary/419645123742738405

Differential Revision: D16267765

Pulled By: salexspb

fbshipit-source-id: 776a1cd5415e0695eae28254b3f155e7a9bd8c2b
2019-07-19 14:37:56 -07:00
f2f3e8ad8c fix overspecializing constants in compilation (#22816)
Summary:
When we specialize the tensor type of constants in compilation it causes all sorts of problems.

Fix for https://github.com/pytorch/pytorch/issues/22809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22816

Differential Revision: D16384094

Pulled By: eellison

fbshipit-source-id: f33c00d92d87108749d09bf037a6e74c5d9adaa2
2019-07-19 14:19:49 -07:00
a302821c5d Adding more binary documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23093

Differential Revision: D16384838

Pulled By: pjh5

fbshipit-source-id: 0ce91c2f3f0ec8f5c026622f27039b36c42a81d4
2019-07-19 14:06:34 -07:00
Jie
a28ffaf350 (#22827)
Summary:
1. Fix out of range memory access for reduction on all dimensions for non-packed
tensor.

2. Enabling launch config that maps block width to reduction on fastest striding
dimension. This mapping was previously only active when reducing on fastest
striding dimension of packed tensor, which is not necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22827

Differential Revision: D16271897

Pulled By: zdevito

fbshipit-source-id: 20763f6cf9a58e44ffc0e7ec27724dfec8fe2c5d
2019-07-19 13:38:17 -07:00
818828e8a8 Only import PIL when needed (#23023)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22389

In most cases we only import `PIL` methods when we need them, but we missed a spot.

cc lanpa natalialunova sanekmelnikov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23023

Reviewed By: sanekmelnikov

Differential Revision: D16373492

Pulled By: orionr

fbshipit-source-id: b08bf8a9b5a861390eadf62eda21ac055777180f
2019-07-19 13:30:43 -07:00
mal
44493a623e Pass variable_list of inputs to _wrap_outputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23037

Test Plan: Imported from OSS

Differential Revision: D16380071

fbshipit-source-id: ae3333c02ef8a3c09b95bec7b8e92ce649553615
2019-07-19 12:31:23 -07:00
2ee0f0bc3a add break continue to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23091

Differential Revision: D16382604

Pulled By: eellison

fbshipit-source-id: 47432d844c811ecd87ad97155e835b07ae8056cc
2019-07-19 12:17:00 -07:00
6dfecc7e01 Remove deprecated linear algebra functions (and methods) (#22841)
Summary:
Changelog:
- Removed the following linear algebra functions in PyTorch in favor of the renamed operations
  - `btrifact` (use `lu` instead)
  - `btrifact_with_info` (use `lu` with `get_infos=True` instead)
  - `btrisolve` (use `lu_solve` instead)
  - `btriunpack` (use `lu_unpack` instead)
  - `gesv` (use `solve` instead)
  - `pstrf` (use `cholesky` instead)
  - `potrf` (use `cholesky` instead)
  - `potri` (use `cholesky_inverse` instead)
  - `potrs` (use `cholesky_solve` instead)
  - `trtrs` (use `triangular_solve` instead)

- Removed dead code after the removal of `pstrf`
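The removals above amount to a pure renaming; a hypothetical migration helper for old call-site names might look like:

```python
# Mapping of removed linear algebra names to their replacements,
# taken from the changelog above.
RENAMES = {
    "btrifact": "lu",
    "btrifact_with_info": "lu",   # use get_infos=True
    "btrisolve": "lu_solve",
    "btriunpack": "lu_unpack",
    "gesv": "solve",
    "pstrf": "cholesky",
    "potrf": "cholesky",
    "potri": "cholesky_inverse",
    "potrs": "cholesky_solve",
    "trtrs": "triangular_solve",
}

def migrate(old_name):
    # Map a removed function name to its replacement; pass through others.
    return RENAMES.get(old_name, old_name)
```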
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22841

Test Plan:
- All existing tests should pass to verify that the removal is clean

Closes https://github.com/pytorch/pytorch/issues/22832

Differential Revision: D16346184

Pulled By: zou3519

fbshipit-source-id: f748d16ed7609c028de6adcbc28684d5a1af0678
2019-07-19 11:43:06 -07:00
61a683c212 Delete aten/src/ATen/out.txt (#23050)
Summary:
A file introduced in 5c0e0589509540fc991a88ffc48e96cc76fd799d, probably by mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23050

Differential Revision: D16379947

Pulled By: ezyang

fbshipit-source-id: b7fa8995028e180603d7830b6f170a7a57310385
2019-07-19 10:27:59 -07:00
5417ddbdae Fix get_all_math_dtypes for device='cuda' retuning None (#23028)
Summary:
This PR fixes the invalid None return when calling get_all_math_dtypes(device='cuda').

The issue came from `list.append`, which returns None, being used as `return dtypes.append(...)`.
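The underlying Python pitfall: `list.append` mutates in place and returns None. A minimal reproduction (hypothetical names mirroring the bug shape, not the actual PyTorch source):

```python
def get_dtypes_buggy(device):
    dtypes = ["float32", "float64"]
    if device == "cuda":
        return dtypes.append("float16")  # bug: append returns None
    return dtypes

def get_dtypes_fixed(device):
    dtypes = ["float32", "float64"]
    if device == "cuda":
        dtypes.append("float16")  # mutate first, then return the list
    return dtypes
```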
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23028

Differential Revision: D16362732

Pulled By: colesbury

fbshipit-source-id: 0bbc30a0c663749d768159f1bc37b99f7263297b
2019-07-19 09:29:16 -07:00
84c2c89e2c Revert D16199356: [qat] Quantization aware training in eager mode
Differential Revision:
D16199356

Original commit changeset: 62aeaf47c12c

fbshipit-source-id: d06a96b0a617ae38029ffb246173ec065454b666
2019-07-19 03:18:48 -07:00
f19aa12ae5 Revert D16274792: [qat] Conv module
Differential Revision:
D16274792

Original commit changeset: 1da10194123b

fbshipit-source-id: 71b34774b463f2350289bd39b8cfd798e095ffa5
2019-07-19 03:18:45 -07:00
c362e72d4a Revert D16349133: [quant] Add fused modules in nn._intrinsic
Differential Revision:
D16349133

Original commit changeset: 04d862ac4a0d

fbshipit-source-id: d96d9d98e9b29fddf93d4106621752abb00947eb
2019-07-19 03:18:41 -07:00
2401a05aae Revert D16373996: [fix] conv module missing return
Differential Revision:
D16373996

Original commit changeset: 1ec85d23c9dd

fbshipit-source-id: e507db59405aa240d20f132c3d6df323b241a542
2019-07-19 03:06:39 -07:00
25f0dc3490 BERT CPU performance optimization: use mkldnn for nn.Linear() when input is dense layout (#21851)
Summary:
This PR aims at improving BERT performance on CPU by using `mkldnn` inner product for `nn.Linear()`.
The current logic uses `mkldnn` only when the `input` tensor has mkldnn layout. This PR loosens this condition: `mkldnn` will be used for `nn.Linear()` when the `input` tensor has a dense layout. The aten tensor is viewed in place in `mkldnn` without an additional memory copy.
1. when `input.dim() >= 3`, it is viewed as a 2d tensor, e.g. `[T, N, C]` is treated as `[TN, C]`;
2. when `input` is not contiguous, it is copied to be contiguous, since the `mkldnn` inner product can't handle non-contiguous memory.
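The reshaping in point 1 can be sketched shape-wise (an illustrative helper with assumed names):

```python
def flatten_for_linear(shape):
    # Collapse all leading dimensions into one: [T, N, C] -> [T*N, C].
    *leading, channels = shape
    rows = 1
    for d in leading:
        rows *= d
    return (rows, channels)
```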

With this PR, BERT on `glue/MRPC` inference (batch size = 1) on Xeon 6148 single socket (20 cores@2.5GHz) improves by `44%`:

1. before (unit: iterations/sec):
```bash
408/408 [00:24<00:00, 16.69it/s]
```
2. after (unit: iterations/sec):
```bash
408/408 [00:16<00:00, 24.06it/s]
```

The latency correspondingly drops from `59.92 ms` to `41.56 ms`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21851

Differential Revision: D16056334

Pulled By: dzhulgakov

fbshipit-source-id: 9b70ed58323b5e2f3f4e3ebacc766a74a8b68a8a
2019-07-19 00:54:29 -07:00
12ac9171db fix error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22982

Differential Revision: D16356464

Pulled By: soumith

fbshipit-source-id: 3ddd5de4cf5c000dcf5b2faed39283dc715cba25
2019-07-18 23:38:55 -07:00
cdfdeb74af conv module missing return (#23058)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23058
ghstack-source-id: 86807313

Reviewed By: jianyuh

Differential Revision: D16373996

fbshipit-source-id: 1ec85d23c9ddd9975bc32f6c5d30cde04eb1109e
2019-07-18 22:24:56 -07:00
6601978012 Use ProfiledTensorType in peephole.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22767

Differential Revision: D16342954

Pulled By: Krovatkin

fbshipit-source-id: a577ea942ff4bab6ae15f14d6ba04a68675c70aa
2019-07-18 21:49:45 -07:00
d153b0b58b Updating submodules
Reviewed By: yns88

fbshipit-source-id: 87bb7a817dea65783436d6d6dfbbd492724d20a7
2019-07-18 20:43:55 -07:00
23badc60f3 Fix TBB build for older versions of cmake
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23038

Test Plan:
with-proxy pip install --upgrade cmake==3.11.0
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.13.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.6.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

Imported from OSS

Differential Revision: D16365699

Pulled By: ilia-cher

fbshipit-source-id: cbf779dff63e4e186d9b4c2fc21539a24ce0d5a2
2019-07-18 20:12:26 -07:00
e57b682abf Add fused modules in nn._intrinsic (#22999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22999

Using nn.Sequential to represent fused modules

Reviewed By: zafartahirov

Differential Revision: D16349133

fbshipit-source-id: 04d862ac4a0d20e83dc9d6de6b7d0d0c26bdedfd
2019-07-18 18:58:11 -07:00
12d9d768b8 Conv module (#22899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22899

Added Conv module for qat

Reviewed By: zafartahirov

Differential Revision: D16274792

fbshipit-source-id: 1da10194123b2759a6a35c60d1c2d2c0b569ccdc
2019-07-18 18:58:07 -07:00
65ef671d11 Quantization aware training in eager mode (#22732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732

Add support for quantization aware training in eager mode

Modifications to the post-training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weights with corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
    * Previously we were thinking about modifying the weight in a forward_pre hook and changing it back in a forward hook:

        def forward_pre_hook(self, input):
            self.float_weight = self.weight
            self.weight = self.fake_quantize(self.float_weight)

        def forward_hook(self, input):
            self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can’t change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it’s better to just swap the module.
* So we swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear.
* qat modules have fake_quant for the output and weights inserted in their forward function.

## Convert
* The flow is identical to ptq, but the swapping dictionary is slightly different since the modules were changed in the prepare step.

Reviewed By: zafartahirov

Differential Revision: D16199356

fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
2019-07-18 18:58:03 -07:00
8dfbbf7bf2 Add nn.qat.Linear (#22714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22714

We need this module for add fake_quant for weight

Reviewed By: zafartahirov

Differential Revision: D16193585

fbshipit-source-id: ed6c04ecf574ca1fe1dcded22c225da05976f7a3
2019-07-18 18:27:27 -07:00
b6011c3caf Update torchvision in CI. (#22754)
Summary:
To include dea1afbf5e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22754

Differential Revision: D16366676

Pulled By: zhangguanheng66

fbshipit-source-id: abfcb785973f9caa2a5aa1154fa689bbba8ff2dd
2019-07-18 18:22:24 -07:00
358e0d3d44 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 3998c7b50a8b0377e8f1748a8dbd3b7d2afc99a4
2019-07-18 16:36:25 -07:00
9897ec4701 Recursively compile class types (#22475)
Summary:
Try to compile for class types encountered in recursive script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22475

Pulled By: driazati

Differential Revision: D16340717

fbshipit-source-id: 5e1a46db517be2412f57156efbc4eb3347b01a8a
2019-07-18 15:43:16 -07:00
425d28c30a Reapply: optimize topk on cpu using parallel and partial sort (#19736) (#22865)
Summary:
https://github.com/pytorch/pytorch/issues/19736 was reverted as it was suspected of breaking master; trying to reapply.
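The partial-sort idea behind the optimization can be sketched with Python's heap-based selection (illustrative only; the actual implementation is parallel C++ over tensor slices):

```python
import heapq

def topk(values, k):
    # Select the k largest values without fully sorting the input;
    # nlargest uses a size-k heap, O(n log k) instead of O(n log n).
    return heapq.nlargest(k, values)
```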
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22865

Differential Revision: D16265457

Pulled By: VitalyFedyunin

fbshipit-source-id: 784bd6405471f15a8a49ebd0f3e98160d7d0679e
2019-07-18 14:15:54 -07:00
c1c4014bba Add warning for legacy autograd function (#22922)
Summary:
When working on https://github.com/pytorch/pytorch/pull/22762, we discovered that we hadn't actually deprecated legacy autograd functions. This PR puts up the deprecation warning for 1.2, with the goal of removing legacy function support completely in the near future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22922

Differential Revision: D16363916

Pulled By: yf225

fbshipit-source-id: 4b554010a3d1f87a3fa45cc1aa29d019c8f1033c
2019-07-18 14:02:17 -07:00
a2b3403962 Mark protobuf include path as system include (#23012)
Summary:
To suppress (many) compiler warnings from protobuf headers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23012

Differential Revision: D16364573

Pulled By: bddppq

fbshipit-source-id: adbc4921e29389131d43e7bcc1e6fcba19450c76
2019-07-18 13:44:39 -07:00
84d892b645 Remove DistributedDataParallelCPU as DDP now supports CPU models (#22864)
Summary:
cc ailzhang aazzolini yifuwang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22864

Differential Revision: D16358011

Pulled By: mrshenli

fbshipit-source-id: 8db2dc035dea03f07a32c749e754f625fda1bf28
2019-07-18 12:50:45 -07:00
a5e6586618 Revert D16357177: [pytorch][PR] Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup.
Differential Revision:
D16357177

Original commit changeset: f4ca9cd46cc6

fbshipit-source-id: 49e66e7e59df6cbc7f5d847bacc07da134067956
2019-07-18 12:28:46 -07:00
11d257e5df Fix SGD memory leak when there is weight_decay (#23007)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/20146. I am working on another PR that adds CPU and CUDA memory leak checking to all C++ API tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23007

Differential Revision: D16358973

Pulled By: yf225

fbshipit-source-id: 5ee7ed4e61e60424031540a633e1fae09d9df171
2019-07-18 12:10:10 -07:00
502766e99e Add the mathematical definition of torch.sign to clarify this is the sgn function.
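For reference, the sgn function is conventionally defined as:

```latex
\operatorname{sgn}(x) =
\begin{cases}
-1 & x < 0 \\
\phantom{-}0 & x = 0 \\
\phantom{-}1 & x > 0
\end{cases}
```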
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22894

Differential Revision: D16345027

Pulled By: ezyang

fbshipit-source-id: 1421571f1f8764539a35b9060d90ea6075f889d3
2019-07-18 11:45:27 -07:00
662fe699c5 Named inference rules for some initializer fns
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22972

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D16342782

Pulled By: zou3519

fbshipit-source-id: 25277688ab51e1e98af0e19aeb9c79399171d2fb
2019-07-18 10:04:29 -07:00
57cec0a720 Named inference rules for split/chunk
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22971

Test Plan: Imported from OSS

Differential Revision: D16342783

Pulled By: zou3519

fbshipit-source-id: 379edc8eb2f45a82ee8a6320f8285f8f81ea0b1b
2019-07-18 10:04:25 -07:00
6b70217a7e Adding README for binaries to OSS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23001

Differential Revision: D16359617

Pulled By: pjh5

fbshipit-source-id: bfe3f0e1dcb00f34e9362a74227e8a0bb90a8aaf
2019-07-18 10:04:21 -07:00
b91ab177a0 Add support to print QTensor in cpp (#22950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22950

Print a quantized tensor by first dequantizing it and then printing. Also print the scale, zero_point, size and type of the tensor.

Reviewed By: jerryzh168

Differential Revision: D16286397

fbshipit-source-id: 2d6fb1796e5b329a77c022b18af0a39f6edde0d7
2019-07-18 09:44:20 -07:00
0c091380cc disable non-deterministic cudnn ctcloss (#22977)
Summary:
Associated issue: https://github.com/pytorch/pytorch/issues/21680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22977

Differential Revision: D16357873

Pulled By: nairbv

fbshipit-source-id: 58711bac7d3e8390e868d594dc265ba053a1537c
2019-07-18 08:28:42 -07:00
29347cc9cf Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup. (#22173)
Summary:
There are two mutexes within CUDACachingAllocator that cause a deadlock.  One of the mutexes was added in order to work around the issue of NCCL interacting poorly with cudaFree.  See

- 68ff58d771
- https://github.com/pytorch/pytorch/pull/880

As of NCCL version 2 and its new group start/end APIs, the protection surrounding cudaFree() is no longer needed.  The PyTorch code was updated to use the NCCL2 group start/end API, but the corresponding cuda_free_mutex and its getter getFreeMutex() were not revised.  This PR removes the use of the getFreeMutex() when NCCL2 is used by moving calls to getFreeMutex() into the AutoNcclGroup.  That way, depending on the NCCL version used, we either use the mutex or we use the new group APIs.

The race condition is as follows, thanks to skeelyamd:

The deadlock occurs between hip_free_mutex (aka cuda_free_mutex in github) (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L165) and mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L162).

hip_free_mutex is exported from THCCachingAllocator in getFreeMutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L660) and is acquired in ProcessGroupNCCL::collective (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L397), which then calls back into THCCachingAllocator via c10::cuda::CUDACachingAllocator::recordStream (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L416 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L655 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L379).  At this point it acquires mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L384).

This requires hip_free_mutex to be locked before mutex.

However, in free_blocks (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L505) THCCachingAllocator locks hip_free_mutex.  Free_blocks is called from emptyCache (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L328) which locks mutex.

That requires mutex to be locked before hip_free_mutex.

emptyCache and ProcessGroupNCCL::collective may not be executed concurrently but this is occurring and deadlocking the CPU.

free_blocks is also called by malloc (via cuda_malloc_retry -> free_cached_blocks -> free_blocks) which also locks mutex first and so malloc must not execute concurrent with ProcessGroupNCCL::collective.
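One general remedy for this class of bug (the PR instead drops the free-mutex protection entirely when NCCL2's group APIs are available) is a consistent global lock order; sketched below with illustrative names, not the actual mutexes:

```python
import threading

free_mutex = threading.Lock()   # stands in for cuda_free_mutex
alloc_mutex = threading.Lock()  # stands in for the allocator's `mutex`

def collective_path():
    # This path already acquires free_mutex before alloc_mutex.
    with free_mutex:
        with alloc_mutex:
            pass

def empty_cache_path():
    # Fixed ordering: acquire in the same order, instead of the reversed
    # alloc_mutex -> free_mutex order that allowed the circular wait.
    with free_mutex:
        with alloc_mutex:
            pass
```

With a single agreed order, a circular wait between the two locks becomes impossible, so the two paths can run concurrently without deadlock.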
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22173

Differential Revision: D16357177

Pulled By: pietern

fbshipit-source-id: f4ca9cd46cc6d5e15290d99577d88be3f4fa8972
2019-07-18 07:31:02 -07:00
14ecf92d42 Slightly improve irfft doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22995

Differential Revision: D16356435

Pulled By: soumith

fbshipit-source-id: f6cfd9990fd79faebfb566704359c866ddf36525
2019-07-18 03:12:49 -07:00
c2df54d6d0 avg_pool2d avg_pool3d for LongTensor (#22433)
Summary:
Generate avg_pool2d/avg_pool3d for LongTensor for CPU.
Added divisor_override parameter.
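The semantics of `divisor_override` can be sketched in 1-D (illustrative helper; the real ops are 2-D/3-D): each window is summed, then divided by the override instead of the window size when one is given.

```python
def avg_pool1d(xs, kernel, stride=None, divisor_override=None):
    # Average pooling; stride defaults to the kernel size (non-overlapping).
    stride = stride or kernel
    div = divisor_override if divisor_override is not None else kernel
    out = []
    for i in range(0, len(xs) - kernel + 1, stride):
        out.append(sum(xs[i:i + kernel]) / div)
    return out
```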
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22433

Differential Revision: D16108809

Pulled By: ifedan

fbshipit-source-id: 8de7ff585a0479702cceafb5ccf9dfea62a9cc50
2019-07-17 19:59:09 -07:00
52bf38007b Remove usage of legacy autograd function (#22925)
Summary:
We are planning to put up a deprecation warning for legacy autograd functions in 1.2: https://github.com/pytorch/pytorch/pull/22922. This PR removes all usage of legacy functions in PyTorch core and the test suite, to prepare for the eventual removal of legacy function support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22925

Differential Revision: D16344834

Pulled By: yf225

fbshipit-source-id: 8bf4cca740398835a08b7a290f3058c3e46781ba
2019-07-17 19:50:36 -07:00
29853293d7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 4f801d353ee14ec0bd6fd24830f0e7a4343d67f8
2019-07-17 18:05:09 -07:00
992f3860a3 Quantized relu to native_functions (#22316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22316

Adding the quantized ReLU to native_functions.yaml, as it has the same signature as non-quantized relu

Reviewed By: jerryzh168

Differential Revision: D16038441

fbshipit-source-id: 1cfbb594eb9bca1b7ec49ca486defcf1908b0d26
2019-07-17 17:31:02 -07:00
e24f18cea0 Revert D15854892: [pytorch][PR] [tensorboard] Cleanup API and remove 'experimental' warning
Differential Revision:
D15854892

Original commit changeset: 06b849882694

fbshipit-source-id: 588edc4616d020a23645f8c8181782c8412c4c6e
2019-07-17 16:45:54 -07:00
a0ef4abeed Add missing comment from #22103 (#22984)
Summary:
One important comment is missing from https://github.com/pytorch/pytorch/issues/22103 (not sure what happened).
This commit adds it back.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22984

Differential Revision: D16347044

Pulled By: ezyang

fbshipit-source-id: 0903909a5fb6740b43195136f1a23c28cfb2a02f
2019-07-17 16:21:38 -07:00
442dd7b906 Implement "trimmed lasso" regularization and support all available regularization in a single interface (#22966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22966

We want to implement "trimmed lasso" for feature selection with learnable and regularizable weights. The trimmed lasso is a simple yet powerful improvement over the traditional lasso. More references can be found at https://arxiv.org/abs/1708.04527 and http://proceedings.mlr.press/v97/yun19a.html. For a quick intro, see pages 1-3 of the paper at https://arxiv.org/abs/1708.04527.

Given n weights, the traditional lasso sums all of the weights' l1 norms. The trimmed lasso takes an input integer k (how many of the n weights you want to select) and only sums over the smallest n - k weights. With lambda as the regularization constant, the penalty term covers only the smallest n - k weights, not the larger ones. If lambda grows beyond a certain threshold, the smallest n - k weights are shrunk to zero, i.e. those weights are "dropped". As a result, k is the number of weights left after the lasso, which we can easily control.

Meanwhile, we further support all available regularization in a single interface. Currently supported regularizers on weights include no reg, l1, l2, elastic, trimmed l1, elastic with trimmed l1, group l1, and logbarrier.

Differential Revision: D16326492

fbshipit-source-id: 6e1fd75606005d9bc09d6650435c96a7984ba69c
2019-07-17 16:12:31 -07:00
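The penalty described above can be sketched in a few lines of PyTorch (an illustrative sketch of the paper's definition, not the actual implementation in this diff; `trimmed_lasso_penalty` is a hypothetical name):

```python
import torch

def trimmed_lasso_penalty(weights, k, lam):
    # Penalize only the n - k smallest |w|; the k largest weights
    # escape the penalty entirely, so they are free to stay large.
    abs_w = weights.abs().flatten()
    n = abs_w.numel()
    smallest, _ = torch.topk(abs_w, n - k, largest=False)
    return lam * smallest.sum()
```

Driving `lam` up pushes the n - k smallest weights toward zero while leaving the top k untouched, which is how k ends up controlling the number of surviving weights.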
eb76b7a564 Revert D16199862: [pytorch][PR] [ROCm] Update ROCm CI to python3.6
Differential Revision:
D16199862

Original commit changeset: 46ca6029a232

fbshipit-source-id: 2843b919f2655674e39dc764053621994061a12b
2019-07-17 14:26:56 -07:00
796a39ba85 Automatic update of fbcode/onnx to 707064980b9825b8705b9d1c9aad34d8b022d5dd (#22981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22981

Previous import was 806aa863020fa180e57f576cb032ec44ce8ddcca

Included changes:
- **[70706498](https://github.com/onnx/onnx/commit/70706498)**: TensorProto::INT8 & INT16 were missed here (#2164) <ZINEKS>
- **[8218a4ea](https://github.com/onnx/onnx/commit/8218a4ea)**: Fix LabelEncoder's shape inference (#2170) <Wei-Sheng Chin>
- **[0f1a9a1c](https://github.com/onnx/onnx/commit/0f1a9a1c)**: Fixing a unit test in Cumsum Operator (#2157) <Jeff Saremi>
- **[2c03cff0](https://github.com/onnx/onnx/commit/2c03cff0)**: [New Operator] CumSum (#2030) <Jeff Saremi>
- **[220b8300](https://github.com/onnx/onnx/commit/220b8300)**: Fix globalpool output shape (#2147) <daquexian>

Reviewed By: benoitsteiner

Differential Revision: D16341736

fbshipit-source-id: 7e7a2684d8c821991231bfd6558f9f6cb4fb05fb
2019-07-17 14:05:14 -07:00
031b406c38 Update ROCm CI to python3.6 (#22322)
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this expected / can it be skipped?
* For testing, I've used update-alternatives on CentOS/Ubuntu to select python 3.6 as the default python. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322

Differential Revision: D16199862

Pulled By: ezyang

fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
2019-07-17 13:42:30 -07:00
5adba33c01 Use integer floor division for pooling shape computation (#22304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21935 by using the integer floor division that was introduced for convolution shapes in https://github.com/pytorch/pytorch/issues/9640. Without this fix, the pooling operators can produce a 1-element output in cases where they shouldn't.

Disclaimer: I couldn't properly test it locally (it's not picking up the modified version for some reason). I'm marking this WIP until I checked what the CI tools say...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22304

Differential Revision: D16181955

Pulled By: ezyang

fbshipit-source-id: a2405372753572548b40616d1206848b527c8121
2019-07-17 13:23:29 -07:00
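The shape computation in question boils down to one line; a minimal sketch (hypothetical helper, ignoring ceil_mode and dilation, which the real code also handles):

```python
def pooling_output_size(input_size, kernel_size, stride, padding):
    # Pure integer arithmetic: floor division cannot pick up the float
    # rounding error that produced the spurious 1-element outputs.
    return (input_size + 2 * padding - kernel_size) // stride + 1
```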
332824551c Fix F.one_hot doc signature
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22929

Differential Revision: D16290741

Pulled By: ezyang

fbshipit-source-id: d8b979e64d92b94c5a70bb4ffe2a83042ed6abfc
2019-07-17 13:23:25 -07:00
074afd7143 Remove unneeded IValue copy in unpickler.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22883

Test Plan: Imported from OSS

Differential Revision: D16270330

Pulled By: zdevito

fbshipit-source-id: ffd05b8c6860889d75172a288f339a434af76d45
2019-07-17 11:00:38 -07:00
b6adb568fb Cleanup some logic in pickler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22882

Test Plan: Imported from OSS

Differential Revision: D16270332

Pulled By: zdevito

fbshipit-source-id: 714f293493965b13e471945fde11831a04875604
2019-07-17 11:00:34 -07:00
3c0814ffeb add docs to onnx APIs (#22938)
Summary:
Add docs to onnx APIs, including
  - export
  - export_to_pretty_string
  - is_in_onnx_export

Fix https://github.com/pytorch/pytorch/issues/14698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22938

Differential Revision: D16296182

Pulled By: zhangguanheng66

fbshipit-source-id: 1a1fa769b430db6428e6dfafba5447e6e2a75517
2019-07-17 10:50:41 -07:00
4861527446 Cleanup API and remove 'experimental' warning (#21786)
Summary:
This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the user) and removes the "experimental" warning in prep for our 1.2 release.

We also don't need the additional PyTorch version checks now that we are in the codebase itself.

cc ezyang lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21786

Reviewed By: natalialunova

Differential Revision: D15854892

Pulled By: orionr

fbshipit-source-id: 06b8498826946e578824d4b15c910edb3c2c20c6
2019-07-17 10:34:00 -07:00
2630109727 always restore dlopen flag in dyndep (#22958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22958

When we use `extension_loader.DlopenGuard()` to dyndep or import modules, it sets the `RTLD_GLOBAL` flag and restores the original flags after the `yield`. However, if the module is not there, the yield fails and the flags are never restored, creating all kinds of symbol-conflict problems.

Reviewed By: bddppq

Differential Revision: D16311949

fbshipit-source-id: 7b9ec6d60423ec5e78cae694b66c2f17493840b0
2019-07-17 10:26:25 -07:00
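The fix amounts to moving the flag restoration into a finally block. A minimal sketch (hypothetical, Unix-only since it relies on `sys.setdlopenflags`; not the actual extension_loader code):

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def dlopen_guard():
    # Temporarily set RTLD_GLOBAL for the import; try/finally guarantees
    # the original flags come back even if the body raises ImportError.
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)
```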
35b6cdc2eb Rewriting hypothesis_utils (#22830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22830

Separating the tensor generation and the generation of the quantization parameters

- Introducing hypothesis filter `assume_not_overflowing`, which makes sure that the generated tensor and qparams play well with each other. **Note: This is an expensive filter!**
- `qtensor` -> Renamed to `tensor`
- `qtensor_conv` -> Renamed to `tensor_conv2d`
- The tensors don't return the quantization parameters anymore, use `qparams` for it
- The `dtypes` argument is just a quantized dtype now.
- The enforcement for zero_point is predefined as before. As before, if set to `None` the zero_point will be sampled. When sampling, the range can be overridden with `zero_point_min` and `zero_point_max`
- Scale sampling can also be overridden using `scale_min` and `scale_max`

Reviewed By: jerryzh168

Differential Revision: D16234314

fbshipit-source-id: 5b538a5aa9772b7add4f2ce5eff6fd0decd48f8e
2019-07-17 10:16:13 -07:00
b96610bf5a fix the CI job for onnx (#22946)
Summary:
ONNX uses virtualenv and PyTorch doesn't, so the --user flag was causing problems in the ONNX CI.

Fix it by moving the flag to PyTorch-only scripts; ninja will be installed separately in the ONNX CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22946

Reviewed By: bddppq

Differential Revision: D16297781

Pulled By: houseroad

fbshipit-source-id: 52991abac61beaf3cfbcc99af5bb1cd27b790485
2019-07-17 09:50:06 -07:00
f72d754877 qlinear operator level benchmark (#22914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22914

Adding op level benchmarking for qlinear operator

Reviewed By: mingzhe09088

Differential Revision: D16285204

fbshipit-source-id: 99b734ddfa0af6aada820cac7b2f38ef7a5868cb
2019-07-17 09:13:17 -07:00
7a99f3987b Update note about tensors on CPU for certain MAGMA functions, elimina… (#22618)
Summary:
…te argument in macro

Changelog:
- Update note about tensors on CPU for the following MAGMA functions
  - magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
  - magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
  - magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618

Test Plan:
- All existing tests should pass to verify that the patch is correct

This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573

Differential Revision: D16286198

Pulled By: zou3519

fbshipit-source-id: a5a6ec829084bdb752ca6006b8795227cbaf63b1
2019-07-17 07:38:23 -07:00
5911cb8e5c Make load() create only one CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22727

Differential Revision: D16197603

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 3eaefe6f229032b109d63a151fe0a20268b5cf56
2019-07-16 20:08:10 -07:00
ec57d9215f Updating submodules
Reviewed By: yns88

fbshipit-source-id: 7eb29a58ff20b8ff0b793a84eb2f00e0a1bbe4b5
2019-07-16 19:53:06 -07:00
7ed82ea461 Added generation of transpose and dilated 2D and 3D for LongTensor (#22594)
Summary:
Added implementations of transpose2D, transpose3D, dilated2D, and dilated3D for LongTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22594

Differential Revision: D16155462

Pulled By: ifedan

fbshipit-source-id: af57330314bc2c3e0a38b9e75105b20030a1f9bb
2019-07-16 18:58:39 -07:00
bcfa023a00 hardshrink_cpu and hardshrink_backward_cpu refactoring with at::native::cpu_kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22459

Differential Revision: D16132625

Pulled By: pbelevich

fbshipit-source-id: d7eb1cd6ed04eba3d0c54feaca1e5ab2836211b5
2019-07-16 18:58:35 -07:00
ef36046ad7 Better error message for using Python builtin_function_or_method (#22935)
Summary:
* better error in `toSugaredValue`
* removes a bunch of periods on error messages; `ErrorReport` already adds a `:` at the end of the message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22935

Pulled By: driazati

Differential Revision: D16291079

fbshipit-source-id: 478724fc7d1ae79093f4ede18553ffeafa2c7964
2019-07-16 16:49:04 -07:00
25b69997c3 Tensorboard Metrics (#22492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22492

Collect metrics about Tensorboard usage
[internal] fbcode/pytorch/tensorboardX/tensorboardX/writer.py
[OSS] fbcode/caffe2/torch/utils/tensorboard/writer.py
Tensorboard Ondemand
https://fb.quip.com/JRvqAKtzgy6z

Reviewed By: dzhulgakov

Differential Revision: D16105544

fbshipit-source-id: de14e6ec781889e367a6eba39fc777f707628263
2019-07-16 16:18:00 -07:00
7793ab0871 More documentation about the pyobj field.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22885

Test Plan: Imported from OSS

Differential Revision: D16283076

Pulled By: ezyang

fbshipit-source-id: 4f6a87d900c4d430eedc90661de89e0f6916347e
2019-07-16 14:47:38 -07:00
cd11109c2e Fix messed up tests for dropout (#22893)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22109.

I've confirmed with suo that this wasn't intentional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22893

Differential Revision: D16288640

Pulled By: Chillee

fbshipit-source-id: 00fd6fe418ecefb304866a723051d0e5451ba4d5
2019-07-16 14:17:11 -07:00
8ced53d62b Correct the check of whether src is defined in copy_. (#22715)
Summary:
(intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22715

Differential Revision: D16205243

Pulled By: ezyang

fbshipit-source-id: 9bf5a7885691d057198ae482259b36c1773457dd
2019-07-16 14:03:43 -07:00
798d5d9771 Revert D16281714: Add sanity checks for NCCL detection.
Differential Revision:
D16281714

Original commit changeset: 396bcbf099bd

fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44
2019-07-16 13:58:27 -07:00
7586ffdc57 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 30926ecb8fabee3f020ae183bb568a11145bcada
2019-07-16 13:37:24 -07:00
01f03d56ee Revert D16283037: Add sanity checks for NCCL detection.
Differential Revision:
D16283037

Original commit changeset: fc09c9443a56

fbshipit-source-id: 30cdf7b1ad91498ee615d018de5571ba36f4383e
2019-07-16 13:20:43 -07:00
7a370dbb41 Enable recursive script mode as the default (#22887)
Summary:
This fixes up the test suite (mostly just adding `ignore` decorations
to tests that need to call Python function) so that it all passes with
recursive script enabled.

The main user-facing result of this change is that Python functions are
compiled without any decorators, so non-TorchScriptable code must be
decorated with `torch.jit.ignore` (or
`torch.jit.ignore(drop_on_export=True)` to maintain the functionality of
the current `ignore`)

Details can be found in #20939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22887

Pulled By: driazati

Differential Revision: D16277608

fbshipit-source-id: 0abd0dc4291cf40651a1719bff813abb2b559640
2019-07-16 13:00:08 -07:00
eaee0c6cd9 Make classtypes hold a weak_ptr to their CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22902

Test Plan: Imported from OSS

Differential Revision: D16278159

Pulled By: suo

fbshipit-source-id: 6aa682e347847e808b44218d38ff1dae66945a07
2019-07-16 12:04:20 -07:00
b6a88b3344 Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22901

Test Plan: Imported from OSS

Differential Revision: D16278160

Pulled By: suo

fbshipit-source-id: f3e7d83b48d5f5b5cb1548ccc5b9bd382a3c411a
2019-07-16 12:04:16 -07:00
c6fe864db3 Add key_padding_mask kwarg to Transformer (#22588)
Summary:
Motivation:
The forward method of MultiheadAttention has a kwarg key_padding_mask. This mask is of shape (N,S), where N is the batch size and S is the sequence length. It is applied prior to the attention softmax, where True values in the mask are set to float('-inf'). This lets you mask position j from attention for every position i in the input sequence. It's typically used to mask padded inputs, so that for a sample in a batch we can ensure no encoder outputs depend on padding inputs. Currently the Transformer, TransformerEncoder, and TransformerEncoderLayer do not have this kwarg, and only have options for (S,S), (T,T), and (S,T) masks which are applied equally across the batch for the source input, target output, and target-source memory respectively. Those masks can't be used for padding and are instead used for things like subsequent masking in language modeling, by masking the attention of position i to position j.

This diff exposes the key_padding_mask to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods which is ultimately passed to MultiheadAttention forward.

Open question: should we also allow a key_padding_mask for the decoder layer? As padding is usually at the end of each sentence in a batch and sentences are usually decoding from left to right, usually people deal with padding on decoded outputs by just masking those outputs at the loss layer. There might be some scenarios where it's needed though I don't think it would be common. People can also still just subclass and override the layers. We could also pass the input key_padding_mask to the memory <> decoder attention layer. Not sure if that's necessary though because the output of position i from each attention encoder layer won't depend on any masked positions in the input (even if position i is a masked position itself) so there's not really any point in masking position i again.
Adds the key_padding_mask kwarg to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods.
The standard TransformerEncoderLayer uses a MultiheadAttention layer as self_attn. MultiheadAttention forward method has a key_padding_mask kwarg that allows for masking of values such as padding per sequence in a batch, in contrast to the attn_mask kwarg which is usually of shape (S,S) and applied equally across the batch.

MultiheadAttention calls functional.multi_head_attention_forward, which has the same key_padding_mask kwarg of shape (N,S). Masked (True) values are set to float('-inf').
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22588

Test Plan:
buck test mode/dev caffe2/test:nn -- 'test_transformerencoderlayer \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_Transformer_cell \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_transformer_args_check \(test_nn\.TestNN\)'

Differential Revision: D16112263

Pulled By: lucasgadams

fbshipit-source-id: dc4147dd1f89b55a4c94e8c701f16f0ffdc1d1a2
2019-07-16 11:57:22 -07:00
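For illustration, an (N, S) key_padding_mask is typically built from per-sample sequence lengths (hypothetical helper, not part of this PR):

```python
import torch

def make_key_padding_mask(lengths, max_len):
    # True marks padded positions; attention sets those scores to -inf
    # before the softmax, so no output attends to padding.
    positions = torch.arange(max_len).unsqueeze(0)  # shape (1, S)
    return positions >= lengths.unsqueeze(1)        # shape (N, S)
```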
9b9546a498 replace ByteTensor with bool in fill_test (#22913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22913

as title

Reviewed By: hl475

Differential Revision: D16285248

fbshipit-source-id: 78b13d48d547760e59e0e5c8875ab09a3cd24828
2019-07-16 11:51:55 -07:00
31497799b9 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16283037

Pulled By: ezyang

fbshipit-source-id: fc09c9443a568d9af1c78a847282a7d707c49dd6
2019-07-16 11:32:36 -07:00
e2046f8c1d Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16281714

Pulled By: ezyang

fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90
2019-07-16 11:32:32 -07:00
3ea04b59c0 Resolve the doc issue in which two asterisks have weird links. (#22896)
Summary:
Asterisks start emphasis in reST. We should either escape them or mark them up as interpreted text.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22896

Differential Revision: D16282869

Pulled By: zou3519

fbshipit-source-id: 15ec4286434db55fb8357b1a12e6f70ef54f8c66
2019-07-16 11:23:06 -07:00
3f3f5d042a Revert D16227440: [pytorch][PR] Update note about tensors on CPU for certain MAGMA functions, elimina…
Differential Revision:
D16227440

Original commit changeset: 97d5537c5da9

fbshipit-source-id: 2dacfcc821e1fb64466e185efa0f6abd0c9ba526
2019-07-16 11:13:59 -07:00
52de340629 Export torch.masked_fill with onnx::where
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22521

Reviewed By: zrphercule

Differential Revision: D16155168

Pulled By: houseroad

fbshipit-source-id: 5d419f08213324d474b839ba1ae13c799aeee92a
2019-07-16 10:55:30 -07:00
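The equivalence the export relies on can be checked in eager mode (illustrative only; the actual change lives in the ONNX symbolic for masked_fill):

```python
import torch

def masked_fill_with_where(x, mask, value):
    # where(mask, value, x) picks `value` at True positions and keeps x
    # elsewhere - the same result masked_fill gives, and what onnx::Where computes.
    return torch.where(mask, torch.full_like(x, value), x)
```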
6c997538b7 Unwrap sccache post-build for ROCm compilations. (#22743)
Summary:
The sccache wrapping strategy causes problems for at-runtime kernel
compilation of MIOpen kernels. We therefore - after the builds of
caffe2/pytorch are complete - unwrap sccache again by moving the actual
clang-9 binary back into its original place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22743

Differential Revision: D16283329

Pulled By: bddppq

fbshipit-source-id: 4fcdc92be295d5ea9aba75c30e39af1a18a80c13
2019-07-16 10:28:16 -07:00
ba38445cfd Fix alias annotations for dict ops (#22900)
Summary:
Fixes #22553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22900

Pulled By: driazati

Differential Revision: D16277794

fbshipit-source-id: 657f18c50c9a87597ec1a7d568cc532638cfe386
2019-07-16 10:28:12 -07:00
8482efb203 pin_memory malloc now uses existing context if available. (#22229)
Summary:
This is achieved by using `cuDevicePrimaryCtxGetState` as a way to check whether a primary context exists on a device. It is not too slow, from this benchmark of a single call to it on CUDA 10.1, Titan Xp, driver 415.27:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_cuDevicePrimaryCtxGetState        301 ns        301 ns    2319746
```

Commits:

1. Add `CUDAHooks::getDeviceWithPrimaryContext` which returns a device index with primary context (if exists).
    Link `c10/cuda` against `libcuda` for device API calls.
2. Use `getDeviceWithPrimaryContext` to check primary context in `pin_memory`.
    Fix `OptionalDeviceGuard` doc.
3. Refactor `test_cuda_primary_ctx.py` to support multiple tests.
    Add test for this in that file.

Fixes https://github.com/pytorch/pytorch/issues/21081.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22229

Differential Revision: D16170194

Pulled By: zou3519

fbshipit-source-id: 485a45f211b7844c9e69c63f3b3b75194a796c5d
2019-07-16 10:18:30 -07:00
054c7eb0f4 Update note about tensors on CPU for certain MAGMA functions, elimina… (#22618)
Summary:
…te argument in macro

Changelog:
- Update note about tensors on CPU for the following MAGMA functions
  - magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
  - magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
  - magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618

Test Plan:
- All existing tests should pass to verify that the patch is correct

This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573

Differential Revision: D16227440

Pulled By: zou3519

fbshipit-source-id: 97d5537c5da98c0ed3edc4668a09294794fc426b
2019-07-16 10:09:10 -07:00
f8ad65adb1 Fix torch.triu / torch.tril on contiguous tensors with non-default st… (#22730)
Summary:
…rides

Changelog:
- Fix behavior of `torch.triu` / `torch.tril` on certain unsqueezed tensors that lead to uninitialized values on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22730

Test Plan:
- Add tests for these cases in test_triu_tril in test_torch

Fixes https://github.com/pytorch/pytorch/issues/22581

Differential Revision: D16222897

Pulled By: zou3519

fbshipit-source-id: b86b060187797e5cd2a7731421dff1ba2b5c9596
2019-07-16 10:09:03 -07:00
0ea8e61f03 For consistent CUDA_HOME behavior (#22845)
Summary:
Align the behavior of `torch.utils.cpp_extension.CUDA_HOME` with that of `tools.setup_helpers.cuda.CUDA_HOME`.

Specifically, I swapped the position of guess 2 and guess 3 in `torch.utils.cpp_extension.CUDA_HOME`.

Fixing issue https://github.com/pytorch/pytorch/issues/22844
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22845

Differential Revision: D16276241

Pulled By: zou3519

fbshipit-source-id: 3b62b439b2f794a6f3637a5fee58991f430985fe
2019-07-16 09:55:56 -07:00
560d847da6 add benchmark for PT fill_ op (#22867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22867

as title

Reviewed By: hl475

Differential Revision: D16263458

fbshipit-source-id: 55b0e62023c117aaa0c2b9a4d65b234a388f086d
2019-07-16 09:50:41 -07:00
3b1c3996e1 remove RTTI check for TensorImpl shadow copy (#22773)
Summary:
We introduced RTTI in recent change: https://github.com/pytorch/pytorch/pull/21613

For the internal mobile build we don't enable '-frtti' yet. This diff replaces the
RTTI check with an alternative approach.

According to dzhulgakov we can compare two tensors' type_id directly in most cases -
which is stricter than comparing the TensorImpl subclass type, since the TensorImpl -> type_id
mapping is 1-to-n, but it is more appropriate for this use case.

The only two cases where we can relax direct type comparison (for legacy reason) are:
1. CPUTensor <-> CUDATensor;
2. SparseCPUTensor <-> SparseCUDATensor;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22773

Differential Revision: D16277696

Pulled By: ljk53

fbshipit-source-id: 043e264fbacc37b7a11af2046983c70ddb62a599
2019-07-15 23:21:57 -07:00
c5afdd0b55 Revert D16197605: [jit] Make traced fns also go into the global python CU
Differential Revision:
D16197605

Original commit changeset: d32c975486b0

fbshipit-source-id: a00f0490cc23824792f3e745d7b5a003b1a33d20
2019-07-15 22:31:33 -07:00
a326aad816 Revert D16197608: [jit] Make classtypes hold a weak_ptr to their CU
Differential Revision:
D16197608

Original commit changeset: 22250d6f0d24

fbshipit-source-id: 47a8cdeb62b1033252070ecb92906358014b551a
2019-07-15 19:49:41 -07:00
5f05037de6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: a7af5bc022abbfb81af31dbb653e25a3b8d54c4f
2019-07-15 18:07:21 -07:00
94d99f2522 add num_runs flag to the benchmark (#22892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892

Think of num_runs as manually running the binary <num_runs> times. Each run executes the operator for many iterations.

Reviewed By: hl475

Differential Revision: D16271597

fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
2019-07-15 17:18:25 -07:00
6ffacd5f02 Use original module's class name for ScriptModules (#22873)
Summary:
Since recursive script creates a ScriptModule from an `nn.Module`,
there are no ties to the original module to pull a type name from, so we
have to explicitly pass it in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22873

Pulled By: driazati

Differential Revision: D16268547

fbshipit-source-id: 902a30e6e36427c6ba7033ded027a29d9dcbc1ee
2019-07-15 15:27:29 -07:00
248336946e remove stray print
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22825

Differential Revision: D16266401

Pulled By: Krovatkin

fbshipit-source-id: 214f90578061aad83eab143381b3c05386edee3d
2019-07-15 14:54:10 -07:00
f7de9be3c0 Add FakeQuantize Module (#21767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21767

Adding FakeQuantize Module
for quantization aware training

Reviewed By: dzhulgakov

Differential Revision: D15728503

fbshipit-source-id: 2a9a6a362812ede3deac42b93dddca35987bd8e6
2019-07-15 14:08:55 -07:00
0cddd3e751 update README (#21312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21312

This diff updates the README of op-bench.

Reviewed By: zheng-xq

Differential Revision: D15612665

fbshipit-source-id: b33119fd4f9d086b03b5e28fbe8a4015b282b15c
2019-07-15 13:34:05 -07:00
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd`
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
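After this port, a batch of matrices can be decomposed with a single call (usage sketch):

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 5, 3)            # a batch of four 5x3 matrices
U, S, V = torch.svd(A)              # batched: U (4,5,3), S (4,3), V (4,3,3)
# Reconstruct each matrix as U diag(S) V^T to check the factorization.
recon = U @ torch.diag_embed(S) @ V.transpose(-2, -1)
```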
260b0e8476 Make classtypes hold a weak_ptr to their CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22726

Differential Revision: D16197608

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 22250d6f0d249f61f269afb4fe8e7d1af0be1205
2019-07-15 13:13:16 -07:00
5fc1260e0a Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22725

Differential Revision: D16197605

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: d32c975486b0cb4808687f0aa89325571f2817c4
2019-07-15 13:13:12 -07:00
16aa235f43 _script_compile and _script_class_compile add to the python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22724

Differential Revision: D16197609

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: e12b31f8c8ce14b0968f4ac9445e7d225126b210
2019-07-15 13:13:08 -07:00
f2f80744be Close loophole to create untyped tuples (#22518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22518

-

Reviewed By: dzhulgakov

Differential Revision: D16115216

fbshipit-source-id: 1afae3666f7acd7d7833db8a72168364fed4879d
2019-07-15 11:33:45 -07:00
800f4936f0 Deprecate untyped Lists (#22517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22517

Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.

Reviewed By: dzhulgakov

Differential Revision: D16115214

fbshipit-source-id: 2c8d0e4e375339c699d583995f79c05c59693c3e
2019-07-15 11:33:35 -07:00
bd88fd0793 Added .bfloat16() (#22852)
Summary:
Add conversion method for bfloat16
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22852

Differential Revision: D16256760

Pulled By: izdeby

fbshipit-source-id: 01d75495f9df513a0cdf78791c3eb013ab92bd95
2019-07-15 09:32:18 -07:00
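Usage mirrors the existing dtype conversion methods such as `.half()` and `.float()`:

```python
import torch

t = torch.tensor([1.0, 2.5, -3.0])
b = t.bfloat16()                    # new bfloat16 conversion method
```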
8399197df6 Set up CI with Azure Pipelines (#22839)
Summary:
Introduce Azure Pipelines for the linting checks.  This is meant to be equivalent to the existing Travis linting phase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22839

Differential Revision: D16260376

Pulled By: ezyang

fbshipit-source-id: 1e535c3096358be67a0dad4cd920a92082b2d18e
2019-07-15 06:41:56 -07:00
535c5540bc Back out "Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen"" (#22794)
Summary:
Original commit changeset: 227df3b85316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22794
ghstack-source-id: 86400904

Differential Revision: D16222777

fbshipit-source-id: 0b198ac59e640df0b8204b4ed30f8e822c15fd9a
2019-07-15 06:28:56 -07:00
317cf7c874 Remove tensor_data() call in Python Variable() and nn.Parameter() constructors (#22821)
Summary:
As part of the Variable/Tensor merge, `variable.tensor_data()` should be removed in favor of `variable.detach()`. This PR removes  `tensor_data()` call sites in Python `Variable()` and `nn.Parameter()` constructor paths.

Note that this PR is BC-breaking in the following way:
- For Python `Variable()` constructor:
Previously, in-place updating a tensor after it's been used to create a Variable does not bump the Variable's version counter, which causes the following problem:
```python
t = torch.ones(2, 3)
v = torch.autograd.Variable(t).requires_grad_()
y = v * v
t.add_(1)  # This bumps version counter of `t`
y.sum().backward()  # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create a Variable will also bump the Variable's version counter, thus preserving the correctness of the Variable's version counter.

- For Python `nn.Parameter()` constructor:
Previously, in-place updating a tensor after it's been used to create an nn.Parameter does not bump the nn.Parameter's version counter, which causes the following problem:
```python
t = torch.ones(2, 3)
v = torch.nn.Parameter(t)
y = v * v
t.add_(1)  # This bumps version counter of `t`
y.sum().backward()  # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create an nn.Parameter will also bump the nn.Parameter's version counter, thus preserving the correctness of the nn.Parameter's version counter.
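The version-counter mechanism described above can be sketched in plain Python — a minimal illustration with hypothetical names (`VersionedTensor`, `save_for_backward`, `unpack_saved`), not the actual autograd implementation:
```python
class VersionedTensor:
    """Toy stand-in for a tensor whose in-place ops bump a version counter."""
    def __init__(self, data):
        self.data = list(data)
        self._version = 0

    def add_(self, x):
        # In-place update: mutate the data and bump the version counter.
        self.data = [d + x for d in self.data]
        self._version += 1
        return self

def save_for_backward(t):
    # Record the version the tensor had when it was saved for backward.
    return t, t._version

def unpack_saved(saved):
    # Backward-time check: error out if the tensor was modified in place
    # after it was saved, instead of silently computing a wrong gradient.
    t, version = saved
    if t._version != version:
        raise RuntimeError("saved tensor was modified by an in-place operation")
    return t.data
```
With such a check, an in-place `t.add_(1)` after the tensor has been used in the graph raises at backward time rather than producing an incorrect gradient.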
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22821

Differential Revision: D16258030

Pulled By: yf225

fbshipit-source-id: 9a6d68cea1864893193dbefbb6ef0c1d5ca12d78
2019-07-14 21:09:29 -07:00
14e8fb70a1 Make the signature of fill_out consistent with fill_.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22761

Test Plan: Imported from OSS

Differential Revision: D16257779

Pulled By: ezyang

fbshipit-source-id: b1201500042ae1f4678835da957de1777c1038a3
2019-07-14 19:20:59 -07:00
1c266c2738 Move the body of fill_kernel_impl into fill_kernel_cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22760

Test Plan: Imported from OSS

Differential Revision: D16257782

Pulled By: ezyang

fbshipit-source-id: d214d2d77affd937109b33ca841af76004f85834
2019-07-14 19:20:53 -07:00
fc297b8e83 Move fill and fill_diagonal to Fill.cpp, Fill.h, and FillKernel.{cpp,cu}
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22758

Test Plan: Imported from OSS

Differential Revision: D16257781

Pulled By: ezyang

fbshipit-source-id: 9e5ed06e95ef65b036eb388488faad981f1e8012
2019-07-14 19:20:46 -07:00
815e73bc20 make_variable consumes the Tensor if it only has one reference
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22705

Test Plan: Imported from OSS

Differential Revision: D16192220

Pulled By: jamesr66a

fbshipit-source-id: 9c42bb759077b74a1370d3a2d7114ed3593f333b
2019-07-14 18:36:20 -07:00
b5fa9a340a Temporarily skip mypy-0.720 to unbreak master type checks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22835

Differential Revision: D16239190

Pulled By: bddppq

fbshipit-source-id: e97fd3aae0676de8a06dc9fb498f36ed28dc92c3
2019-07-14 09:49:24 -07:00
1a93b96815 Revert da315a4
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22837

Differential Revision: D16239667

Pulled By: llyfacebook

fbshipit-source-id: 1a625d78d633927129dd2791e65b333b3902f94f
2019-07-13 01:54:20 -07:00
92468f0a6b Revert D16238204: Revert D16224780: [pytorch][PR] [ROCm] MIOpen integration into pytorch RNN operators
Differential Revision:
D16238204

Original commit changeset: a6b5eb3f4820

fbshipit-source-id: bdfae93c522c1ce734ab8dc736ced66411fe50ee
2019-07-12 22:58:50 -07:00
da315a4e2a Revert D16037021: Support GRU module quantization in Pytorch
Differential Revision:
D16037021

Original commit changeset: 71145c67d869

fbshipit-source-id: 33cd2e57eba30ea33cc4f3116732a721c26f6efb
2019-07-12 21:05:34 -07:00
fcfefc3439 Revert D16224780: [pytorch][PR] [ROCm] MIOpen integration into pytorch RNN operators
Differential Revision:
D16224780

Original commit changeset: 331dafbb7689

fbshipit-source-id: a6b5eb3f4820fbb58d4a329aa4c93b40a111ff27
2019-07-12 20:55:05 -07:00
ead1193241 Transfer Learning: Caffe2 load op changes to return shape inference (#22829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22829

Sending out the caffe2 load op changes separately since we want to pick them to open source.

This change is needed because the shape information of the blobs is determined from the load operator and that shape information is needed in our download_group.

Reviewed By: boryiingsu

Differential Revision: D16229465

fbshipit-source-id: f78b2df9a7f26968d70eca68dde75cd11ab6f7a2
2019-07-12 19:45:13 -07:00
d8c1b86135 Support GRU module quantization in Pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22498

Reviewed By: BIT-silence

Differential Revision: D16037021

fbshipit-source-id: 71145c67d8696e525b686cd3313033e5b6771718
2019-07-12 18:31:08 -07:00
ba9d559a12 Get rid of torch.mean shape analysis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22810

Test Plan: Imported from OSS

Differential Revision: D16226973

Pulled By: jamesr66a

fbshipit-source-id: ad23f48782e8d21788ecae39fc512ff4502716bf
2019-07-12 17:50:10 -07:00
9eb039334f Use Linear Operator with fp16 weights in JIT (#22323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22323

This diff adds an interface to use quantized Linear op in JIT.

Reviewed By: jamesr66a

Differential Revision: D16040724

fbshipit-source-id: 90e90aff9973c96ea076ed6a21ae02c349ee2bcf
2019-07-12 15:59:17 -07:00
573d9e6975 Support Linear operation with fp16 weights in ATen (#22023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22023

This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B, with dtypes (Y, X, W, B) = (fp32, fp32, fp16, fp32)

To do that, three steps are needed:
1. Quantize the weights from fp32 to fp16; this is done using `PackedGemmMatrixFP16` in `fbgemm_pack_gemm_matrix_fp16`
2. Conduct the matrix multiplication with the quantized weights using `cblas_gemm_compute` in `fbgemm_linear_fp16_weight`
3. Add the bias to the result from step 2 and return the final Y
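The three steps can be sketched in plain Python, using the stdlib `struct` half-precision format (`'e'`) as the fp16 round-trip — a hedged illustration of the dataflow, not the FBGEMM code path; `to_fp16` and `linear_fp16_weight` are hypothetical names:
```python
import struct

def to_fp16(x):
    # Round-trip a Python float through IEEE-754 half precision,
    # mimicking step 1 (weight quantization to fp16).
    return struct.unpack('e', struct.pack('e', x))[0]

def linear_fp16_weight(X, W, B):
    # Y = X * W^T + B with fp16 weights and fp32 activations/bias.
    Wq = [[to_fp16(w) for w in row] for row in W]            # step 1
    out = []
    for x in X:
        row = []
        for j in range(len(Wq)):
            acc = sum(xi * wi for xi, wi in zip(x, Wq[j]))   # step 2
            row.append(acc + B[j])                           # step 3
        out.append(row)
    return out
```
The fp16 round-trip is lossy (e.g. 0.1 is not exactly representable in half precision), while the accumulation and bias add stay in full precision, matching the dtype layout above.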

Reviewed By: jianyuh

Differential Revision: D15921768

fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d
2019-07-12 15:59:13 -07:00
35ee4bf4e5 Revert D16204820: [pytorch][PR] optimize topk on cpu using parallel and partial sort
Differential Revision:
D16204820

Original commit changeset: ea70562c9149

fbshipit-source-id: c8f8e262c7c681593d243f035bf1f0d84675c9dc
2019-07-12 15:14:06 -07:00
cf2889ad8f add support for breaks and continues (#21692)
Summary:
Add support for breaks and continues in the JIT. We do this with a graph transform pre-SSA.

A graph of the form
```
def test():
    i = 0
    while i < 5:
        if i == 3:
            break
        i += 1
        print(i)
```
has the body of the loop transformed to
```
if i == 3:
    did_break = True
else:
    did_break = False
if did_break:
    loop_exit = True
else:
    i += 1
    print(i)
    loop_exit = i < 5
```

I am going to add more tests but I think it is ready for review now.
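The desugaring can be checked end-to-end in plain Python — a sketch of the flag-based transform with made-up function names, not the JIT pass itself:
```python
def with_break():
    # Original form: loop with an early break.
    i = 0
    while i < 5:
        if i == 3:
            break
        i += 1
    return i

def desugared():
    # Flag-based form: the break becomes a did_break flag that guards
    # the rest of the body and feeds the loop-continuation condition.
    i = 0
    loop_continue = i < 5
    while loop_continue:
        did_break = (i == 3)
        if did_break:
            loop_continue = False
        else:
            i += 1
            loop_continue = i < 5
    return i
```
Both versions terminate with `i == 3`, which is what the transform must preserve.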
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21692

Differential Revision: D16215807

Pulled By: eellison

fbshipit-source-id: 365102f42de4861d9323caaeb39a96de7619a667
2019-07-12 15:02:44 -07:00
b3147bc674 PyTorch export to ONNX Opset 7 and 8 - Cont (#22421)
Summary:
This is an extension to the original PR https://github.com/pytorch/pytorch/pull/21765

1. Increase the coverage of different opsets support, comments, and blacklisting.
2. Adding backend tests for both caffe2 and onnxruntime on opset 7 and opset 8.
3. Reusing onnx model tests in caffe2 for onnxruntime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22421

Reviewed By: zrphercule

Differential Revision: D16225518

Pulled By: houseroad

fbshipit-source-id: 01ae3eed85111a83a0124e9e95512b80109d6aee
2019-07-12 14:52:48 -07:00
9f8e2c067f MIOpen integration into pytorch RNN operators (#22774)
Summary:
This PR enables pytorch RNN operators to use MIOpen engine

ezyang bddppq

cc: lcskrishna iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22774

Differential Revision: D16224780

Pulled By: bddppq

fbshipit-source-id: 331dafbb76892d7390b620a95d8f384d38ee5533
2019-07-12 14:47:48 -07:00
30e03df638 Speeds up fast-path for 1D tensors (#22756)
Summary:
Using PMCTest (https://www.agner.org/optimize/) to measure
TensorIterator construction, this results in ~600 fewer instructions
retired (~300 fewer cycles) for constructing TensorIterator on a 1D
tensor. (Should be roughly ~100 ns, but it's hard to measure that
precisely end-to-end).

```
Before:
     Clock   Core cyc   Instruct       Uops   L1D Miss
      5082       2768       5690       7644          3

After:
     Clock   Core cyc   Instruct       Uops   L1D Miss
      4518       2437       5109       6992          0
```

Note that Instruct is reliable, Core cyc is a little noisy, and Clock
is a little more noisy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22756

Differential Revision: D16207777

Pulled By: VitalyFedyunin

fbshipit-source-id: bcc453a90472d9951a1c123bcb1b7a243fde70ac
2019-07-12 12:33:38 -07:00
02bc06a683 avoid kernel launches for zero-sized tensor inputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22790

Test Plan: Imported from OSS

Differential Revision: D16226168

Pulled By: wanchaol

fbshipit-source-id: 081607c9acc1540c753b080c5f727dc4e8c22acc
2019-07-12 12:24:52 -07:00
b1b65f34a9 Make PythonArgs::tensor and PythonArgs::scalar faster (#22782)
Summary:
Speeds up the common case where Tensor is a torch.Tensor (not a
subclass). This reduces the number of executed instructions for a
torch.add(tensor1, tensor2) by ~328 (should be ~65 ns faster).

Note that most of the PythonArgs accessors are too large to be inlined.
We should move most of them to the cpp file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782

Differential Revision: D16223592

Pulled By: colesbury

fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1
2019-07-12 11:57:29 -07:00
10c14ad17c optimize topk on cpu using parallel and partial sort (#19736)
Summary:
This PR aims at improving `topk()` performance on CPU. This is useful when computing **beam search** in `Transformer` and `BERT`.

Given a tensor x of size `[N, C]` to which we want to apply `x.topk(K)`, the current logic **sequentially** loops over the dimension of `N` and does a **quick select** along the dimension of `C` to find the top K elements.

Performance can be further improved from:

- On the dimension of `N`, the work can be parallelized
- Using a faster sorting algorithm for `topk` (after a bunch of experimenting, `std::partial_sort` seems to be the most promising)

So i compared 3 versions:

1. vanilla: sequential + quick select
2. reference PR https://github.com/pytorch/pytorch/issues/19737: parallel + quick select
3. this PR: parallel + partial sort

with the following benchmark, on `Xeon 8180, 2*28 cores@2.5 GHz`:
```python
import torch
from time import time

num_iters = 1000

def bench_topk(N=8, C=168560, k=10):
    a = torch.randn(N, C)
    # warm up
    for i in range(100):
        torch.topk(a, k)

    t = 0
    for i in range(num_iters):
        a = torch.randn(N, C)
        start = time()
        value, indice = torch.topk(a, k)
        t += time() - start
    print("#[%d, %d] times: %f ms" % (N, C, t / num_iters * 1000))

Ns = [10, 20, 30]
Cs = [10000, 20000, 40000, 80000, 160000, 320000]

for n in Ns:
    for c in Cs:
        bench_topk(N=n, C=c)

```
### vanilla: sequential + quick select
```
#[10, 10000] times: 0.746740 ms
#[10, 20000] times: 1.437399 ms
#[10, 40000] times: 2.832455 ms
#[10, 80000] times: 5.649426 ms
#[10, 160000] times: 11.309466 ms
#[10, 320000] times: 22.798765 ms
#[20, 10000] times: 1.511303 ms
#[20, 20000] times: 2.822024 ms
#[20, 40000] times: 5.564770 ms
#[20, 80000] times: 11.443044 ms
#[20, 160000] times: 22.747731 ms
#[20, 320000] times: 46.234449 ms
#[30, 10000] times: 2.214045 ms
#[30, 20000] times: 4.236179 ms
#[30, 40000] times: 8.418577 ms
#[30, 80000] times: 17.067578 ms
#[30, 160000] times: 33.826214 ms
#[30, 320000] times: 68.109420 ms
```
### reference PR: parallel + quick select
```
#[10, 10000] times: 0.271649 ms
#[10, 20000] times: 0.593016 ms
#[10, 40000] times: 1.133518 ms
#[10, 80000] times: 2.082355 ms
#[10, 160000] times: 4.049928 ms
#[10, 320000] times: 7.321285 ms
#[20, 10000] times: 0.315255 ms
#[20, 20000] times: 0.539054 ms
#[20, 40000] times: 1.000675 ms
#[20, 80000] times: 1.914586 ms
#[20, 160000] times: 4.437122 ms
#[20, 320000] times: 8.822445 ms
#[30, 10000] times: 0.347209 ms
#[30, 20000] times: 0.589947 ms
#[30, 40000] times: 1.102814 ms
#[30, 80000] times: 2.112201 ms
#[30, 160000] times: 5.186837 ms
#[30, 320000] times: 10.523023 ms
```
### this PR: parallel + partial sort
```
#[10, 10000] times: 0.150284 ms
#[10, 20000] times: 0.220089 ms
#[10, 40000] times: 0.521875 ms
#[10, 80000] times: 0.965593 ms
#[10, 160000] times: 2.312356 ms
#[10, 320000] times: 4.759422 ms
#[20, 10000] times: 0.167630 ms
#[20, 20000] times: 0.265607 ms
#[20, 40000] times: 0.471477 ms
#[20, 80000] times: 0.974572 ms
#[20, 160000] times: 3.269645 ms
#[20, 320000] times: 6.538608 ms
#[30, 10000] times: 0.204976 ms
#[30, 20000] times: 0.342833 ms
#[30, 40000] times: 0.589381 ms
#[30, 80000] times: 1.398579 ms
#[30, 160000] times: 3.904077 ms
#[30, 320000] times: 9.681224 ms
```
In summary, `2` is **5x** faster than `vanilla` on average and `3` is **8.6x** faster than `vanilla`.
On `Fairseq Transformer`, the default parameter on dataset `wmt14` would have a `topk` size of `[8, 168560]`, and this operator gets `3x` faster with this PR.
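The "parallel over N, partial sort over C" idea can be sketched in plain Python, with `heapq.nlargest` standing in for `std::partial_sort` and a thread pool for the parallel loop — an illustration of the strategy, not the ATen kernel:
```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def topk_row(row, k):
    # Partial sort: keep only the k largest (value, index) pairs,
    # analogous to std::partial_sort along the C dimension.
    return heapq.nlargest(k, ((v, i) for i, v in enumerate(row)))

def topk(x, k):
    # Parallelize over the N dimension, one row per task.
    with ThreadPoolExecutor() as pool:
        rows = list(pool.map(lambda row: topk_row(row, k), x))
    values = [[v for v, _ in row] for row in rows]
    indices = [[i for _, i in row] for row in rows]
    return values, indices
```
For each row this touches all C elements but fully orders only the top k, which is where the win over a full sort (and, per the benchmarks above, over quick select) comes from.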
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19736

Differential Revision: D16204820

Pulled By: VitalyFedyunin

fbshipit-source-id: ea70562c9149a0d832cf5872a891042ebd74fc63
2019-07-12 11:10:20 -07:00
fc23d7f3bd Speed up TensorIterator::compute_strides a little (#22779)
Summary:
For three 1-D operands, compute_strides now takes 298 instructions instead
of 480. (Saves ~36 ns).  We'll want to make Tensor::sizes(), strides(), and
element_size() trivially inlinable to speed this up more.

(Using PMCTest from https://www.agner.org/optimize/ to measure instructions retired)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22779

Differential Revision: D16223595

Pulled By: colesbury

fbshipit-source-id: e4730755f29a0aea9cbc82c2d376a8e6a0c7bce8
2019-07-12 10:57:32 -07:00
f266a63eeb Initiate checkCuda90Bug warning (#22757)
Summary:
Add the checkCuda90Bug warning to THCudaBlas_Sgemm and THCudaBlas_Hgemm.
https://github.com/pytorch/pytorch/pull/22034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22757

Differential Revision: D16223085

Pulled By: zhangguanheng66

fbshipit-source-id: 470c6cbaba16a3cec295993c2673f02008a602a6
2019-07-12 09:55:09 -07:00
ccb28939bf Revert D16222539: [pytorch][PR] Let users pass CMake-specific options starting with CMAKE_ to CMake.
Differential Revision:
D16222539

Original commit changeset: 1cc6e69c85cd

fbshipit-source-id: c79d68976ac1047c54b32c093429b23e9482cd8f
2019-07-12 07:57:57 -07:00
612eed31a9 Let users pass CMake-specific options starting with CMAKE_ to CMake. (#22776)
Summary:
This should make it more convenient to follow https://github.com/pytorch/pytorch/issues/8433's suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22776

Differential Revision: D16222539

Pulled By: ezyang

fbshipit-source-id: 1cc6e69c85cdf0d7f8074653445410d85746847c
2019-07-12 07:28:32 -07:00
7eb0319339 add new tests to benchmark_all_test (#22787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22787

as title

Reviewed By: hl475

Differential Revision: D16219329

fbshipit-source-id: 097ee73e7644d5ca482ad044d0fd2c3e7dc2c10b
2019-07-11 22:50:55 -07:00
1878800f47 make custom op work in OSS environment (#22781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22781

The custom op is required to make the op benchmark work with JIT. Run `python setup.py install` in the pt_extension directory to install it.

Reviewed By: hl475

Differential Revision: D16214430

fbshipit-source-id: c9221c532011f9cf0d5453ac8535a6cde65e8376
2019-07-11 21:17:17 -07:00
8ec712da30 Add the support of handle Bias being nullptr for torch.ops.quantized.fbgemm_linear (#22403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22403

- C10 Operator Registration (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/op_registration/op_registration.cpp) supports None type.

- ATen has None Tensor support, e.g., https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml#L1078

Reviewed By: zafartahirov

Differential Revision: D16069522

fbshipit-source-id: 3acaec783fc138ff36b14ffc0582d0764be4ad34
2019-07-11 17:33:08 -07:00
9d11004ee4 Update ONNX constant folding to support opset 10. (#22515)
Summary:
Currently ONNX constant folding (`do_constant_folding=True` arg in `torch.onnx.export` API) supports only opset 9 of ONNX. For opset 10, it is a no-op. This change enables ONNX constant folding for opset 10. Specifically there are three main changes:
1) Turn on constant folding ONNX pass for opset 10.
2) Update support for opset 10 version of `onnx::Slice` op for backend computation during constant folding.
3) Enable constant folding tests in `test/onnx/test_utility_funs.py` for multiple opsets (9 and 10).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22515

Reviewed By: zrphercule

Differential Revision: D16189336

Pulled By: houseroad

fbshipit-source-id: 3e2e748a06e4228b69a18c5458ca71491bd13875
2019-07-11 16:29:03 -07:00
291570e085 make CompilationUnit::define return defined functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22723

Test Plan: Imported from OSS

Differential Revision: D16197604

Pulled By: suo

fbshipit-source-id: b22491a58aa9ea476acab06614093ff004291407
2019-07-11 14:55:43 -07:00
de819be93e refactor self to be a class again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22722

Test Plan: Imported from OSS

Differential Revision: D16197607

Pulled By: suo

fbshipit-source-id: b4dd96b3f9cc46b48678aab0ff89afc3666e2185
2019-07-11 14:55:39 -07:00
22d70e0d4b Give functions qualified names
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22721

Test Plan: Imported from OSS

Differential Revision: D16197606

Pulled By: suo

fbshipit-source-id: 94718fcdb0d3b651f16674af3cfd6249ed4533ae
2019-07-11 14:55:34 -07:00
4b48ae4aec Suppress progress bar only for pip install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22708

Test Plan: Imported from OSS

Differential Revision: D16206329

Pulled By: ezyang

fbshipit-source-id: 4ec29e0e9e48a168e88ec716ee8e270c56a38cdb
2019-07-11 13:50:29 -07:00
05d56bd1b6 Remove hard-coded NVRTC specific constant from fuser header
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22699

Test Plan: Imported from OSS

Differential Revision: D16192290

Pulled By: bwasti

fbshipit-source-id: 4dccaf3e6e0151e86d35474c36e1ddb7f2afb5cf
2019-07-11 13:44:25 -07:00
513b7a7a06 assert_no_internal_overlap pass op name by const ref
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22729

Test Plan: Imported from OSS

Differential Revision: D16205448

Pulled By: jamesr66a

fbshipit-source-id: b383c461dd58e8a3d0bfeae43ebfd1e021668f80
2019-07-11 13:38:10 -07:00
9690f8629d Move the storage in empty_cpu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22728

Test Plan: Imported from OSS

Differential Revision: D16205449

Pulled By: jamesr66a

fbshipit-source-id: 6fd198d0d526b5de393e2988906dac2a63064f24
2019-07-11 13:38:07 -07:00
a797815198 bucketize op shape inference (#22716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22716

add shape inference func to bucketize op

Reviewed By: ipiszy

Differential Revision: D16193718

fbshipit-source-id: 6e893356b6408255538545673047dd5124837e70
2019-07-11 12:44:29 -07:00
ac78a86e1d Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen" (#22749)
Summary:
Original commit changeset: add2ee8a8865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22749
ghstack-source-id: 86323899

Differential Revision: D16203552

fbshipit-source-id: 227df3b85316315c15d2cb7b6a5c884096a82e9e
2019-07-11 12:21:21 -07:00
8bdda03ae1 optimize RNN on CPU (#22512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22512

optimize RNN on CPU

Reviewed By: llyfacebook

Differential Revision: D16113360

fbshipit-source-id: 9ee53b3b4bb9b636e7be1ccdf25420e2caa60762
2019-07-11 12:16:27 -07:00
Jie
3135298dde (#22602)
Summary:
1. Update to restrict block.z <= 64, compliant with the CUDA maximum z-dimension of a block;
2. clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22602

Differential Revision: D16203857

Pulled By: ezyang

fbshipit-source-id: 567719ae175681a48eb0f818ca0aba409dca2550
2019-07-11 12:02:58 -07:00
1682d38a25 Improve hypothesis_utils.py for qtensor (#22693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22693

change np.finfo to torch.finfo

Differential Revision: D16185556

fbshipit-source-id: 594f8ba1d6317ac2de47af754a8bd6015d40ea15
2019-07-11 11:56:01 -07:00
3fabb9f105 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22737

Differential Revision: D16200090

Pulled By: bddppq

fbshipit-source-id: 3819716a9b01f073966fc8b420c6a0b8d13232ac
2019-07-11 11:09:24 -07:00
45cf33a731 add fill_diagonal function (#21892)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21892

Differential Revision: D16164678

Pulled By: colesbury

fbshipit-source-id: 85df8ae9b7a6a91b6023fe7295b3a8124e4526ea
2019-07-11 09:20:44 -07:00
89d6e88042 Add environment variables used in CONTRIBUTING example (#22736)
Summary:
Some other environment variables can be added to speed things up for development.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22736

Differential Revision: D16200904

Pulled By: soumith

fbshipit-source-id: 797ef91a863a244a6c96e0adf64d9f9b4c9a9582
2019-07-11 04:15:51 -07:00
5147819f9d enabled MIOpen depthwise convolutions (#22696)
Summary:
They mistakenly got removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22696

Differential Revision: D16191442

Pulled By: bddppq

fbshipit-source-id: 7ceda274c557879e11f84596040efe9e0c9b861f
2019-07-11 00:14:58 -07:00
d21e476dcd Quantized Conv2d Module (#21323)
Summary:
Stack:
- https://github.com/pytorch/pytorch/issues/21808 Quantized conv avoid functional usage [💛](https://our.intern.facebook.com/intern/diff/D15835572/)
- **https://github.com/pytorch/pytorch/issues/21323 Quantized Conv2d Module** [💛](https://our.intern.facebook.com/intern/diff/D15551835/)

Quantized Conv2d Module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21323

Test Plan:
Tests are split into two parts: functional and API.

`buck test mode/dev caffe2/test:quantized -- test_conv_api` : https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491

```
Parsing buck files: finished in 1.4 sec
Building: finished in 4.6 sec (100%) 7136/7136 jobs, 2 updated
  Total time: 6.1 sec
Trace available for this run at /tmp/testpilot.20190703-153023.392592.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 7149de230b9e1cdc7a872bb31fe099f0616dee09 fbpkg e59e6ab0fe8e47a496f915d34555c3ad at Fri Jun 28 12:20:54 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/647/t.par
Discovering tests
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 0.044 1/2 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 5.109 2/2 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491
Summary (total time 9.08s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D15551835

Pulled By: zafartahirov

fbshipit-source-id: 481a7df4b8a88e485437e1596eefb08d5e6766fa
2019-07-10 21:31:24 -07:00
ad634875d0 Mark Unpickler data ptr arg as const
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22690

Differential Revision: D16184299

Pulled By: mrshenli

fbshipit-source-id: 332954028533952dad01df03eca8e95bf6fe67a9
2019-07-10 20:07:13 -07:00
4240220926 Revert D16183577: Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Differential Revision:
D16183577

Original commit changeset: f86838c407db

fbshipit-source-id: bbf53ce52a20b1e90b1fe522d73e558d8044c4ba
2019-07-10 18:29:22 -07:00
1ecc945ab2 Revert D15998762: [jit] Give functions qualified names
Differential Revision:
D15998762

Original commit changeset: bc2b734f626a

fbshipit-source-id: a118cc4e9a34233279e8380529a8d8120a25839d
2019-07-10 16:10:28 -07:00
a1ca32409f Revert D15998758: [jit] refactor self to be a class again
Differential Revision:
D15998758

Original commit changeset: 14bad87bb6e4

fbshipit-source-id: f2c29974d4afc4d8f88a36e9c266e6d5a22a6191
2019-07-10 16:10:24 -07:00
e6eb17303f Revert D16184799: [jit] make CompilationUnit::define return defined functions
Differential Revision:
D16184799

Original commit changeset: 9f77a7ca2223

fbshipit-source-id: a0e08220d924a6ca55bf2f1f77754553d0133595
2019-07-10 16:10:20 -07:00
fffa7200c1 fixing lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22703

Differential Revision: D16188326

Pulled By: izdeby

fbshipit-source-id: 72e6b6f957068c3995010a1b811f24cd2304ff6f
2019-07-10 16:02:21 -07:00
67c634d58e add a comment to native_functions explaining softmax interfaces (#22651)
Summary:
Address the review comment made by gchanan here:

https://github.com/pytorch/pytorch/pull/22456#discussion_r300715866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22651

Differential Revision: D16181828

Pulled By: nairbv

fbshipit-source-id: 0d41a9024c2664298c281e198a997be73e7f8499
2019-07-10 15:34:29 -07:00
0196e0bafb add line numbers to jit_log.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22630

Differential Revision: D16172090

Pulled By: Krovatkin

fbshipit-source-id: 26cdb0077a0bfbf9981e39359472f3251546db53
2019-07-10 15:28:29 -07:00
c49a71f91f make CompilationUnit::define return defined functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22667

Test Plan: Imported from OSS

Differential Revision: D16184799

Pulled By: suo

fbshipit-source-id: 9f77a7ca2223237fbcb4b12a4734b7d334f7be13
2019-07-10 15:19:11 -07:00
ee9c8a75f4 refactor self to be a class again (#22207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22207
ghimport-source-id: 36ee8bd17411a2e220665ad2a27364653061070e

Test Plan: Imported from OSS

Differential Revision: D15998758

Pulled By: suo

fbshipit-source-id: 14bad87bb6e44bf1a43ae86339d8cc7b311c76dd
2019-07-10 15:19:07 -07:00
c0674cebf1 Give functions qualified names (#22206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22206
ghimport-source-id: d453219d907e048f24eb7f63c096b2c300307c83

Test Plan: Imported from OSS

Differential Revision: D15998762

Pulled By: suo

fbshipit-source-id: bc2b734f626ab07f97dc50ddf1b021e8b46de312
2019-07-10 15:19:03 -07:00
86fc417147 Move Quantization Models to common_quantization (#22706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22706

Moved the models used for quantization test from the test_quantization.py file to common_quantization.py

Reviewed By: jerryzh168

Differential Revision: D16189865

fbshipit-source-id: 409b43454b6b3fe278ac16b1affb9085d6ed6835
2019-07-10 15:05:49 -07:00
ebafa2e15f Turn on USE_DIRECT_NVRTC in fbcode again. (#22685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22685
ghstack-source-id: 86247780

Reviewed By: bddppq

Differential Revision: D16182352

fbshipit-source-id: fc51aa7c1112904b8cccd055dc87e10c836cf2fb
2019-07-10 15:05:45 -07:00
edeb4dbdcb register __getitem__ builtin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22276

Test Plan: Imported from OSS

Differential Revision: D16060595

Pulled By: wanchaol

fbshipit-source-id: e1e27d6be8d62fc1a841860a783aff108980d9d3
2019-07-10 14:53:35 -07:00
368dbb9ab3 Fix a FIXME in test_nn (#22675)
Summary:
https://github.com/pytorch/pytorch/issues/17262 is already resolved, so this should pass now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22675

Differential Revision: D16188003

Pulled By: zou3519

fbshipit-source-id: 32693229a0590b274ed1bf76b815f17e77c2d3ea
2019-07-10 13:12:50 -07:00
00df49c984 Fix Trace inlining of graphs with optional inputs (#22686)
Summary:
Previously in tracing, when we called a script function, we would inline the graph and set the graph inputs equal to the types the graph was invoked with.

This breaks for optional arguments invoked with None, since we rely on None being set to Optional[T] in schema matching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22686

Differential Revision: D16186372

Pulled By: eellison

fbshipit-source-id: e25c807c63527bf442eb8b31122d50689c7822f5
2019-07-10 12:57:06 -07:00
3e3e6ee335 Add common_quantized test case utilities (#22694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22694

Move quantization and quantized utility functions for testing to common_quantized.py and common_quantization.py. Additionally, add a quantized test case base class which contains common methods for checking the results of quantization on modules. As a consequence of the move, fixed the imports at the top of test_quantized.py and test_quantization.py to use the new utilities.

Reviewed By: jerryzh168

Differential Revision: D16172012

fbshipit-source-id: 329166af5555fc829f26bf1383d682c25c01a7d9
2019-07-10 12:23:36 -07:00
7750cae722 Refactor and improve randperm tests.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22121

Test Plan: Imported from OSS

Differential Revision: D16153794

Pulled By: li-roy

fbshipit-source-id: 4dbfa6cfcc79f6d431918a6646664215fa9ea0b9
2019-07-10 12:23:33 -07:00
32709af8f4 Swap detection order in randperm_out_cuda to avoid unnecessary conversion from float when the input is small.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22103

Test Plan: Imported from OSS

Differential Revision: D16153585

Pulled By: li-roy

fbshipit-source-id: 0801b91e7b352c8de8fdfbe929be85d69182b8da
2019-07-10 12:23:29 -07:00
0f7c3710dd Support Half type in randperm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22102

Test Plan: Imported from OSS

Differential Revision: D16153586

Pulled By: li-roy

fbshipit-source-id: d58e3dbc5da893005f4eaf521a28b0d752274eff
2019-07-10 12:23:25 -07:00
9c4c9c3af0 Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22326

Test Plan: Imported from OSS

Differential Revision: D16183577

Pulled By: colesbury

fbshipit-source-id: f86838c407db4ded9ce70998bf1ab1ffd75b3b58
2019-07-10 12:17:52 -07:00
574e808680 Add a bitwise NOT operator for integer and Boolean types (CUDA).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22320

Test Plan: Imported from OSS

Differential Revision: D16183578

Pulled By: colesbury

fbshipit-source-id: 2f72cce5e10fd637be1ac87e1bbfe0937a661034
2019-07-10 12:17:48 -07:00
e2dc1fc715 Add a bitwise NOT operator for integer and Boolean types (CPU).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22283

Test Plan: Imported from OSS

Differential Revision: D16183576

Pulled By: colesbury

fbshipit-source-id: 2e539fab8ff885dddb9bff334d1d784b28d65b8f
2019-07-10 12:17:44 -07:00
mal
58e20638f7 Refactoring _wrap_outputs to remove python dependence.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631

Test Plan:
test suite

Imported from OSS

Differential Revision: D16185040

fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b
2019-07-10 12:12:16 -07:00
ec1b669d23 fix dce over loops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22632

Test Plan: Imported from OSS

Differential Revision: D16184469

Pulled By: suo

fbshipit-source-id: b7cc2d20a7dd8b287e1b6128ddb70d3936032a7e
2019-07-10 12:03:19 -07:00
9b8d771733 skip import nccl and gloo_gpu in cpu machine (#22522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22522

Skip importing the nccl and gloo_gpu modules on CPU-only machines

Reviewed By: bddppq

Differential Revision: D16115827

fbshipit-source-id: 329b7a0bb5eccb78c9e772bdab5db7c79b546d55
2019-07-10 11:56:56 -07:00
b984b0ab4b fix print (#22689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22689

att

Reviewed By: Lucaskabela

Differential Revision: D16184260

fbshipit-source-id: 1a6ad51a37918d0c81d6e3baa0ca0baa32cb9673
2019-07-10 11:26:34 -07:00
f81395b3e3 Enable more passes in ProfilingGraphExecutor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22079

Differential Revision: D16119322

Pulled By: Krovatkin

fbshipit-source-id: 301fcc42d0e1f031d9de5bcd9679fb8c2d742fef
2019-07-10 10:44:18 -07:00
10c60b601a Added Bfloat16 tensor for cpu with very limited support (#21860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21860
ghimport-source-id: 5290755b63033cdfdeb911a4ecf4aa282b3db02d

Test Plan: Imported from OSS

Differential Revision: D15856091

Pulled By: izdeby

fbshipit-source-id: 54e7e17be1b5c5a2e80a41feaeaeba75dbb8108f
2019-07-10 09:08:52 -07:00
6eb3969ac7 keep requires_grad unchanged after converting bn to syncbn (#22569)
Summary:
After converting BN layers to SyncBN layers, the function will set all `requires_grad = True` regardless of the original requires_grad states. I think it is a bug and have fixed it in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22569

Differential Revision: D16151647

Pulled By: zou3519

fbshipit-source-id: e2ad1886c94d8882485e7fb8be51ad76469ecc67
2019-07-10 08:38:04 -07:00
cbb0b8166d Revert D16161144: [pytorch][PR] Add traces to LowerGradOf and SpecializeAutoGrad
Differential Revision:
D16161144

Original commit changeset: 9e206fcfb179

fbshipit-source-id: 8f9eecb5cd6ca715bd0c647c32cf77cd9d88e6ac
2019-07-10 06:55:01 -07:00
3a8d7463bd Enabled BFloat16 storage (#21523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21523
ghimport-source-id: 698b3cbd6b21c09b9ff8bf8011980df8e35c33b0

Test Plan: Imported from OSS

Differential Revision: D15819368

Pulled By: izdeby

fbshipit-source-id: f6b3bba7b3ca8ee677bd80a231dbb3920c07d61c
2019-07-09 21:51:06 -07:00
932ec8aa9f Updating submodules
Reviewed By: zpao

fbshipit-source-id: f5636ab0457c1b2e15df95a5677a7194978d9cd0
2019-07-09 21:39:57 -07:00
e72b617eb5 Introducing bfloat16 type (#21522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21522
ghimport-source-id: 4803f197ec04938501fdb10c1741280331c349d2

Test Plan: Imported from OSS

Differential Revision: D15819369

Pulled By: izdeby

fbshipit-source-id: 46408dc316a5c4dc644a736dc42da2422b34bcb9
2019-07-09 21:14:10 -07:00
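A small illustrative snippet (assuming a PyTorch build with the bfloat16 dtype enabled): bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, so the values below are chosen to be exactly representable:

```python
import torch

# Create a bfloat16 tensor and round-trip it through float32.
x = torch.tensor([1.0, 2.5, 1024.0], dtype=torch.bfloat16)
dtype_ok = x.dtype is torch.bfloat16
roundtrip = x.float().tolist()       # exact for these sample values
```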
de5a481c6e add forward declaration in stateful dataset (#22562)
Summary:
Addressing a potential dependency issue by adding forward declarations for OutputArchive/InputArchive.

This change follows the same pattern as base.h in 'torch/csrc/api/include/torch/data/samplers/base.h'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22562

Differential Revision: D16161524

Pulled By: soumith

fbshipit-source-id: d03f8a2ece5629762f9fa8a27b15b0d037e8f07b
2019-07-09 16:41:56 -07:00
3cf5f22f02 Enable C2 operators running with {cpu, gpu} * {forward, backward} (#22664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22664

This diff enables c2 operators to run the combination of {cpu, gpu} * {forward, backward}.

Reviewed By: hl475

Differential Revision: D15781789

fbshipit-source-id: e9843e3c46ea144042829860638d406f6a33792b
2019-07-09 16:41:53 -07:00
95a5da175d change c2 bench to use new tensor creation interface (#22663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22663

as title

Reviewed By: hl475

Differential Revision: D15744502

fbshipit-source-id: 441ab9fb7580ca87c3f2027d0a63ba18b8d35016
2019-07-09 16:41:49 -07:00
e1fdf8a46f Add comments about adding new build options. (#22641)
Summary:
Also revert the change to cmake.py in
c97829d7011bd59d662f6af9c3a0ec302e7e75fc. The comments are added to
prevent similar incidents in the future (which have occurred a couple of times in the past).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22641

Differential Revision: D16171763

Pulled By: ezyang

fbshipit-source-id: 5a65f9fbb3c1c798ebd25521932bfde0ad3d16fc
2019-07-09 16:41:46 -07:00
e2216ada65 Properly formats errors rising up from C++ extension compilation (#22445)
Summary:
Here's a C++ extension with a missing semicolon:
```python
torch.utils.cpp_extension.load_inline('test', 'int main() { return 0 }')
```
which currently generates this error
```
RuntimeError: Error building extension 'test_v6': b'[1/2] c++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test_v6 -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
-isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o\nFAILED: main.o \nc++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test_v6 -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o\n/tmp/torch_extensions/test/main.cpp: In
function \xe2\x80\x98int main()\xe2\x80\x99:\n/tmp/torch_extensions/test/main.cpp:2:23:
error: expected \xe2\x80\x98;\xe2\x80\x99 before \xe2\x80\x98}\xe2\x80\x99 token\n int
main() { return 0 }\n                       ^\nninja: build stopped: subcommand failed.\n'
```

After this PR, the error is
```
RuntimeError: Error building extension 'test': [1/2] c++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o
FAILED: main.o
c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=test -
DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-
packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o
/tmp/torch_extensions/test/main.cpp: In function ‘int main()’:
/tmp/torch_extensions/test/main.cpp:2:23: error: expected ‘;’ before ‘}’ token
 int main() { return 0 }
                       ^
ninja: build stopped: subcommand failed.
```
which is a lot easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22445

Differential Revision: D16094205

Pulled By: ezyang

fbshipit-source-id: 21043344aac260dc3e4e04d6a42898507bb840e4
2019-07-09 16:41:42 -07:00
50901be9fb Add traces to LowerGradOf and SpecializeAutoGrad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22599

Differential Revision: D16161144

Pulled By: Krovatkin

fbshipit-source-id: 9e206fcfb1796e9448e80f178b75d0c277bd348f
2019-07-09 16:41:39 -07:00
0c2cd93e43 Avoid potential extra copy in _lu_with_info_cuda (#22634)
Summary:
No need to `clone` if the expanded size matches original size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22634

Differential Revision: D16171091

Pulled By: ezyang

fbshipit-source-id: 3d8f116398f02952488e321c0ee0ff2868768a0c
2019-07-09 16:41:36 -07:00
45aad2e680 change unary, pool, max ops to use new interface (#22661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22661

as title

Reviewed By: hl475

Differential Revision: D16170825

fbshipit-source-id: d80944224b8717e7aa35980907ff48e587b85217
2019-07-09 16:41:32 -07:00
2b2fe525b9 introduce a new interface to add a list of operators (#21209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209

This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface:

- create an op_list:
```
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_function"],
    attrs=[
        ["abs", torch.abs],
        ["abs_", torch.abs_],
    ],
)
```
- create a bench class:
```
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, op_function):
        self.input_one = torch.rand(M, N)
        self.op_func = op_function

    def forward(self):
        return self.op_func(self.input_one)
```
- register those ops:
```
op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```

Reviewed By: zheng-xq

Differential Revision: D15514188

fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f
2019-07-09 16:41:29 -07:00
164388150a fix lint (#22654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22654

att

Reviewed By: bddppq

Differential Revision: D16168219

fbshipit-source-id: db1a5e2161e7be70b2f6e6b4beaa27ea91f853f2
2019-07-09 16:41:26 -07:00
8a233b99cb Report errors through call stack (#22280)
Summary:
The error for `test_error_stack_module`:

```
Traceback (most recent call last):
  File "../test.py", line 35, in <module>
    scripted = torch.jit.script(M())
  File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1119, in script
    return _convert_to_script_module(obj)
  File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1825, in _convert_to_script_module
    raise e
RuntimeError:

d(int x) -> int:
Expected a value of type 'int' for argument 'x' but instead found type 'str'.
:
at ../test.py:11:12
def c(x):
    return d("hello") + d(x)
           ~ <--- HERE

'c' is being compiled since it was called from 'b'
at ../test.py:14:12
def b(x):
    return c(x)
           ~~~ <--- HERE

'b' is being compiled since it was called from 'forward'
at ../test.py:22:16
    def forward(self, x):
        return b(x)
               ~~~ <--- HERE

'forward' is being compiled since it was called from 'forward'
at ../test.py:31:20
    def forward(self, x):
        return x + self.submodule(x)
                   ~~~~~~~~~~~~~~~~ <--- HERE
```

This also unifies our error reporting in the front end with `ErrorReport`

TODO
* Include module names in message, #22207 should make this easy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22280

Pulled By: driazati

Differential Revision: D16060781

fbshipit-source-id: c42968b53aaddb774ac69d5abbf7e60c23df8eed
2019-07-09 16:41:22 -07:00
13d58fd9f5 README for the quantized op creation (#22165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22165

Workflow description for the quantized ops design.

Reviewed By: jerryzh168

Differential Revision: D15975977

fbshipit-source-id: ef73b172f609adef149c157c404bb452b5457a9f
2019-07-09 16:41:19 -07:00
dd4982e287 Cleanup integer sign warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22560

Test Plan: Imported from OSS

Differential Revision: D16151479

Pulled By: bwasti

fbshipit-source-id: a54139d8f95ed964530862f96723e4365724b2da
2019-07-09 16:41:16 -07:00
046c4589df lu: When not using pivoting, return the identity permutation instead of zeros (#22242)
Summary:
Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :)

These updates have broken unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used but now returns all zeros as the pivots. This PR reverts it back to return the identity permutation. I've not yet tested this code as I'm having some trouble compiling PyTorch with this and am hitting https://github.com/pytorch/pytorch/issues/21700 and am not sure how to disable that option.

Here's a MWE to reproduce the broken behavior, and my fix.

```python
torch.manual_seed(0)

n = 4
L = torch.randn(n,n)
A = L.mm(L.t()).unsqueeze(0)
b = torch.randn(1, n)

A_lu_cpu = torch.lu(A)
A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False)
A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True)
print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot)
print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_nopivot)

x_cpu = b.lu_solve(*A_lu_cpu)
x_cuda_nopivot = b.cuda().lu_solve(*A_lu_cuda_nopivot)
x_cuda_nopivot_fixed = b.cuda().lu_solve(
    A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int())
x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot)

print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot)
```

Output:

```
A_lu_cuda_nopivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

-----

A_lu_cuda_pivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

(tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]),
 tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242

Differential Revision: D16049334

Pulled By: ezyang

fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d
2019-07-09 11:16:50 -07:00
7fcfed19e7 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:43 -07:00
5040d52a5a torch.quantization conversion utilities, observers for eager mode quantization (#22010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22010

torch.quantization module with observers and conversion routines

Reviewed By: zafartahirov

Differential Revision: D15554183

fbshipit-source-id: 05a3fabe28dd701978b8ecebf5bfc3a4c044ba5c
2019-07-09 10:51:38 -07:00
073fa6f411 add GRAPH_UPDATE logging to guard_elimination.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22497

Differential Revision: D16165106

Pulled By: Krovatkin

fbshipit-source-id: aeb48d81d92c71f7038903b1656d760b6b95c562
2019-07-09 10:09:35 -07:00
a3346e100e Performance improvements for depthwise convolutions in FP16 (#22302)
Summary:
This PR activates faster depthwise convolution kernels for Volta and Turing GPUs using cudnn >= 7600.
The script to benchmark the current PyTorch master branch and this PR branch can be found [here](https://gist.github.com/ptrblck/4590cf20721d8f43296c9903abd4a774).
(50 warmup iterations, 1000 iterations for timing)

I've used https://github.com/pytorch/pytorch/issues/3265 to create a similar benchmark and added a few additional setups.
Since the results are quite long, I've uploaded them in a spreadsheet [here](https://docs.google.com/spreadsheets/d/13ByXcqg7LQUr3DVG3XpLwnJ-CXg3GUZJ3puyTMw9n2I/edit?usp=sharing).
Times are given in ms per iteration.
We've benchmarked this PR on a DGX1 using V100 GPUs.

The current workload check in `check_cudnn_depthwise_workload` is quite long and can be moved to another file, if wanted.

CC ngimel (Thanks for the support while benchmarking it ;) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22302

Differential Revision: D16115057

Pulled By: ezyang

fbshipit-source-id: bad184658518e73b4d6b849d77e408f5a7a757de
2019-07-09 07:28:31 -07:00
31d821e267 Move thnvrtc and DynamicLibrary to ATen (#22362)
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.

`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362

Differential Revision: D16131787

Pulled By: ezyang

fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
2019-07-09 07:28:27 -07:00
74883d4865 Fix typos in gradcheck error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22357

Differential Revision: D16065935

Pulled By: ezyang

fbshipit-source-id: f131655eaca27f9df4cd6c511faabf0b8f2bf0de
2019-07-09 07:12:56 -07:00
92e8d04098 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 488
LARGE: 29
XXLARGE: 2

Updated actions:
From MEDIUM to LARGE: 227
From XLARGE to MEDIUM: 1
From XLARGE to LARGE: 1
From XLARGE to XXLARGE: 1
From LARGE to MEDIUM: 2
From LARGE to XLARGE: 2

Differential Revision: D16161669

fbshipit-source-id: 67a4e0d883ca3f1ca3185a8285903c0961537757
2019-07-09 05:24:19 -07:00
9fb4386c14 Add a higher-level log traces to DCE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22511

Differential Revision: D16153694

Pulled By: Krovatkin

fbshipit-source-id: 5edbc04bdccfa89f5ad0bf37f51e1bd2cb28962a
2019-07-08 21:55:02 -07:00
5395db22a4 Typo fixed in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22600

Differential Revision: D16156989

Pulled By: mrshenli

fbshipit-source-id: e491b083d872eaceb829028dadbab2e28ecfc785
2019-07-08 19:29:07 -07:00
b5860b5572 torchvision version changed to the latest one (#22598)
Summary:
Version changed to 487c9bf4b7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22598

Differential Revision: D16155471

Pulled By: ifedan

fbshipit-source-id: 2d54883c91add28c0f076858f292363eb95340a9
2019-07-08 17:13:59 -07:00
738aba171b use caffe2_dnnlowp_force_slow_path in FC (#22143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22143

Like the Conv DNNLOWP operator, allow FC to run the slow path to debug numerical issues caused by Intel's int8 instruction that does horizontal addition of two int8 multiplication results in 16 bits

Reviewed By: hx89

Differential Revision: D15966885

fbshipit-source-id: c6726376a3e39d341fd8aeb0e54e0450d2af8920
2019-07-08 17:01:04 -07:00
905b9a89b2 fix uninitialized variable in BailOutInserter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22596

Differential Revision: D16156883

Pulled By: Krovatkin

fbshipit-source-id: 8926262a2d3115f34400c9ebb0c98c540e1cc623
2019-07-08 16:45:51 -07:00
c97829d701 Adding FC and Relu QNNPACK ops to C10 registry (#22174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22174

This is a preliminary change outlining the approach we plan to follow to integrate QNNPACK operators into the PyTorch backend. The operators will not be made visible to the user in the Python world; ultimately we will have a function that calls the QNNPACK backend based on the environment it is run on.

The goal of the project is to integrate QNNPACK library with PyTorch to achieve good performance for quantized mobile models.

Reviewed By: ljk53

Differential Revision: D15806325

fbshipit-source-id: c14e1d864ac94570333a7b14031ea231d095c2ae
2019-07-08 14:21:42 -07:00
0e7b65e48a Convert VariableVersion counter to intrusive_ptr, saving a memory allocation on every Tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22514

Differential Revision: D16114714

fbshipit-source-id: 441043d18938710869b64cb67884f49cd6060727
2019-07-08 13:40:58 -07:00
0c1ecf19e1 Simplify the flow control in div_kernel_cuda. (#22555)
Summary:
Some duplicated code is removed. It also becomes clear that there is only one special case `div_kernel_cuda` is handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22555

Differential Revision: D16152091

Pulled By: zou3519

fbshipit-source-id: bb875370077c1f84efe4b766b3e1acc461e73e6c
2019-07-08 12:13:10 -07:00
478d480d37 Add Module.requires_grad_ (#22576)
Summary:
addresses https://github.com/pytorch/pytorch/issues/20241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22576

Differential Revision: D16149314

Pulled By: zou3519

fbshipit-source-id: 1cc4c1ec084df30e00e9ae73ce1a53494a034d5c
2019-07-08 12:13:07 -07:00
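A quick sketch of the new convenience method (assuming it mirrors `Tensor.requires_grad_`, toggling `requires_grad` on every parameter of the module in place):

```python
import torch.nn as nn

net = nn.Linear(4, 2)
net.requires_grad_(False)            # freeze all parameters
frozen = all(not p.requires_grad for p in net.parameters())

net.requires_grad_(True)             # unfreeze them again
unfrozen = all(p.requires_grad for p in net.parameters())
```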
456d27dff0 Update module.h (Fix a grammatical error of the comment in line 233) (#22548)
Summary:
Fix a grammatical error of the comment in line 233.
change from " Returns an `OrderedDict` of he submodules of this `Module`"
to " Returns an `OrderedDict` of the submodules of this `Module`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22548

Differential Revision: D16134534

Pulled By: zou3519

fbshipit-source-id: 33b1dd0fbc3a24bef99b6e0192566e2839292842
2019-07-08 11:51:49 -07:00
3a12520844 Pass Variable into Caffe2 ops, by requiring that the Variable doesn't require grad (#22473)
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing an extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.

Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.

This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473

Differential Revision: D16099042

Pulled By: yf225

fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
2019-07-08 11:31:10 -07:00
304552b61a Enabled masked fill and scatter for bool tensors (#22491)
Summary:
Enabled masked_fill, masked_fill_, masked_scatter_, masked_scatter on bool tensors.
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22491

Differential Revision: D16108817

Pulled By: izdeby

fbshipit-source-id: 8b1808f41d7a4f65fe6d3797a3c83b2dac3446c7
2019-07-08 10:49:40 -07:00
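A minimal example of the newly enabled path, masking a Boolean tensor with a Boolean mask (a sketch, not the PR's unit tests):

```python
import torch

flags = torch.tensor([True, False, True, False])
mask = torch.tensor([False, False, True, True])
flags.masked_fill_(mask, True)       # masked positions set to True, in place
result = flags.tolist()
```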
a48cf8f52d Fixed RNNImplBase reset and flat_weights methods to handle bidirectional flag correctly (#22493)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/19545:
Changed torch/csrc/api/src/nn/modules/rnn.cpp to be consistent with torch/nn/modules/rnn.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22493

Differential Revision: D16111433

Pulled By: pbelevich

fbshipit-source-id: edfa41e8a9889d64918998dc7c46b8763fdf5765
2019-07-08 10:34:04 -07:00
595e344769 Add type stubs to import 'nn' modules (#22411)
Summary:
Forgot to mirror the `nn/__init__.py` semantics in the new `nn` type stub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22411

Differential Revision: D16149798

Pulled By: ezyang

fbshipit-source-id: 0ffa256fbdc5e5383a7b9c9c3ae61acd11de1dba
2019-07-08 09:22:37 -07:00
9bafe5d4da Fix torch.normal with CUDA tensors (#22533)
Summary:
`addcmul_out` overwrote the samples, which led to constant values being output by `torch.normal`.

Changelog:
- Replace the `addcmul_out` calls with a combination of in-place `mul` and `add`, with justification for this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22533

Test Plan:
- Enable tests for test_normal on all devices

Fixes https://github.com/pytorch/pytorch/issues/22529

Differential Revision: D16141337

Pulled By: ezyang

fbshipit-source-id: 567a399042e0adcd154582f362318ce95a244c62
2019-07-08 08:27:38 -07:00
80e2fab952 Deprecate and set a date for removing NO_* and WITH_* (user) build options (#22474)
Summary:
Currently, specifying build options in the "USE_" series is in quite a
state of disarray. There are a lot of build options that accept three
variants: USE_OPTION, WITH_OPTION, and NO_OPTION. Some build options
accept only the USE_ and NO_ variants; some accept only USE_.
This inconsistency is confusing and hard to maintain.

To resolve this inconsistency, we can either let all these build options
support all three variants, or we only support the USE_ variant.

This commit takes a step toward the latter choice, i.e., it deprecates and
sets a date for removing the NO_ and WITH_ variants, keeping only the
USE_ variant. This is likely better than the former solution because:

- NO_ and WITH_ variants are not documented.
- CMakeLists.txt only has the USE_ variants of the relevant build options
  defined. It would be a surprise if users passed these variables to CMake
  during a rebuild and found them ineffective.
- Multiple variants are difficult to maintain.
- The behavior is confusing if more than one variant is passed. For
  example, what is to be expected if one sets "NO_CUDA=1 USE_CUDA=1"?

The downside is that this will break backward compatibility for existing
build scripts in the future (if they used the undocumented build
options).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22474

Differential Revision: D16149396

Pulled By: ezyang

fbshipit-source-id: 7145b88ad195db2051772b9665dd708dfcf50b7d
2019-07-08 08:22:08 -07:00
43d36415b9 torch.utils.data.Dataloader: documentation about RNG state consumption (#22540)
Summary:
The outcome of the PyTorch forum discussion: https://discuss.pytorch.org/t/dataloader-problem-problem-arises-when-shuffle-true/45631

The discussion is here: https://github.com/pytorch/pytorch/pull/20749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22540

Differential Revision: D16131777

Pulled By: ezyang

fbshipit-source-id: 566deda1b44dc7fae54250e9b508d120851a2848
2019-07-08 08:22:04 -07:00
ce8c9d9bd5 Fix cuda detection script (#22527)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22507
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22527

Differential Revision: D16126220

Pulled By: ezyang

fbshipit-source-id: eb05141282b0f058324da1b3d3cb34566f222a67
2019-07-08 07:06:59 -07:00
d4464d3418 Use system locale in collect_env.py (#22579)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22570.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22579

Differential Revision: D16147304

Pulled By: soumith

fbshipit-source-id: db73447bffbfdf54f7b830447d4b9584f363f05f
2019-07-07 20:55:31 -07:00
d48cbd62cd Fix spectral_norm load_state_dict with strict=False (#22545)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21251

also fixes some missing hook removals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22545

Differential Revision: D16139506

Pulled By: soumith

fbshipit-source-id: 552a9f9f91be328a47ee8f1e1d29c1f59b0ebca3
2019-07-07 19:08:48 -07:00
94bd5ddf7f Add some essentials for building c++ extensions on Windows (#22563)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22489.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22563

Differential Revision: D16142615

Pulled By: ezyang

fbshipit-source-id: d7c27a874f788dd27065fad6699485e4a6372ec4
2019-07-06 19:29:25 -07:00
9db7bc8bc7 fix uninitialized variable warning (#22477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477

There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.

Reviewed By: hx89

Differential Revision: D16100211

fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
2019-07-06 00:36:45 -07:00
42c6ea5faa ONNX Export Topk with Dynamic k (+ add test cases)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21104

Differential Revision: D16061592

Pulled By: houseroad

fbshipit-source-id: 855b310a138fdde9c25869ffe9f127189dc2eaf5
2019-07-05 23:46:36 -07:00
221af09ca7 Move GradMode / AutoGradMode / NoGradGuard to ATen core (#18573)
Summary:
After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence the PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.

Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, with the following reasoning:
Semantically, `at::GradMode` and `at::NonVariableTypeMode` actually mean different things: `at::GradMode` controls whether a tensor should accumulate gradients, and `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places where we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:
```python
#  torch/tensor.py
with torch.no_grad():
   ...
   new_tensor = self.new()    # `at::GradMode` is false at this point
   ...
```
```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  // if we merge `at::GradMode` and `at::NonVariableTypeMode`, since `at::GradMode` is false and `self_.type()` checks `at::GradMode` to decide whether to return non-Variable type, it will return a non-Variable type here, which is not what we want (and throws a "Tensor that was converted to Variable was not actually a Variable" error)
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```
For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they have different purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18573

Differential Revision: D16134413

Pulled By: yf225

fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
2019-07-05 23:41:37 -07:00
39a4ec4141 Make device_of take tensor by const ref
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22556

Test Plan: Imported from OSS

Differential Revision: D16137540

Pulled By: jamesr66a

fbshipit-source-id: 8a6be6edd602c53edc954508ea27d8a6071bd964
2019-07-05 23:34:43 -07:00
7730346853 Make shuffling optional in DistributedSampler (#22479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22479

In some cases, for example when we train on CTR data, we would like to start training from old samples and finish with recent samples.

This diff adds an option to disable shuffling in DistributedSampler to accommodate this use case.
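For illustration, a minimal pure-Python sketch of the sampler's partitioning logic with an optional shuffle (a hypothetical simplification; the real DistributedSampler works on torch tensors and epoch-seeded generators):

```python
import random

def partition_indices(dataset_len, num_replicas, rank, shuffle=True, seed=0):
    # Optionally shuffle deterministically, pad the index list to a multiple
    # of num_replicas, then give each rank every num_replicas-th index.
    indices = list(range(dataset_len))
    if shuffle:
        random.Random(seed).shuffle(indices)
    total = ((dataset_len + num_replicas - 1) // num_replicas) * num_replicas
    indices += indices[: total - dataset_len]  # pad with repeated samples
    return indices[rank:total:num_replicas]
```

With `shuffle=False`, each rank walks the dataset in its original order, e.g. rank 0 of 4 over 10 samples gets `[0, 4, 8]`.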

Reviewed By: soumith

Differential Revision: D16100388

fbshipit-source-id: 35566581f5250040b2db5ec408a63037b47a9f5d
2019-07-05 18:56:28 -07:00
9e1ae4b264 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 8932f509ab9b14988428a1b9a42d3388ff5a4ad5
2019-07-05 18:36:03 -07:00
577042a3cc Better Constant Propagation through Tuples (#22561)
Summary:
Replaces https://github.com/pytorch/pytorch/pull/21501 because ghimport raised errors that I couldn't figure out when I tried to import the stack :'(

Contains the two commits that were previously accepted, plus the merge commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22561

Differential Revision: D16135743

Pulled By: eellison

fbshipit-source-id: f0a98842ccb334c7ceab04d1437e09dc76be0eb1
2019-07-05 18:06:46 -07:00
a09150adc0 Deprecate untyped Dicts (#22516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22516

Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.

Differential Revision: D16115215

fbshipit-source-id: 2ef4cb443da1cdf4ebf5b99851f69de0be730b97
2019-07-05 18:00:13 -07:00
595d2dbb4d Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 1a6c249837151564f48f675ced6a221ec739aae9
2019-07-05 15:39:56 -07:00
91706d1044 Primitive Jit Logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22278

Differential Revision: D16134598

Pulled By: Krovatkin

fbshipit-source-id: e64b14d0d68801189fc78c059a4e8b322acce3fa
2019-07-05 15:27:38 -07:00
ed60d9fcf9 List/Dict remember and check their element type (#22005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005

When a Dict or List is created with type information, it will remember that.
If at any point later, this list is instantiated to a List<T> with a concrete type, it will assert that T is the correct type.
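A hypothetical Python sketch of the behaviour described above (the real implementation lives in the C++ c10 List/Dict types):

```python
class TypedList:
    """A list that remembers its element type and checks it on use."""

    def __init__(self, elem_type):
        self.elem_type = elem_type
        self._data = []

    def append(self, value):
        if not isinstance(value, self.elem_type):
            raise TypeError(f"expected {self.elem_type.__name__}, "
                            f"got {type(value).__name__}")
        self._data.append(value)

    def to_typed(self, elem_type):
        # Mirrors the "instantiated to a List<T>" step: the concrete type
        # requested later must match the type recorded at creation.
        assert elem_type is self.elem_type, "element type mismatch"
        return self
```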

Differential Revision: D15914462

fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
2019-07-05 15:17:51 -07:00
mal
042da2171e Skip test_slogdet_sign if LAPACK library is not installed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22551

Test Plan:
ran test locally

Imported from OSS

Differential Revision: D16132182

fbshipit-source-id: 5b9efbf883efa66c4d8b7c400bdb804ac668a631
2019-07-05 11:57:24 -07:00
3507eaf3ea Add clone() implementation for QTensor (#22510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22510

Added a new function to implement the clone operation on quantized tensors. Also added a test case, which can be run as shown in the test plan.

This change is required to be able to call torch.jit.trace on quantized models.
Clone implementation calls copy_ on QTensor internally.

Differential Revision: D16059576

fbshipit-source-id: 226918cd475521b664ed72ee336a3da8212ddcdc
2019-07-05 11:24:53 -07:00
mal
0140a756d8 Prioritize reentrant tasks and execute them recursively until close to limit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22397

Test Plan:
Added a test for reentrant backwards with checkpoint, a test for a recursive backwards function (which should fail if we run all the reentrant tasks recursively in the same thread), and a test for the priority of reentrant tasks.
~~Will add a test for priority of reentrant tasks in future pr.~~

Imported from OSS

Differential Revision: D16131955

fbshipit-source-id: 18301d45c1ec9fbeb566b1016dbaf7a84a09c7ac
2019-07-05 08:51:06 -07:00
e5d640341f Set stream for softmax kernel launch (#22470)
Summary:
Currently, the **stream** parameter is not set when launching these two kernels: softmax_warp_forward() and softmax_warp_backward(), i.e. the kernels are always put on the default stream, which may fail to respect the stream that was set previously. Add **at::cuda::getCurrentCUDAStream()** as a launch argument to fix this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22470

Differential Revision: D16115051

Pulled By: izdeby

fbshipit-source-id: 38b27e768bb5fcecc1a06143ab5d63b0e68a279e
2019-07-05 07:33:55 -07:00
d2ceab2766 update video input (#22471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22471

update C2 video input with latest augmentation

Reviewed By: HengCV

Differential Revision: D16096127

fbshipit-source-id: bb07394e211cd52b50005d801b6d03250248ea9e
2019-07-05 00:56:33 -07:00
4ba1c4f798 Add the support of handle Bias being nullptr for torch.ops.quantized.fbgemm_conv (#22472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22472

As Title says.

Reviewed By: dskhudia, bddppq

Differential Revision: D16097594

fbshipit-source-id: 7f56b7906dd9c2792e21a8aa553c0b8d05b19012
2019-07-04 19:37:37 -07:00
57dbc79674 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22543

Test Plan: Imported from OSS

Differential Revision: D16127755

Pulled By: suo

fbshipit-source-id: 5f6ba507bf5b671987e2cabf03c2118271800595
2019-07-04 18:26:04 -07:00
b952011bae use save/load for emitFunctionHook
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22314

Test Plan: Imported from OSS

Differential Revision: D16121781

Pulled By: suo

fbshipit-source-id: b376afb082726d78f082a0ff6902c4b501435d4b
2019-07-04 17:12:16 -07:00
bc24593a80 remove unused argument in import.cpp (#22205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22205
ghimport-source-id: afdf13f6515a1352cde4cbb7b45bf01e717bcf4d

Test Plan: Imported from OSS

Differential Revision: D15998763

Pulled By: suo

fbshipit-source-id: 6e5c1c668b9de5e352d2aa7ca27197deb42ca94b
2019-07-04 17:12:12 -07:00
4b9b7d6f03 improvements to QualifiedName (#22204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22204
ghimport-source-id: 319afc622f7137ca9075efefca1a05acedc19a4a

Test Plan: Imported from OSS

Differential Revision: D15998759

Pulled By: suo

fbshipit-source-id: 4534443aef61255af0fa3d2ed1be5e87266e2f2c
2019-07-04 17:12:08 -07:00
f5919dba45 refactoring of module/object (#22203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22203
ghimport-source-id: 6b3807ac8aa53df2fdd770b43d8e54b8f0d69c20

Test Plan: Imported from OSS

Differential Revision: D15998760

Pulled By: suo

fbshipit-source-id: dd51edbcb66561189ae9d94a129434092bcad01b
2019-07-04 17:12:04 -07:00
3b2844eeea Make CompilationUnit own Functions (#22202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22202
ghimport-source-id: de6c963af1df76d2d6357155e64a5913ab879f76

Test Plan: Imported from OSS

Differential Revision: D15998761

Pulled By: suo

fbshipit-source-id: 5414a6424953738d823b265d20dc67dde6e5b2d8
2019-07-04 17:12:00 -07:00
66378c7025 make test context managers exception safe
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22502

Test Plan: Imported from OSS

Differential Revision: D16121783

Pulled By: suo

fbshipit-source-id: e5f991b189261f596355541cc331ef92667bd1a5
2019-07-04 17:11:56 -07:00
2b06e7cd50 add #pragma once to jit headers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22505

Differential Revision: D16119310

Pulled By: Krovatkin

fbshipit-source-id: 8b742411f40d66690ce28726c213741e0c2de618
2019-07-04 11:10:59 -07:00
6f6a680481 remove erase_fork_wait.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22509

Differential Revision: D16119307

Pulled By: Krovatkin

fbshipit-source-id: 3251f594be6d365652b847bdc35dd4f4b62c35e6
2019-07-03 22:47:57 -07:00
799633e4cd move casting ops from prim to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22275

Test Plan: Imported from OSS

Differential Revision: D16060597

Pulled By: wanchaol

fbshipit-source-id: a11d8ad3b037e15bd670cc7cd3fefd4f0abd0bba
2019-07-03 22:22:28 -07:00
97a604ef57 Rereapply optional ScalarType interface changes that were reverted in D16079809 (#22456)
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412

Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but it currently seems to be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456

Differential Revision: D16097159

Pulled By: nairbv

fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
2019-07-03 20:03:25 -07:00
10c4b98ade Remove weak script (#22212)
Summary:
* Deletes all weak script decorators / associated data structures / methods
   * In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
   * Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`; only `rnn.py` needed manual editing to use `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand

This should also fix https://github.com/pytorch/pytorch/issues/22212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212

Differential Revision: D15988346

Pulled By: driazati

fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
2019-07-03 17:28:25 -07:00
b93f29ded3 add JIT path to the benchmark (#22309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309

This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.

In this diff, we put operators in a loop and pass it to JIT. An extra step that wraps the operator with a `_consume` op is introduced to avoid JIT's dead code elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside).

Reviewed By: zheng-xq

Differential Revision: D16033082

fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
2019-07-03 17:18:03 -07:00
29ec4769bb Fix SyncBatchNorm running var update issue (#22248)
Summary:
## Fix https://github.com/pytorch/pytorch/issues/22192

+ change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)`
+ change cuda & cuda head
```cuda
// before:
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean,
                                                        const Tensor& running_var, double momentum, double epsilon, int64_t count);
// after:
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean,
                                                        const Tensor& running_var, double momentum, double epsilon, const Tensor& counts);
```
+ change python interface
```python
class SyncBatchNorm(Function):
    def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size):
        ...
```
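The pooled statistics can be recovered from the per-replica (mean, invstd, count) triples; a hypothetical pure-Python sketch of the math behind batch_norm_gather_stats:

```python
def gather_stats(means, invstds, counts, eps=1e-5):
    # Combine per-device (mean, invstd) pairs, weighting each device by
    # its sample count (the point of the new Tensor `counts` argument).
    total = sum(counts)
    mean = sum(m * n for m, n in zip(means, counts)) / total
    # Recover each device's biased variance from its invstd, then pool it
    # with the between-device correction term (mean_i - mean)^2.
    var = sum(((1.0 / i) ** 2 - eps + (m - mean) ** 2) * n
              for m, i, n in zip(means, invstds, counts)) / total
    return mean, (var + eps) ** -0.5
```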
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248

Differential Revision: D16002146

Pulled By: mrshenli

fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956
2019-07-03 17:17:59 -07:00
325ec2327f create tensor based on provided datatype (#22468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22468

as title

Reviewed By: ajauhri

Differential Revision: D15744503

fbshipit-source-id: 050b32dd7f135512385fc04f098c376c664211a9
2019-07-03 17:08:23 -07:00
319ef3bcbb Fix onnx custom op export & add initial test case (#21321)
Summary:
- Fix typo in ```torch/onnx/utils.py``` when looking up registered custom ops.
- Add a simple test case
    1. Register custom op with ```TorchScript``` using ```cpp_extension.load_inline```.
    2. Register custom op with ```torch.onnx.symbolic``` using ```register_custom_op_symbolic```.
    3. Export model with custom op, and verify with Caffe2 backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21321

Differential Revision: D16101097

Pulled By: houseroad

fbshipit-source-id: 084f8b55e230e1cb6e9bd7bd52d7946cefda8e33
2019-07-03 16:59:12 -07:00
9c44f6c723 generate tests based on op metadata (#21432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432

This diff introduce a new interface to generate tests based on the metadata of operators.

Reviewed By: ajauhri

Differential Revision: D15675542

fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c
2019-07-03 16:48:41 -07:00
2732a5e534 Another dce fix (#22499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22499

Another place where onnx export is running dead code elimination after making the jit graph invalid. Fixing it.

Reviewed By: houseroad

Differential Revision: D16111969

fbshipit-source-id: 5ba80340c06d091988858077f142ea4e3da0638c
2019-07-03 16:37:53 -07:00
d9e15bccb0 Perform weight re-init for embedding table in sparse_lookup.py (#22348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22348

This is the last step of LRU hash eviction weight re-init. This diff checks whether there are evicted values in sparse_lookup and, if so, calls the op created in D15709866 to re-init the values for the indices in evicted_values. Also created a gradient op for the operator; the gradient op just passes the output gradient through as the input gradient.

Reviewed By: itomatik

Differential Revision: D16044736

fbshipit-source-id: 9afb85209b0de1038c5153bcb7dfc5f52e0b2abb
2019-07-03 10:33:40 -07:00
c9f41e9bc0 Add device guard around MPI operations (#22446)
Summary:
If the current CUDA device is not the same as the device that hosts
the tensor the operation works on, then OpenMPI will segfault, as
reported in https://github.com/pytorch/pytorch/issues/21922. This change adds a device guard for every
operation to ensure the correct device is set.

Fixes https://github.com/pytorch/pytorch/issues/21922.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22446

Differential Revision: D16106823

Pulled By: pietern

fbshipit-source-id: 99d762eb3851c0a0e0b4fe81cf27c1c8d35596cc
2019-07-03 02:01:53 -07:00
abb2e68989 Don't construct a single element array for unary ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22469

Test Plan: Imported from OSS

Differential Revision: D16105794

Pulled By: jamesr66a

fbshipit-source-id: 6bb5a6703c8dba3cda20a6db192d2a15711751a1
2019-07-02 23:29:56 -07:00
7fef0b7b72 Take const refs in TensorIterator::mark_outputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22465

Test Plan: Imported from OSS

Differential Revision: D16105795

Pulled By: jamesr66a

fbshipit-source-id: 22945fc3f02f8450ae6b92492a0f7baad80c5cb5
2019-07-02 23:29:52 -07:00
0d63619414 Deprecate vector/unordered_map again (#22478)
Summary:
The deprecation was temporarily removed in https://github.com/pytorch/pytorch/pull/21999 because it threw warnings on our codebase.

https://github.com/pytorch/pytorch/pull/22162 fixed these internal call sites so we can now re-enable the deprecation.

I checked locally that this PR doesn't add any new warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22478

Differential Revision: D16101342

Pulled By: smessmer

fbshipit-source-id: 378eb208f6a3dd3a28d2efe2e001fd71a9569297
2019-07-02 21:59:05 -07:00
17cc79865d Fix dead code elimination in onnx export (#22476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22476

Dead code elimination assumes a valid jit graph because it checks whether operators have side effects.
The onnx export path destroys the jit graph right before calling dead code elimination, but the export path doesn't actually care about side effects.
We can just call dead code elimination with side-effect lookup disabled and things should work.
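A hypothetical toy sketch of the distinction, with graph nodes as plain dicts: a node survives if its output is live, or if side-effect lookup is enabled and the node has a side effect.

```python
def eliminate_dead_code(nodes, graph_outputs, check_side_effects=True):
    # Walk the graph bottom-up, keeping nodes whose outputs are live or
    # (when side-effect lookup is enabled) that have side effects.
    live = set(graph_outputs)
    kept = []
    for node in reversed(nodes):
        keep = (check_side_effects and node.get("side_effect")) or \
               any(o in live for o in node["outputs"])
        if keep:
            kept.append(node)
            live.update(node["inputs"])
    return list(reversed(kept))
```

Disabling `check_side_effects` is what lets the pass run on a graph that is no longer valid jit IR.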

Reviewed By: houseroad

Differential Revision: D16100172

fbshipit-source-id: 8c790055e0d76c4227394cafa93b07d1310f2cea
2019-07-02 21:28:57 -07:00
76e14c1e51 remove caffe2/core dependency from ATen/core/jit_type.h (#22441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22441

This include doesn't seem to be needed. Remove it to simplify mobile build dependency.

Reviewed By: dreiss

Differential Revision: D16088224

fbshipit-source-id: f6aec21655e259726412e26a006d785912436c2a
2019-07-02 21:07:59 -07:00
e210c65097 Add torch.where overload with only condition argument (#21986)
Summary:
Requested in https://github.com/pytorch/pytorch/issues/21798
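The single-argument overload `torch.where(condition)` returns the indices of the nonzero elements, like `torch.nonzero(condition, as_tuple=True)`. A pure-Python sketch of the 1-d case, for illustration only:

```python
def where_single_arg(condition):
    # Indices at which the condition is truthy, for a 1-d sequence.
    return [i for i, c in enumerate(condition) if c]
```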
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21986

Differential Revision: D16081577

Pulled By: zhangguanheng66

fbshipit-source-id: 658c0f451b833aceb1a41ee424c7990eec00bc02
2019-07-02 18:18:15 -07:00
dcd902bdde provide "size" parameter in torch.normal when called with two floats (#20545)
Summary:
This has been requested in https://github.com/pytorch/pytorch/issues/20323

(It is still not exactly the same as NumPy, which allows you to pass tensors as mean/std and broadcast them with size, but the present PR is extremely simple and does the main thing people are asking for)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20545

Differential Revision: D15358736

Pulled By: zhangguanheng66

fbshipit-source-id: 762ea5eab5b8667afbac2df0137df017ba6e413c
2019-07-02 18:18:11 -07:00
bb0f299f27 Update MultiheadAttention module support key/value with different number of features and allow static key/value (#21288)
Summary:
The changes include:

1. Allow key/value to have a different number of features from query. This supports the case where key and value have different feature dimensions.
2. Support three separate proj_weights, in addition to a single in_proj_weight. The proj_weight of key and value may have a different dimension from that of query, so three separate proj_weights are necessary. In case key and value have the same dimension as query, it is preferred to use a single large proj_weight for performance reasons. However, it should be noted that choosing between a single large weight and three separate weights is a size-dependent decision.
3. Give an option to use static k and v in the multihead_attn operator (see saved_k and saved_v). Those static key/value tensors can now be re-used when training the model.
4. Add more test cases to cover the arguments.

Note: current users should not be affected by the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21288

Differential Revision: D15738808

Pulled By: zhangguanheng66

fbshipit-source-id: 288b995787ad55fba374184b3d15b5c6fe9abb5c
2019-07-02 18:06:25 -07:00
d684112ec9 Output sequence probability with CTC beam search, optional multiple output sequences (#21927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21927

Add `OUTPUT_PROB` output to CTCBeamSearchDecoderOp to return a probability for each sequence.

Add argument to output top-k instead of top-1 decoded sequences.

Reviewed By: SuperIRabbit

Differential Revision: D15797371

fbshipit-source-id: 737ca5cc4f90a0bcc3660ac9f58519a175977b69
2019-07-02 17:29:13 -07:00
830c6590ef EraseNumberTypes cleans itself up (#22461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22461

We shouldn't call dead code elimination after EraseNumberTypes because dead code elimination assumes a valid jit graph which EraseNumberTypes just broke.
Let's have it clean up after itself instead.

Reviewed By: houseroad

Differential Revision: D16094656

fbshipit-source-id: f2752277d764e78ab276c57d56b2724b872b136f
2019-07-02 17:06:53 -07:00
a6441c00d6 Remove build variable NCCL_EXTERNAL (#22467)
Summary:
It's always set to equal USE_NCCL; we made Gloo depend on the Caffe2 NCCL
build. See 30da84fbe1614138d6d9968c1475cb7dc459cd4b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22467

Differential Revision: D16098581

Pulled By: ezyang

fbshipit-source-id: f706ec7cebc2e6315bafca013b669f5a72e04815
2019-07-02 15:36:44 -07:00
34f950c800 Create C2 operator to replace values in embedding table (#22279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22279

This new operator is used for embedding table weight re-init. After we get the evicted indices, they will be the rows that need resetting in the embedding table. We can then create a 1-d tensor with default values and apply this operator to copy that tensor into all evicted rows of the embedding table.

Will add gradient op in next diff
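A hypothetical sketch of the re-init step, with a list-of-lists standing in for the embedding table:

```python
def reinit_evicted_rows(table, evicted_rows, default_row):
    # Copy the 1-d default-value row into every evicted row of the table.
    for r in evicted_rows:
        table[r] = list(default_row)  # independent copy per row
    return table
```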

Reviewed By: itomatik

Differential Revision: D15709866

fbshipit-source-id: 2297b70a7326591524d0be09c73a588da245cc08
2019-07-02 15:26:22 -07:00
040a4bd914 include conv_op_impl.h from conv_dnnlowp_op.cc (#22458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22458

To make sure the templates get instantiated.

Reviewed By: jianyuh

Differential Revision: D16094183

fbshipit-source-id: 7861df0b303bec42ab80a53477c4b608edebb61d
2019-07-02 15:09:34 -07:00
474dec4b00 Warn on conditions that can trigger cuBLAS sgemm bug (#22034)
Summary:
The sgemm in cuBLAS 9.0 has some issues with sizes above 2M on Maxwell and Pascal architectures. Warn in this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22034

Differential Revision: D15949930

Pulled By: zhangguanheng66

fbshipit-source-id: 0af977ec7900c76328d23898071de9c23778ff8b
2019-07-02 15:09:31 -07:00
f5b3f9ecd9 Remove unnecessary ROCm detection code in Python scripts. (#22464)
Summary:
ROCm is already detected in cmake/public/LoadHIP.cmake. There is no need
to detect it twice. Moreover, the Python script reads the environment
variable ROCM_HOME, but what is really used in the CMake scripts is
ROCM_PATH -- a user would have to set both environment variables
correctly. Since ROCM_HOME is undocumented, this commit completely
eradicates it.

 ---

ezyang A remake of https://github.com/pytorch/pytorch/issues/22228 because its dependency has been dismissed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22464

Differential Revision: D16096833

Pulled By: bddppq

fbshipit-source-id: fea461e80ee61ec77fa3a7b476f7aec4fc453d5d
2019-07-02 14:46:03 -07:00
e68dc899d1 Fix compiler warnings (#22162)
Summary:
Fix various compiler warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22162

Differential Revision: D16085339

Pulled By: smessmer

fbshipit-source-id: d36a4b334315f1a5942cac46443a7d166ca36d0d
2019-07-02 14:12:55 -07:00
53a52f574f infer shape until no more change (#22425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22425

Currently, in bound_shape_inference.cc: InferBoundShapeAndType, we first infer ops in order and then infer the inputs of concat in reverse order. In the tiny version of ctr_instagram_model, concat is right before FC, so we can infer the inputs for concat. But in the production version, we found there are some ops between concat and FC (or other ops whose shape we know), so the shapes of these ops cannot be inferred.
This diff is a temporary solution for this problem: infer shapes in order and in reverse order repeatedly until there is no more change.
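The loop described here is a plain fixed-point iteration; a hypothetical sketch:

```python
def infer_until_stable(shapes, passes, max_iters=100):
    # Run every inference pass (forward and reverse) repeatedly until the
    # shape map stops changing, with a cap to guarantee termination.
    for _ in range(max_iters):
        before = dict(shapes)
        for run_pass in passes:
            run_pass(shapes)
        if shapes == before:
            break
    return shapes
```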

Reviewed By: yinghai, ipiszy

Differential Revision: D16082521

fbshipit-source-id: d5066509368029c6736dce156030adf5c38653d7
2019-07-02 13:13:55 -07:00
07ef85e326 Add USE_MKLDNN_CBLAS build option. (#19014)
Summary:
MKL-DNN is the main library for computation when we use the ideep device. It can use kernels implemented with different algorithms (including JIT, CBLAS, etc.) for computation. We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use the CBLAS computation methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014

Differential Revision: D16094090

Pulled By: ezyang

fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
2019-07-02 12:29:54 -07:00
6d5871300b Use concrete types on call sites for Dict/List (#22004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22004

In the future, we want all dicts/lists to store information about the types they contain.
This is only possible if the creation API doesn't allow creating lists/dicts without type information.
This diff changes some call sites that don't specify type information so that they do.

Reviewed By: dzhulgakov

Differential Revision: D15906387

fbshipit-source-id: 64766a2534b52c221e8a5501a85eaad13812e7bd
2019-07-02 11:52:35 -07:00
693871ded3 Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360)
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicates this issue before it
is made into a stable release.

The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.

 ---

Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360

Differential Revision: D16074509

Pulled By: zou3519

fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
2019-07-02 11:46:13 -07:00
bb07f2d063 Pass LRU hash output evicted_values to SparseLookup (#21389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21389

As titled. To do weight re-init on evicted rows of the embedding table, we need to pass the info about the evicted hashed values to SparseLookup, which is the layer responsible for constructing the embedding table and doing pooling.

To pass evicted values, we need to adjust the output record of lru_sparse_hash to include the evicted values, and add an optional input to all processors that need to take in a sparse segment. For SparseLookup to get the evicted values, its input record needs to be adjusted. Now the input record can be of type IdList, IdScoreList, or a struct of feature + evicted values.

Reviewed By: itomatik

Differential Revision: D15590307

fbshipit-source-id: e493881909830d5ca5806a743a2a713198c100c2
2019-07-02 11:27:37 -07:00
869ce89474 use feenableexcept when glibc is available (#22241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387

glibc has a non-standard function, feenableexcept, that triggers the floating-point exception handler. Compared to feclearexcept + fetestexcept, this approach allows us to see precisely where the exception is raised from the stack trace.

Reviewed By: jspark1105

Differential Revision: D15301095

fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
2019-07-02 10:49:55 -07:00
e74b0fc87c Fix empty_like for quantized tensors. (#21978)
Summary:
empty_like uses the tensor options of `self`, rather than the passed in tensor options.  This means it messes up variable/tensor types, and ignores specifications like different dtypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21978

Differential Revision: D15903948

Pulled By: gchanan

fbshipit-source-id: f29946be01c543f888daef2e99fe928e7b7d9d74
2019-07-02 10:28:59 -07:00
7235532df3 Revert D16088193: Refactored math tests to iterate over all math ops
Differential Revision:
D16088193

Original commit changeset: 81b6e536b450

fbshipit-source-id: 81ee8857c3d5353955d75e05203cfbf2d3b8dacd
2019-07-02 10:18:46 -07:00
a845d02cd5 Revert D16088191: Added math.log2 and hypot
Differential Revision:
D16088191

Original commit changeset: 5d80c480243d

fbshipit-source-id: 12ea2617e3af5bf81b1f2a57f8633ca06a99db5b
2019-07-02 10:18:42 -07:00
2dd71b18c4 Fix PoolWindow crash from thread_local (#22405)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19394

See https://developercommunity.visualstudio.com/content/problem/124121/thread-local-variables-fail-to-be-initialized-when.html for context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22405

Differential Revision: D16090822

Pulled By: ezyang

fbshipit-source-id: 9fdd2c272fa7723fb62b906336d2e2620411b12b
2019-07-02 09:48:14 -07:00
7ca7edc307 ONNX Export LayerNorm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22265

Reviewed By: zrphercule

Differential Revision: D16076268

Pulled By: houseroad

fbshipit-source-id: 29b4ecab2fa0dc7250c9d1ad6924903181a66ab2
2019-07-02 09:37:07 -07:00
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer as implemented in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.

There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
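A hypothetical single-parameter sketch of the decoupled update (the real optimizer operates on tensors and per-parameter state; names here are illustrative):

```python
def adamw_step(w, g, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, wd=1e-2):
    # Decoupled update from "Decoupled Weight Decay Regularization":
    # the wd term is applied directly to the weight after the adaptive
    # step, rather than folded into the gradient like L2 regularization.
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])   # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    w -= lr * m_hat / (v_hat ** 0.5 + eps)        # adaptive step
    w -= lr * wd * w                              # decoupled weight decay
    return w
```

Folding `wd * w` into `g` instead would reproduce the L2-regularized Adam that the paper argues generalizes worse.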
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
c9a8413306 Numerical stability of embedding kernels (#22401)
Summary:
Address the issue raised in https://github.com/pytorch/pytorch/issues/22377.

The PR https://github.com/pytorch/pytorch/issues/22016 introduces a temporary tensor of weights `grad_weight_per_segment` of the same dtype as the end result, which can be a problem when using `float16`.
In this PR, it now uses a `float32` temporary tensor when the input is `float16`.
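The numerical issue can be illustrated in plain Python, using `struct`'s half-precision format to emulate `float16` rounding (an illustration of the accumulation problem, not the actual kernel code):

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def accumulate_half(values):
    # Running sum kept in float16: every partial sum is rounded, so once the
    # sum is large enough, small increments round away to nothing.
    s = 0.0
    for v in values:
        s = to_half(s + to_half(v))
    return s

def accumulate_float(values):
    # Running sum kept in full precision, cast to float16 only at the end.
    return to_half(sum(to_half(v) for v in values))

grads = [1e-4] * 10000            # true sum is 1.0
print(accumulate_half(grads))     # stalls well below the true sum
print(accumulate_float(grads))    # close to 1.0
```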

ngimel, can I get you to review? I think I have fixed the issues you have pointed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22401

Differential Revision: D16077319

Pulled By: mrshenli

fbshipit-source-id: 7cfad7f40b4d41a244052baa2982ab51bbbd7309
2019-07-02 08:53:22 -07:00
b76877728a Added math.log2 and hypot
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21512

Test Plan: Imported from OSS

Differential Revision: D16088191

Pulled By: Chillee

fbshipit-source-id: 5d80c480243d2644c96df26337cf65918d79443e
2019-07-02 06:28:34 -07:00
3d3d07b7dd Refactored math tests to iterate over all math ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21511

Test Plan: Imported from OSS

Differential Revision: D16088193

Pulled By: Chillee

fbshipit-source-id: 81b6e536b4505178c829a9d925c30cd185b7a706
2019-07-02 06:28:30 -07:00
0ffda97aa4 Make Gloo an optional c10d dependency (#22257)
Summary:
The CMake modifications include removal of some unnecessary paths
(e.g. find_package(CUDA) and friends) that are no longer used since
c10d is always part of the larger torch build. The macro
`C10D_USE_...` was ambiguous and is now removed in favor of only
having top level `USE_...`. The c10d test suite is changed to include
skip annotations for the tests that depend on Gloo as well.

Now, if you compile with `USE_DISTRIBUTED=1` and `USE_GLOO=0` you get
a functioning build for which the tests actually pass.

Closes https://github.com/pytorch/pytorch/issues/18851.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22257

Differential Revision: D16087993

Pulled By: pietern

fbshipit-source-id: 0cea66bd5cbd9736b06fa1d45ee13a18cab88adb
2019-07-02 02:39:48 -07:00
b9ede6600e Remove the USE_MIOPEN build option as MIOpen is always used when built with ROCm. (#22420)
Summary:
Close https://github.com/pytorch/pytorch/issues/22200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22420

Differential Revision: D16087538

Pulled By: bddppq

fbshipit-source-id: ecf3e7eb8213bb093e1c5290d096c233284a2ff9
2019-07-02 00:05:59 -07:00
6721e67c10 Remove hacky stub for quantized ops (#22388)
Summary:
Effectively reverts https://github.com/pytorch/pytorch/pull/18267 - this was a temporary measure and is not used any more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22388

Differential Revision: D16070725

Pulled By: dzhulgakov

fbshipit-source-id: ee5db11a608f248b0da981169d4cc90470fd482f
2019-07-01 23:21:42 -07:00
2dd1323379 Fix the GPU trainer for NoneCalibration and RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22385

Reviewed By: Wakeupbuddy

Differential Revision: D16053190

fbshipit-source-id: 6304c5c51f33691c201c78d4c921a9c250d9b4f5
2019-07-01 22:55:18 -07:00
5bd97be309 Fix lint error in format_time() in throughput_benchmark.py and clean it up a bit. (#22424)
Summary:
The `assert False` lint error has been causing CI to fail:

    ./torch/utils/throughput_benchmark.py:14:13: B011 Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22424

Differential Revision: D16083464

Pulled By: bddppq

fbshipit-source-id: 6d96e36c8fcbb391d071b75fe79c22d526c1ba3c
2019-07-01 22:15:37 -07:00
edd5b770be Remove API-level guard on NeuralNetworks.h (#22429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22429

Android NDK r20 removes the guard `(__ANDROID_API__ <= __ANDROID_API_O_MR1__)`, so we do it here also. There is insufficient reason to keep these decls undefined for earlier API levels. NDK r15 and earlier don't even define `__ANDROID_API_O_MR1__`, so the preprocessor defaults it to 0 and the guard evaluates as TRUE.

Reviewed By: smeenai, hlu1

Differential Revision: D16084105

fbshipit-source-id: f0857b3eb0573fe219f0d6c5e6583f89e2b5518f
2019-07-01 22:09:11 -07:00
de84104059 Lint ONNX Related Code (#22423)
Summary:
Lint the code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22423

Differential Revision: D16086518

Pulled By: houseroad

fbshipit-source-id: c6e5143f42c73a70beeaa2e089df4164f6265c32
2019-07-01 21:44:16 -07:00
ffa15d2285 Load original SourceRanges on import (#22180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22180
ghimport-source-id: efa46dcb845c099f0a746f523901ab2c2cd3b004

Test Plan: Imported from OSS

Differential Revision: D15981425

Pulled By: jamesr66a

fbshipit-source-id: bef682bd13c1a5be95bdb97e025690c6f2d523d3
2019-07-01 21:14:39 -07:00
2c2a913a4f Preserve SourceRanges across serialization (#22179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22179
ghimport-source-id: 9879551127da09d78ca348b9e436db5a09a92a38

Test Plan: Imported from OSS

Differential Revision: D15981423

Pulled By: jamesr66a

fbshipit-source-id: a2506f5a2f05916b6e8226841b0229110e758671
2019-07-01 21:14:35 -07:00
e05942c09b Serialization methods for SourceRange and Source (#22178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22178
ghimport-source-id: 85ca4d4454c6d4b57a82f211004c4bb712d1c980

Test Plan: Imported from OSS

Differential Revision: D15981426

Pulled By: jamesr66a

fbshipit-source-id: f81f5ee3b66fc4a0d4a708b8109712b5df9f241a
2019-07-01 21:14:31 -07:00
671782d88a Refactor file:line:col to be less ugly (#22177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22177
ghimport-source-id: e35f068c2d39bd8fa2058a9bfc0b1a3856f9383d

Test Plan: Imported from OSS

Differential Revision: D15981424

Pulled By: jamesr66a

fbshipit-source-id: b7748c5cfd4f8ea594314cb601a2b8045173700a
2019-07-01 21:14:28 -07:00
dff2c07183 Manual revert of D16012838
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22412

Reviewed By: nairbv, houseroad

Differential Revision: D16079809

fbshipit-source-id: ee0d805ff7a2bc5f98bcc65f90b8199751c840f6
2019-07-01 19:58:21 -07:00
2c18bf21be Fix ScriptModule.__dir__() (#22426)
Summary:
`_method_names` is on `_c`, so `dir(script_module)` calls previously
didn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22426

Differential Revision: D16085330

Pulled By: driazati

fbshipit-source-id: 6f9f1bef5da4306c0f26aa0be1bcec6dd3a6f0fb
2019-07-01 19:33:14 -07:00
f0f2331a1c Add support for cross-chunk shuffling in ChunkDataset (#22347)
Summary:
This change adds advanced support for cross-chunk shuffling.

For training with a static dataset, the default configuration is at the user's disposal. However, in some use cases new data is added to the dataset over each epoch, so the dataset's size is dynamically growing. In order to mix the new data and the old data for better random sampling, one approach is to shuffle examples drawn from more than one chunk. This feature is supported with this change: by specifying `cross_chunk_shuffle_count_` at construction, advanced users can control how many chunks examples are shuffled across.
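The behavior can be sketched in plain Python (the function name is illustrative; the real implementation is part of the C++ `ChunkDataset`):

```python
import random

def cross_chunk_shuffle(chunks, cross_chunk_shuffle_count, seed=None):
    """Yield examples shuffled across groups of `cross_chunk_shuffle_count` chunks."""
    rng = random.Random(seed)
    for i in range(0, len(chunks), cross_chunk_shuffle_count):
        # Pool the examples of the next group of chunks, then shuffle the pool,
        # so examples from different chunks in the group are interleaved.
        group = [ex for chunk in chunks[i:i + cross_chunk_shuffle_count] for ex in chunk]
        rng.shuffle(group)
        for ex in group:
            yield ex

chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(list(cross_chunk_shuffle(chunks, 2, seed=0)))
```

With a count of 1 this degenerates to the default per-chunk shuffling; with a count of 2, examples from chunks 1-2 mix with each other, as do examples from chunks 3-4.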
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22347

Differential Revision: D16081378

Pulled By: zhangguanheng66

fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
2019-07-01 19:13:34 -07:00
1f9c4fdb5e split onnx passes (#22413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22413

_jit_pass_erase_number_types invalidates the jit graph but parts of _jit_pass_onnx rely on having a valid jit graph.

This splits _jit_pass_onnx into _jit_pass_onnx_remove_print and _jit_pass_onnx_preprocess_caffe2 (which rely on the valid jit graph), runs these before _jit_pass_erase_number_types,
and then runs the rest of _jit_pass_onnx after _jit_pass_erase_number_types

Reviewed By: houseroad

Differential Revision: D16079890

fbshipit-source-id: ae68b87dced077f76cbf1335ef3bf89984413224
2019-07-01 18:16:53 -07:00
a54acd3755 Update the way boolean tensor are being printed (#22238)
Summary:
When a boolean tensor is printed, there is no need to specify the dtype.

Example:
```
>> x = torch.tensor([[True, True, True], [True, True, True]])
>> print(x)
tensor([[True, True, True],
        [True, True, True]])

>> x = torch.tensor([True])
>> print(x)
tensor([True])

>> x = torch.tensor(True)
>> print(x)
tensor(True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22238

Differential Revision: D15996304

Pulled By: izdeby

fbshipit-source-id: 5699acf3e00abca8a2bbb5384f8271eeb063dce7
2019-07-01 18:04:42 -07:00
cbf572671d update mkldnn-bridge to avoid mem leak (#22392)
Summary:
fix the memory leak issue exposed by https://github.com/pytorch/pytorch/issues/21537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22392

Test Plan: {F164886124}

Reviewed By: yinghai

Differential Revision: D16074150

Pulled By: bddppq

fbshipit-source-id: b70192aad3d531f349fea5d2d477b827715a2363
2019-07-01 17:12:48 -07:00
402b9f9a6d add PT chunk op to the benchmark (#22409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22409

as title

Reviewed By: hl475

Differential Revision: D16079031

fbshipit-source-id: 109060ffc953f2357b2783b13f9b9dc87bd3f98a
2019-07-01 16:37:05 -07:00
8a726f5815 add PT split op to the benchmark (#22410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22410

as title

Reviewed By: hl475

Differential Revision: D16078705

fbshipit-source-id: 29e1cc19d0e93a561d07c47e5678a311e6de3e3b
2019-07-01 16:37:01 -07:00
8281909e73 add PT cat operator to the benchmark (#22404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22404

as title

Reviewed By: hl475

Differential Revision: D16078395

fbshipit-source-id: 4ff5c558036af1dce6ac0001a1a1fc3a373a981f
2019-07-01 16:36:57 -07:00
007fd01e9b Enable PT operators running with {cpu, gpu} * {forward, backward} (#22416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22416

This diff tests the combination of cpu/gpu and forward/backward path for PT add operator.

Reviewed By: hl475

Differential Revision: D15770792

fbshipit-source-id: 38cc648361d2501d774db407f988c3cb5115b2ae
2019-07-01 16:30:58 -07:00
dfa6fca1c6 Supporting Manifold DB in Predictor Exporter (#22334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22334

Improve the function signatures of save_to_db and load_from_db in predictor_exporter.

Reviewed By: akyrola

Differential Revision: D16047208

fbshipit-source-id: a4e947f86e00ef3b3dd32c57efe58f76a38fcec7
2019-07-01 16:17:02 -07:00
30fedeae4a Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 26112fb218995b292bb28e65332f6259b3c289f6
2019-07-01 15:51:30 -07:00
10e4137396 Optimize InstanceNormGradientOp (#22288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22288

Optimize InstanceNormGradientOp

Benchmarks:

CPU with [N, C, H, W] = [128, 256, 56, 56],
NCHW order: 616ms -> 128ms
NHWC order: 1612ms -> 174ms

GPU with [N, C, H, W] = [128, 256, 112, 112],
NCHW order: 6450ms -> 37ms
NHWC order: 1419ms -> 82ms

Reviewed By: houseroad

Differential Revision: D16023630

fbshipit-source-id: 5af9bf1103cde2fc2bcb5cd5a057d039732f052e
2019-07-01 15:10:17 -07:00
d0348c0ef9 ThroughputBenchmark: improve formatting for ExecutionStats (#22293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22293

Just wrapping the C++ class with a nicer Python interface; now just print
directly to get all the data. Later we can add various
visualizations there

Differential Revision: D16023999

fbshipit-source-id: 8436e37e36965821a690035617784dcdc352dcd1
2019-07-01 14:24:34 -07:00
d0db2a76a0 PyTorch ThroughputBenchmark: fix inaccuracy in number of iterations reporting (#22292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22292

Since we do an atomic fetch_add to decide whether a thread should
finish, we should not take the last iteration into account. As a
result, the total number of iterations is exactly what the user
sets via config.num_iters

Now when running a unit test I see exact number of iterations reported

Differential Revision: D16023963

fbshipit-source-id: 3b12ee17276628ecd7b0979f28cd6deb777a1543
2019-07-01 14:24:29 -07:00
813b01e4a8 Use at::AutoNonVariableTypeMode before calling ATen tensor factory functions (#22364)
Summary:
As part of the Variable/Tensor merge, one invariant for tensor libraries such as ATen / Caffe2 / XLA is that they should only deal with Tensors, not Variables. However, currently in `variable_factories.h` we are potentially passing Variables into those tensor libraries without the `at::AutoNonVariableTypeMode` guard, which will cause those libraries to treat those Variables as Variables (i.e. their `is_variable()` is true), not Tensors.

Consider the following example for `full_like`:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
  //
  // When `self` is a Variable, since we are not using `at::AutoNonVariableTypeMode`,
  // `at::full_like` will also use `self` as a Variable (and it will see that `self.is_variable()` is true),
  // which breaks the invariant that ATen / XLA should never deal with Variables.
  at::Tensor tensor = at::full_like(self, fill_value, self.options().is_variable(false));
  at::Tensor result =
    autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```

Instead, the invariant-preserving implementation would be:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  at::Tensor tensor = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
    //
    // When `self` is a Variable, since we have `at::AutoNonVariableTypeMode` in the scope,
    // `at::full_like` will use `self` as a Tensor (and it will see that `self.is_variable()` is false),
    // which preserves the invariant that ATen / XLA should only deal with Tensors.
    return at::full_like(self, fill_value, self.options().is_variable(false));
  })();
  at::Tensor result =
    autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```
This PR makes the suggested change for all variable factory functions.

cc. ailzhang This should allow us to remove all `tensor_data()` calls in the XLA codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22364

Differential Revision: D16074862

Pulled By: yf225

fbshipit-source-id: 3deba94b90bec92a757041ec05d604401a30c353
2019-07-01 14:08:28 -07:00
d632b1ff3c Expose is_mkldnn to python and register it as torchscript prim op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22386

Differential Revision: D16074722

Pulled By: bddppq

fbshipit-source-id: b9b2a05a894847640084f063fba68d9db4e6aec1
2019-07-01 12:31:59 -07:00
2ab6ff42d1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: c9d3be641389f3c45a9e5a65280d8bfd20e38ea0
2019-07-01 12:25:21 -07:00
577c04c490 add mutation support for forward_pre_hook and forward_hook (#22285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22285

Previously, forward hooks were expected to return None. This PR adds support for overwriting the input and output in `forward_pre_hook` and `forward_hook`; this is used to implement inserting quant/dequant function calls around forward functions.
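The mechanism can be sketched in plain Python (a toy module, not the real `nn.Module` implementation): a pre-hook that returns a non-None value replaces the input, and a forward hook that returns a non-None value replaces the output.

```python
class ToyModule:
    def __init__(self, forward_fn):
        self.forward_fn = forward_fn
        self.pre_hooks = []   # run before forward; may replace the input
        self.post_hooks = []  # run after forward; may replace the output

    def __call__(self, x):
        for hook in self.pre_hooks:
            result = hook(self, x)
            if result is not None:    # None means "leave the input alone"
                x = result
        out = self.forward_fn(x)
        for hook in self.post_hooks:
            result = hook(self, x, out)
            if result is not None:    # None means "leave the output alone"
                out = result
        return out

m = ToyModule(lambda x: x * 2)
m.pre_hooks.append(lambda mod, x: x + 1)          # e.g. "quantize" the input
m.post_hooks.append(lambda mod, x, out: out - 1)  # e.g. "dequantize" the output
print(m(3))  # (3 + 1) * 2 - 1 = 7
```

Hooks that return None keep the old behavior, so existing hooks are unaffected.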

Differential Revision: D16022491

fbshipit-source-id: 02340080745f22c8ea8a2f80c2c08e3a88e37253
2019-07-01 11:06:42 -07:00
f7421b82ad Remove versions constraints from external_deps (#22113)
Summary:
As per attached tasks, these are noops and are being deprecated/removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22113

Reviewed By: philipjameson

Differential Revision: D15901131

fbshipit-source-id: 3acf12208f692548afe4844be13717a49d74af32
2019-07-01 10:55:30 -07:00
bfeff1eb8f Stubs for torch.nn (#19089)
Summary:
Closes https://github.com/pytorch/pytorch/issues/18724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19089

Differential Revision: D16073654

Pulled By: ezyang

fbshipit-source-id: 5642179651ce45ab7c5a46cc1fcc4fd6b37fa71c
2019-07-01 09:50:17 -07:00
a43d9af52c Comment on why Windows build_pytorch.bat builds twice (#22363)
Summary:
I've noticed that Windows CI seems to build twice, e.g., https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-build/60304/console

This adds a comment explaining why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22363

Differential Revision: D16073609

Pulled By: zou3519

fbshipit-source-id: ddb422b7c7e18cc436caff2c5838373a82f69429
2019-07-01 09:45:01 -07:00
451c907a47 Adding qconv unpack operator for serialization (#22354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22354

qconv weight unpack operator

Reviewed By: zafartahirov, jianyuh

Differential Revision: D16059668

fbshipit-source-id: b068b1a13bcf6a9148d864db384db780d474bfbf
2019-07-01 09:39:14 -07:00
f894ef7263 Add smoke test for information fn/method/attrs to test_namedtensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22341

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 22341 gh/zou3519/66/head

Imported from OSS

Differential Revision: D16053440

Pulled By: zou3519

fbshipit-source-id: 400f2e1c136cd7db4346a42b58813e42595ca755
2019-07-01 07:24:54 -07:00
496e35f76b More named inference rules for pointwise unary ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22308

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 22308 gh/zou3519/65/head

Imported from OSS

Differential Revision: D16053441

Pulled By: zou3519

fbshipit-source-id: 2e8d4cc11d7a711d2b789752a316a11fffc0996e
2019-07-01 07:24:51 -07:00
2a698682e4 Remove Type dispatch (#21964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21964
ghimport-source-id: fdfb555ac4efbf31ae7d2c700a5aa44ad0cc4d7f

Test Plan: Imported from OSS

Differential Revision: D15897424

Pulled By: li-roy

fbshipit-source-id: 3cd6744254e34d70e6875ffde749b5cf959b663c
2019-06-30 04:11:35 -07:00
6c454ff14c Stop using Type in Python bindings (#21963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21963
ghimport-source-id: 4d9d66ba2c8587503d892b67f535cc2a62e2d19e

Test Plan: Imported from OSS

Differential Revision: D15897423

Pulled By: li-roy

fbshipit-source-id: 2dd55ceb80971df7c86545b7bfff733387f13572
2019-06-30 04:11:32 -07:00
9c8f9f0ecb Remove many usages of Type (#21941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21941
ghimport-source-id: f20cca6229daba9eb8652adb3d959266ae081ef1

Test Plan: Imported from OSS

Differential Revision: D15893331

Pulled By: li-roy

fbshipit-source-id: c988b16008ff0e2725a88c6025afd4aabdaca45a
2019-06-30 04:11:28 -07:00
3cba9e8aaa Error Message Paraphrasing (#22369)
Summary:
Saying `I` in an error message is too subjective for a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369

Differential Revision: D16067712

Pulled By: soumith

fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
2019-06-30 00:13:02 -07:00
41e51ce142 Fix QNNPACK and NNPACK settings (#22367)
Summary:
`setup.py` recommends setting `USE_QNNPACK=0` and `USE_NNPACK=0` to disable building QNNPACK and NNPACK respectively. However this wasn't reflected correctly because we were looking for `NO_QNNPACK` and `NO_NNPACK`. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22367

Differential Revision: D16067393

Pulled By: soumith

fbshipit-source-id: 6491865ade9a6d41b7a79d68fd586a7854051f28
2019-06-29 21:15:59 -07:00
d8de69d621 Adds symbolic op for logsumexp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22306

Differential Revision: D16046027

Pulled By: houseroad

fbshipit-source-id: 7319fd58321220941250c5b8eff024914798e392
2019-06-29 00:09:06 -07:00
b52621c870 Revise error message for invalid Reduction (#22160)
Summary:
Say the user passes reduction=False. We can't add a bool and a string, so constructing the ValueError message itself raises a TypeError, which is more confusing to the user. Instead, we should use string formatting. I would use `f"{reduction} is not..."` but am unsure whether we are ok with using f-strings.
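A minimal sketch of the safer formatting (illustrative, not the exact library code): `.format()` stringifies any value, so the ValueError construction itself can no longer raise.

```python
def check_reduction(reduction):
    valid = ('none', 'mean', 'sum')
    if reduction not in valid:
        # str.format() handles bools, ints, etc.; string concatenation
        # ("reduction + ' is not...'") would raise TypeError for a non-string.
        raise ValueError("{} is not a valid value for reduction".format(reduction))

try:
    check_reduction(False)
except ValueError as e:
    print(e)  # False is not a valid value for reduction
```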
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22160

Differential Revision: D15981826

Pulled By: soumith

fbshipit-source-id: 279f34bb64a72578c36bdbabe2da83d2fa4b93d8
2019-06-28 22:37:04 -07:00
9e18234109 Automatic update of fbcode/onnx to 806aa863020fa180e57f576cb032ec44ce8ddcca (#22359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22359

Previous import was 355a4954ea4e5836a5e943589509951c44feb6b4

Included changes:
- **[806aa863](https://github.com/onnx/onnx/commit/806aa863)**: Expose ONNX_ML build option to python (#2138) <bddppq>
- **[8f6e60db](https://github.com/onnx/onnx/commit/8f6e60db)**: Missing newline fix   (#2128) <Chris Seymour>
- **[d94f99d2](https://github.com/onnx/onnx/commit/d94f99d2)**: Avoid unnecessary copies of names by checker (#2098) <Scott McKay>
- **[01f77251](https://github.com/onnx/onnx/commit/01f77251)**: update qlinear conv test (#2120) <Ashwini Khade>
- **[1f0c13d3](https://github.com/onnx/onnx/commit/1f0c13d3)**: Add shape inference for LinearClassifier (#2077) <Hariharan Seshadri>
- **[eb798fcf](https://github.com/onnx/onnx/commit/eb798fcf)**: Fix inconsistency in describing graph's initializer. The initializer (#2115) <xykong58>

Reviewed By: bddppq, zrphercule

Differential Revision: D16061494

fbshipit-source-id: 6ccb63c135c27b307048aa42c11313675027ffb7
2019-06-28 22:22:24 -07:00
7cc8f37f56 Reduce needless copying when returning lists of tensors in the JIT interpreter. (#21690)
Summary:
This fixes the JIT performance gap reported in https://twitter.com/VahidK/status/1138677898439561216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21690

Differential Revision: D15783709

fbshipit-source-id: 23bb4acda6b60c27e95667e1d53c7d261a87167d
2019-06-28 19:00:05 -07:00
737f8a7638 Fix onnx passes (#22319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22319

The onnx pass replacing ints with Tensors produces an invalid JIT graph. It should only be called right before the onnx pass.
Also, it should only be called if we actually export to onnx.

Reviewed By: houseroad

Differential Revision: D16040374

fbshipit-source-id: e78849ee07850acd897fd9eba60b6401fdc4965b
2019-06-28 17:08:55 -07:00
e76c9751c4 Use lazy initialization in autograd record_function to avoid static (#22317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22317

About to add an observer that is also statically initialized in a different
file, so we need to enforce initialization order.

Reviewed By: ilia-cher

Differential Revision: D16012275

fbshipit-source-id: f26e57149a5e326fd34cb51bde93ee99e65403c4
2019-06-28 14:52:56 -07:00
3a198400f8 modify pool benchmarks
Summary: as title

Reviewed By: hl475

Differential Revision: D16058193

fbshipit-source-id: 8f4e04a0356960f6483d6ef58e64876740434849
2019-06-28 14:35:23 -07:00
89c709d217 modify unary operators benchmark
Summary: as title

Reviewed By: hl475

Differential Revision: D16057665

fbshipit-source-id: 07e31a17450fbfd88b5bd330c31c729de5300eaa
2019-06-28 14:03:41 -07:00
6cf4df5d06 add PT softmax ops to the benchmark suite (#21208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21208

The diff adds softmax, softmax2d, and logsoftmax to the benchmark suite.

Reviewed By: zheng-xq

Differential Revision: D15526265

fbshipit-source-id: b7ba63032dba7146765513c8cb1ac5a6a7bd1a68
2019-06-28 13:58:20 -07:00
2132ea1d8d Fix "python: can't open file '.jenkins/pytorch/print_sccache_log.py': [Errno 2] No such file or directory"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22315

Test Plan: Imported from OSS

Differential Revision: D16049862

Pulled By: ezyang

fbshipit-source-id: d9c83208e6b5ee7eb009ddb585dbfa0ea1cbb9e6
2019-06-28 07:15:33 -07:00
042a2fd810 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 445
LARGE: 354

Updated actions:
From MEDIUM to LARGE: 21
From LARGE to XLARGE: 34
From LARGE to MEDIUM: 9
From XLARGE to MEDIUM: 1

Differential Revision: D16047893

fbshipit-source-id: 7afab2ef879277f114d67fd1da9f5102ec04ed7f
2019-06-28 04:13:06 -07:00
e259894e83 Test raising TypeError in torch.from_numpy() (#21607)
Summary:
With some additional cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21607

Differential Revision: D16046063

Pulled By: li-roy

fbshipit-source-id: 15256a0e94afea39db3cb581c546c2a18a8a7fda
2019-06-27 23:54:47 -07:00
0804452709 fix lint in torch/nn/quantized/modules/linear.py (#22325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22325

att

Reviewed By: bddppq

Differential Revision: D16042464

fbshipit-source-id: 0610896c08667fdaa95983f49140193ecb9ede16
2019-06-27 23:18:42 -07:00
1bea27be9d Remove three cpu sigmoid functions that are identical to IMPLEMENT_UNARY_OP_VEC (#22271)
Summary:
This does not occur in CUDA code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22271

Differential Revision: D16024605

Pulled By: bddppq

fbshipit-source-id: bb4f16bacbdc040faa59751fba97958f4c2d33cd
2019-06-27 23:05:05 -07:00
5e77111486 nn.quantized.Relu and nn.quantize.Quantize/DeQuantize modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21930

Differential Revision: D15554224

fbshipit-source-id: 1de9ac7412468106be60e53852c23318ead37bc6
2019-06-27 16:15:17 -07:00
6f0f7e316d Support building caffe2 with clang-cl on Windows (#22307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22307

MSVC-specific pragma doesn't silence the warning about throwing constructor and therefore `clang-cl` fails to compile this file. This diff fixes the problem by adding additional check for `clang` compiler.

Reviewed By: smessmer

Differential Revision: D16032324

fbshipit-source-id: 6dbce0ebf0a533d3e42b476294720590b43a8448
2019-06-27 15:43:38 -07:00
83768f0756 Add ONNX export support for multidim torch.sum. (#22240)
Summary:
This change fixes the issue reported in https://github.com/pytorch/pytorch/issues/22066.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22240

Reviewed By: zrphercule

Differential Revision: D15996934

Pulled By: houseroad

fbshipit-source-id: 3a842ba26f54aa710233fbe87d727fc1f2568d9c
2019-06-27 15:02:33 -07:00
2832e33a94 Add serialization for nn.quantized.Linear module (#21925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21925

att

Differential Revision: D15483071

fbshipit-source-id: 3a218dad5b653b38a0885339889ff70c75a13bef
2019-06-27 14:57:22 -07:00
5c46e701fc Implementation of nn.quantized.linear module (#21921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21921

Call FBGEMM kernels to implement quantized linear operator. This operator is used only for inference.

Differential Revision: D15375695

fbshipit-source-id: b9ca6c156fd60481fea83e55603b2897f7bfc3eb
2019-06-27 14:09:48 -07:00
7a40412158 Delay reduction of unused parameters until first autograd hook is called (#22219)
Summary:
Reduction of gradients for unused parameters should happen as soon as
possible, because they potentially block reduction of gradients for
used parameters. This used to happen instantly when
`prepare_for_backward` was called and it found parameters that didn't
contribute. This meant that if you have a model with unused
parameters, and you want to discard the model output (i.e. not call
backward on some loss), reduction of the gradients of those unused
parameters would have been kicked off, and you'd see an error the next
time you called `forward`.

In this commit, this original approach is slightly changed to delay
reduction of the gradients of those unused parameters until the first
autograd hook is called. This means that you can now discard the model
output regardless of the model having unused parameters or not.

This is a prerequisite for making the `find_unused_parameters`
argument to DDP default to `True`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22219

Differential Revision: D16028698

Pulled By: pietern

fbshipit-source-id: c6aec2cd39c4a77746495d9cb1c9fb9c5ac61983
2019-06-27 14:09:44 -07:00
ac39869370 Fixed list() not making a copy (#22093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22087
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22093

Differential Revision: D16036814

Pulled By: Chillee

fbshipit-source-id: 3c7106f907415ed0f600acaf45d2c61e1c60867a
2019-06-27 13:55:43 -07:00
b1096995d5 Update ThroughputBenchmark to reflect new script::Module API (no (#22291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22291

There was a race between landing the benchmark diff and
https://github.com/pytorch/pytorch/pull/21934 from zdevito. This PR
should fix the issue.

Reviewed By: zdevito

Differential Revision: D16023640

fbshipit-source-id: 931714352e656f045f9ef3cd17422db51b168384
2019-06-27 12:57:27 -07:00
177b8bf6e7 Named inference rule for more pointwise ops. (#22268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22268
ghimport-source-id: c722f9fbb3fc529c872dcccbf58ba1a8c5fcda8e

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

Imported from OSS

Differential Revision: D16030549

Pulled By: zou3519

fbshipit-source-id: 5cbb2c8626335a32a22ed8079245a5faa7cf553f
2019-06-27 12:49:36 -07:00
7732b1a604 Enable named inference for some unary pointwise ops (#22267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22267
ghimport-source-id: 1566df9a20712cada6ea1209e000c5ff757daa14

Test Plan: Imported from OSS

Differential Revision: D16030550

Pulled By: zou3519

fbshipit-source-id: 183ca1d14dc0fb6f1ee6e114b48c2703c61e11ce
2019-06-27 12:49:32 -07:00
69b702a6eb Implement unify_from_right (#22223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22223
ghimport-source-id: b88bd2a13c1c9c699945a69ec05300c6e598e95a

Test Plan:
- `build/bin/NamedTensor_test` [namedtensor ci]

gh-metadata: pytorch pytorch 22223 gh/zou3519/62/head

Imported from OSS

Differential Revision: D16030551

Pulled By: zou3519

fbshipit-source-id: f3d53e3f9b2428a4926c61a02631e6cd29f89e4b
2019-06-27 12:49:29 -07:00
6386e4d244 Named inference rule for abs. (#22151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22151
ghimport-source-id: 54c1726b578ac162af817f78df6f540b764e46e3

Test Plan:
- `python test/test_namedtensor.py` [namedtensor ci]

Imported from OSS

Differential Revision: D15970326

Pulled By: zou3519

fbshipit-source-id: 4ea25f0a73bbc24b604d3ded2027eeb4ce800de0
2019-06-27 12:49:25 -07:00
2913f6a26d Adding modules for Python 3 compatibility (#22295)
Summary:
To improve Python 3 compatibility and make the linter happy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22295

Reviewed By: zrphercule

Differential Revision: D16024957

Pulled By: houseroad

fbshipit-source-id: c0eddf731891b2f547ba619b3c2f6b2d7a32f034
2019-06-27 12:06:40 -07:00
6947e192f7 Remove unused param in Caffe2 LayerNormGradientOp (#22282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22282

Remove unused param in Caffe2 LayerNormGradientOp

Reviewed By: bddppq, houseroad

Differential Revision: D16017117

fbshipit-source-id: bdd0bd2aca009e549dfd2bf622494dfc791589e3
2019-06-27 11:22:44 -07:00
be0631b6ee Add the rest of the dict API (#21979)
Summary:
This adds the rest of the `dict.???` methods that were missing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979

Pulled By: driazati

Differential Revision: D16023573

fbshipit-source-id: 3ea9bd905090e2a176af654a8ca98c7d965ea679
2019-06-27 11:08:18 -07:00
c9626a11cc Made a += b for lists do an in place add (#21896)
Summary:
In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`

Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:
```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```
Previously, `batch_gather jit` took 3x as long as `batch_gather`.

Some logic taken from https://github.com/pytorch/pytorch/pull/21690. Note that these two PR's are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=`.

Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.
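For illustration, a minimal pure-Python sketch (an analogy only — the actual change lives in the C++ `List` implementation, and `concat_copy`/`concat_inplace` are made-up names) of why `a += b` can beat `a = a + b` for lists:

```python
def concat_copy(a, b):
    # `a = a + b`: allocates a brand-new list and copies both operands.
    return a + b

def concat_inplace(a, b):
    # `a += b`: extends the existing list in place; no new object is created,
    # and spare capacity can be reused across iterations.
    a += b
    return a

xs = [1, 2, 3]
assert concat_inplace(xs, [4, 5]) is xs  # same object, mutated in place
assert concat_copy(xs, [6]) is not xs    # copying creates a new list
```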
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21896

Differential Revision: D15998628

Pulled By: Chillee

fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
2019-06-27 10:59:24 -07:00
bf677b8849 Set MKLDNN (default) build variables in CMakeLists.txt, not in Python build scripts (#22215)
Summary:
This is yet another step to disentangle the Python build scripts from CMake
and improve their integration: let CMake handle more of the build environment
detection, with less done by our handcrafted Python scripts.

The processor detection logic also changed a bit: Instead of detecting
whether the system processor is PPC or ARM, this PR changes to detect
Intel CPUs, because this is more precise as MKL only supports Intel
CPUs. The build option `USE_MKLDNN` will also not be presented to
users on non-Intel processors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22215

Differential Revision: D16005953

Pulled By: ezyang

fbshipit-source-id: bf3f74d53609b3f835e280f63a872ff3c9352763
2019-06-27 10:21:55 -07:00
d2bad941f4 Fix lint issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22303

Differential Revision: D16030302

Pulled By: ifedan

fbshipit-source-id: 5564f6f810382f31f9416e5881978b03f51e53a9
2019-06-27 09:27:16 -07:00
zaf
e9d1b852c4 Functional conv2d (#21225)
Summary:
Stack:
     https://github.com/pytorch/pytorch/issues/21323 Quantized Conv2d Module [💛](https://our.intern.facebook.com/intern/diff/D15551835/)
     **https://github.com/pytorch/pytorch/issues/21225 Functional conv2d** [💛](https://our.intern.facebook.com/intern/diff/D15544061/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21225

Test Plan:
`buck test mode/dev caffe2/test:quantized -- test_conv_api`: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929

```
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.1 sec
Building: finished in 5.1 sec (100%) 6958/6958 jobs, 2 updated
  Total time: 6.3 sec
Trace available for this run at /tmp/testpilot.20190603-163323.4026295.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 17661db57af88ec71497f5c21efa86531c07662b fbpkg ce57c6c1c73f45c4aa890e9df65820c3 at Sat Jun  1 17:06:32 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/625/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 6.962 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929
Summary (total time 10.65s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: dskhudia

Differential Revision: D15544061

Pulled By: zafartahirov

fbshipit-source-id: 700c0c78b5915bf7e54bda7c44f44b7b1e247f4d
2019-06-27 09:19:54 -07:00
59c42595e0 Enabled gather and scatter for bool tensor (#21924)
Summary:
- moving stuff around in order to enable bool.
- Added implementation of atomicAdd(bool, bool)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21924

Differential Revision: D15883711

Pulled By: izdeby

fbshipit-source-id: 733f35c2bc3d87cec9f9687d72b62d2d2cd7c03e
2019-06-27 09:07:50 -07:00
f13fadd510 fix python2 corner-case in torch.distributed.launch (#20996)
Summary:
Small fix for the comment raised in 4cf76574b9 (r33134850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20996

Differential Revision: D15991510

Pulled By: pietern

fbshipit-source-id: 4e5a35864b5a4ec9402aa83a19c4a3ba0df2f01f
2019-06-27 05:19:37 -07:00
f39b6624ba ChunkDataset checkpoint support (#21889)
Summary:
When dealing with a large-scale dataset, it is handy if we can save the dataset status and resume later, especially in cases where some unexpected crash happens: users don't need to start over the whole dataset from the beginning. Instead, they can reload it from the last checkpoint.

This change adds support for checkpoint save/load logic in ChunkDataset.

On ChunkDataset construction, the user can specify a file name from which to load the checkpoint. If it is empty, the dataset starts fresh; otherwise the ChunkDataset will 'fast forward' the chunk sampler to the corresponding checkpoint.

The user can also call ChunkDataset::save() to serialize current status to a file, which can be used later.
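A sketch of the idea in Python (assumption: `ChunkSampler` is a hypothetical stand-in for the C++ chunk sampler, with JSON standing in for the serialized checkpoint format):

```python
import json

# Hypothetical stand-in for the C++ ChunkDataset checkpointing: the sampler
# records how many chunks have been consumed, and a saved checkpoint
# "fast forwards" a freshly constructed sampler to the same position.

class ChunkSampler:
    def __init__(self, num_chunks, checkpoint=None):
        self.num_chunks = num_chunks
        self.next_chunk = 0
        if checkpoint is not None:  # resume: fast-forward to saved position
            self.next_chunk = json.loads(checkpoint)["next_chunk"]

    def sample(self):
        chunk = self.next_chunk
        self.next_chunk += 1
        return chunk

    def save(self):
        # Serialize current status so it can be reloaded later.
        return json.dumps({"next_chunk": self.next_chunk})

s = ChunkSampler(10)
s.sample(); s.sample(); s.sample()
resumed = ChunkSampler(10, checkpoint=s.save())
print(resumed.sample())  # 3
```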
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21889

Differential Revision: D16024582

Pulled By: ailzhang

fbshipit-source-id: 1862ab5116f94c9d29da174ce04a91041d06cad5
2019-06-26 22:54:14 -07:00
30d890c672 Removed an outdated comment above IMPLEMENT_UNARY_OP_VEC(abs) (#22272)
Summary:
due to 82b570528db0a43fc04bb90f5d4538c01e4a5582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22272

Differential Revision: D16024544

Pulled By: bddppq

fbshipit-source-id: 37955bff3301975c0dd6abde8a3ba79af0555111
2019-06-26 22:24:13 -07:00
f144b9ebef Fix two overindent lint errors in test/common_nn.py. (#22287)
Summary:
This keeps causing lint tests to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22287

Differential Revision: D16024524

Pulled By: bddppq

fbshipit-source-id: a3e3780a55943283e9c854e94ac06ea4715e5319
2019-06-26 21:41:41 -07:00
e6d4a2d289 Remove unused file cmake/Modules/FindMIOpen.cmake (#22244)
Summary:
`cmake/public/LoadHIP.cmake` calls `find_package(miopen)`, which uses the CMake module in MIOpen installation (It includes the line `set(miopen_DIR ${MIOPEN_PATH}/lib/cmake/miopen)`). `cmake/Modules/FindMIOpen.cmake` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22244

Differential Revision: D16000771

Pulled By: bddppq

fbshipit-source-id: 07bb40fdf033521e8427fc351715d47e6e30ed34
2019-06-26 21:21:46 -07:00
5e0a74dd70 Rename copy_tensor_data to copy_tensor_metadata (#22266)
Summary:
The original name `copy_tensor_data` could be confusing because users are not sure whether it deep-copies data in the tensor's storage or just copies the tensor's metadata. The renaming makes it more clear.

cc. ailzhang This might break XLA build, but I think the renaming makes it more clear why we use `copy_tensor_data` in XLATensorImpl's shallow-copy functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22266

Differential Revision: D16014724

Pulled By: yf225

fbshipit-source-id: f6ee966927d4d65d828b68264b3253b2f8fd768d
2019-06-26 21:16:57 -07:00
45c6fa0007 Refactor Tests for Multiple ONNX Opsets (#20036)
Summary:
Refactor tests for https://github.com/pytorch/pytorch/pull/19294.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20036

Reviewed By: zrphercule

Differential Revision: D16016593

Pulled By: houseroad

fbshipit-source-id: eaae324e347679acf3d0ac1c14be03919f54496e
2019-06-26 17:06:57 -07:00
f51de8b61a Back out "Revert D15435461: [pytorch][PR] PyTorch ThroughputBenchmark" (#22185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22185

Original commit changeset: 72a0eac1658b

Differential Revision: D15981928

fbshipit-source-id: d2455d79e81c26ee90d41414cde8ac0f9b703bc3
2019-06-26 16:05:51 -07:00
3f2a839dda Add comments to bailoug_graph.*
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22161

Differential Revision: D15975355

Pulled By: Krovatkin

fbshipit-source-id: dca0095b4f05cff8277663ad38b65eeb44417f40
2019-06-26 15:39:38 -07:00
04fe2453c4 conv2d/conv3d for LongTensor (#20730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20730

Generates forward conv2d function for LongTensor

Differential Revision: D15423753

fbshipit-source-id: 0e770b61257cc4c6559581796bf104ef68155c84
2019-06-26 15:29:56 -07:00
3ba72a11db Revert D15999938: [jit] Add the rest of the dict API
Differential Revision:
D15999938

Original commit changeset: 7bc2a55e3f79

fbshipit-source-id: e377c00e990d6f058960936e69712b77851c06fa
2019-06-26 14:16:37 -07:00
7707dee761 Re apply optional ScalarType changes (#22237)
Summary:
This is (mostly) the re-application of:
https://github.com/pytorch/pytorch/pull/21088

which was reverted due to an issue conflicting with changes in:
https://github.com/pytorch/pytorch/pull/22104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22237

Differential Revision: D16012838

Pulled By: nairbv

fbshipit-source-id: 35f4a73c97ab68b4e2648aca96b2176f07b5a883
2019-06-26 13:36:25 -07:00
8b02522b93 Avoid copy in ArrayRef<->vector comparison (#22218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22218

-

Differential Revision: D15990763

fbshipit-source-id: 53c98f915fadc8a65aea896c80292d5804d967a4
2019-06-26 13:36:21 -07:00
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing a BC-breaking change, empty_like returns a contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
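For reference, a small sketch (plain Python, not the PyTorch implementation) of the stride patterns the two memory formats imply for a logical (N, C, H, W) shape:

```python
# Illustration only: the strides a dense tensor of logical shape
# (N, C, H, W) gets under each memory format.

def contiguous_strides(n, c, h, w):
    # NCHW ("contiguous_format"): W fastest, then H, C, N.
    return (c * h * w, h * w, w, 1)

def channels_last_strides(n, c, h, w):
    # NHWC ("channels_last"): C fastest, then W, H, N,
    # while the logical shape stays (N, C, H, W).
    return (h * w * c, 1, w * c, c)

print(contiguous_strides(2, 3, 4, 5))     # (60, 20, 5, 1)
print(channels_last_strides(2, 3, 4, 5))  # (60, 1, 15, 3)
```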
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
5bdc4db26e Refactor named tensor helper code (#22150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22150
ghimport-source-id: 460022febc24f49b86f0d5dbf8dc227564bde6cb

Test Plan: Imported from OSS

Differential Revision: D15970325

Pulled By: zou3519

fbshipit-source-id: 86a3e3ca82bbf4ff815431e25c5f9a35fcd23be0
2019-06-26 11:33:29 -07:00
29b53b0259 Fix bug in caffe2 transpose on GPU (#22233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22233

Fix bug in caffe2 transpose on GPU

Reviewed By: hl475

Differential Revision: D15994973

fbshipit-source-id: 542dc8757b51a6322fffa55826c1d4e32927398d
2019-06-26 11:33:25 -07:00
2dc9643080 Better error message for mismatched dict key type (#22231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22231

Pulled By: driazati

Differential Revision: D15993936

fbshipit-source-id: 6822ef01477a3b32beb8c037a621fa71abd022c8
2019-06-26 10:46:45 -07:00
af9e0085f2 Add the rest of the dict API (#21979)
Summary:
This adds the rest of the `dict.???` methods that were missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979

Pulled By: driazati

Differential Revision: D15999938

fbshipit-source-id: 7bc2a55e3f791015a0ff2e3731703075cf0770ee
2019-06-26 10:40:29 -07:00
25eae3ed08 Disable test_proper_exit flaky worker_kill (#22208)
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208

Differential Revision: D15990307

Pulled By: soumith

fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
2019-06-26 09:47:40 -07:00
a4f281446b introduce flags to set omp and mkl threads (#21472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21472

as title

Reviewed By: hl475

Differential Revision: D15695846

fbshipit-source-id: 44437f6b94a9c583275fcc711bb6ccf2b04f90fc
2019-06-26 09:33:05 -07:00
5f84f372a6 Use variable_data() in tensor_to_numpy (#22214)
Summary:
As part of the Variable/Tensor merge, we want to gradually remove call sites of `tensor_data()` and the API itself, and instead uses `variable_data()`. This PR removes the `tensor_data()` call in the tensor_to_numpy conversion path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22214

Differential Revision: D15997397

Pulled By: yf225

fbshipit-source-id: 6fcab7b14e138824fc2adb5434512bcf868ca375
2019-06-26 08:57:47 -07:00
f176950a67 Use lower case for strong wolfe option. (#22092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22092
ghimport-source-id: ccc53ed2f1e16865237334a4dde4d162e21762e5

Test Plan: Imported from OSS

Differential Revision: D15955996

Pulled By: vincentqb

fbshipit-source-id: 8ffbea3b9ef8ff7021d42524fa46112da8a3438e
2019-06-26 08:20:25 -07:00
9f22805cc6 Refactor function_wrapper.create_generic (#22077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22077
ghimport-source-id: 39cf0a2e66e7fa2b6866af72782a22a4bd025e4c

Test Plan:
- Compared the build/aten/src folder before and after this change
locally and verified they are identical (`diff -r`).
- Wait for CI + Also, [namedtensor ci]

Imported from OSS

Differential Revision: D15941967

Pulled By: zou3519

fbshipit-source-id: d8607df78f48325fba37e0d00fce0ecfbb78cb36
2019-06-26 08:20:21 -07:00
b297552887 Make nn functions configurable for different scalar types (#20729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20729

Currently there is no way to specify what scalar types each nn function will support.

This change will allow specifying the supported scalar types for each function/backward function and device. By default each function will support Float, Double, Half.

If you want to specify any extra supported scalar types other than the default, you will need to change nn.yaml:

- name: _some_func(Tensor self)
  cname: SomeFunction
  CPU:
    forward_scalar_types: ['Float', 'Double', 'Long']
    backward_scalar_types: ['Float', 'Double']

Differential Revision: D15423752

fbshipit-source-id: b3c157316d6e629bc39c1b377a3b23c71b1656cf
2019-06-26 07:53:38 -07:00
95b5718007 Prevent VS from emitting errors when using swap in Optional.h (#22182)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21706
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22182

Differential Revision: D15981740

Pulled By: ezyang

fbshipit-source-id: d58b3ca3aea8d3d383150208b87fa4bbd4f6fe33
2019-06-26 07:29:35 -07:00
fde75a33e1 update IterableDataset doc to be consistent with current behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22230

Differential Revision: D15994680

Pulled By: ezyang

fbshipit-source-id: 9e47e8369aa08a550987c4468ce75aa7650ee1d4
2019-06-26 06:49:22 -07:00
655a370859 restoring HEADs for ideep and onnx to more recent versions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22250

Differential Revision: D16003227

Pulled By: Krovatkin

fbshipit-source-id: bf906a8e9e5e0f79391e5984c6cdfb9638d84981
2019-06-26 02:19:17 -07:00
17b37eb353 Bump gloo (#22225)
Summary:
This includes:
* Removal of builder classes
* Add allgatherv
* Add bcube allreduce algorithm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22225

Differential Revision: D16003629

Pulled By: pietern

fbshipit-source-id: fd062b82bfeeddb8190206d9931a781c7daff6f9
2019-06-26 00:55:36 -07:00
c1fc2f25c2 export deleteFunction in torch/csrc/autograd/function.h (#22236)
Summary:
In `torch/csrc/autograd/function.h` we define `torch::autograd::Function`, a (the?) central autograd record-holding class. `Function` is declared public API (`TORCH_API`).

We also define a custom deleter `deleteFunction` which we use throughout PyTorch's own use of `Function`. This trivial PR declares the deleter public API as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22236

Differential Revision: D16001335

Pulled By: yf225

fbshipit-source-id: 6ef0a3630e8f82f277a0e6e26cc64455ef7ee43e
2019-06-25 20:46:09 -07:00
e8bc992b03 print device when it's not on default device (#22094)
Summary:
We used to not print the device when a tensor is on XLA. This is sometimes confusing, as it looks the same as a CPU tensor...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22094

Differential Revision: D15975405

Pulled By: ailzhang

fbshipit-source-id: f19ceb9e26f5f2f6e7d659de12716f0dfe065f42
2019-06-25 20:28:50 -07:00
1a164bf30b remove unused mkldnn include (#22217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22217

Seems introduced in https://github.com/pytorch/pytorch/pull/19209/files which is
no longer used. Remove it to simplify mobile build.

Reviewed By: bddppq

Differential Revision: D15983344

fbshipit-source-id: 37ee0bfbd022da09af6bc44c6e3fec1c99a8e732
2019-06-25 17:45:39 -07:00
de85abf226 Allow default construction of Dict/List (#22084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084

For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it's supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.

Differential Revision: D15948098

fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
2019-06-25 17:40:48 -07:00
e425789286 Fix "missing return statement" warning (#22216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22216

-

Differential Revision: D15989670

fbshipit-source-id: d0534a3bf1eef29657738e271d35503a2f75a043
2019-06-25 16:57:42 -07:00
f7a126f941 fix optional type subtype relation (#22186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22186
ghimport-source-id: 05ef8c3a176fe2a67d4835888e6db52b57a6d199

Test Plan: Imported from OSS

Differential Revision: D15994644

Pulled By: wanchaol

fbshipit-source-id: 7c5c4eebd421f6c9470661c2c2eb38bafdff8bbd
2019-06-25 16:57:38 -07:00
defd23b8b9 Clean up old uses of checkScript (#22002)
Summary:
This cleans up the `checkScript` API and some old tests that were hardcoding outputs. It also now runs the Python function when a string is passed in to verify the outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22002

Differential Revision: D15924485

Pulled By: driazati

fbshipit-source-id: ee870c942d804596913601cb411adc31bd988558
2019-06-25 16:24:19 -07:00
7b1ffba3bf ArgumentStash for Scalar arguments (#21931)
Summary:
Scalars are being traced as constants; this PR fixes that issue.

The ONNX Graph for Test_Full_op() before and after this change:

```python
def Test_Full_op():
  class Test_Full(nn.Module):
    def forward(self, x):
      return torch.full((3, 4), x, dtype=torch.long)
  model = Test_Full()
  x = torch.tensor(12)
  output = model(x)
```

Before this change:
```
graph(%input1 : Long()):
  %output1 : Float(3, 4) = onnx::Constant[value=<Tensor>]
  return (%output1)
```

After this change:
```
graph(%input1 : Long()):
  %1 : int[] = onnx::Constant[value= 3 4 [ Variable[CPULongType]{2} ]]
  %2 : Tensor = onnx::ConstantOfShape[value={0}]
  %output1 : Float(3, 4) = onnx::Add(%2, %input1)
  return (%output1)
```
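The rewrite can be emulated with plain scalars — a minimal sketch (pure Python; `full` and `constant_of_shape_plus_add` are hypothetical stand-ins, not PyTorch/ONNX APIs) of why filling with a runtime scalar is equivalent to adding it to a zero tensor:

```python
# Hypothetical stand-ins for the ops above; flattened 1-D "tensors" only.

def full(numel, x):
    # What torch.full computes: a tensor filled with the runtime scalar x.
    return [x] * numel

def constant_of_shape_plus_add(numel, x):
    zeros = [0] * numel            # onnx::ConstantOfShape (filled with 0)
    return [z + x for z in zeros]  # onnx::Add broadcasts the scalar x

assert full(12, 7) == constant_of_shape_plus_add(12, 7)
```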

Similar PR : https://github.com/pytorch/pytorch/pull/12939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21931

Reviewed By: zrphercule

Differential Revision: D15950066

Pulled By: houseroad

fbshipit-source-id: 3470665d88fa34faa600940ef16b069a06002cd5
2019-06-25 15:22:08 -07:00
7ee82d48a8 Removed work around for convolution transpose op since the bug has be… (#22184)
Summary:
…en fixed in v0.18
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22184

Differential Revision: D15982627

Pulled By: bddppq

fbshipit-source-id: 8725d5b5e5b68e029ffb08af12b416bd310c9638
2019-06-25 14:34:34 -07:00
5b87049c66 remove uses of std::shared_ptr<Module> (#21934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21934
ghimport-source-id: e64ab9096f43749ead3ac5567675b815da295664

Test Plan: Imported from OSS

Differential Revision: D15892401

Pulled By: zdevito

fbshipit-source-id: 6424139206593ff944556c69d8a54723884eacaf
2019-06-25 13:24:38 -07:00
1d705b4b07 Run clang-format on c10d bits (#22194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22194

TSIA

Differential Revision: D15983780

fbshipit-source-id: 1365bcf9bbc262a3657f646e81d2fc9c32f24c97
2019-06-25 12:34:52 -07:00
f5a1ea170b SIMD version average pooling added (#22148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22148

Average pooling is added into dnnlowp optimization code.

Reviewed By: jspark1105

Differential Revision: D15936556

fbshipit-source-id: 6177ee62529801898f230c6fb89e9c4b598593a5
2019-06-25 12:19:21 -07:00
a7cb07eb0f Add missing algorithm header to Array utility (#22157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22157

This header uses `std::swap_ranges` function which is defined in `<algorithm>` header (https://en.cppreference.com/w/cpp/algorithm/swap_ranges). Therefore this file isn't guaranteed to compile on all platforms.

This diff fixes the problem by adding the missing header.

Reviewed By: smessmer

Differential Revision: D15971425

fbshipit-source-id: e3edcec131f72d729161f5644ee152f66489201a
2019-06-25 12:19:17 -07:00
6ff0c6ca3f Remove THD (#22065)
Summary:
It's been ~9 months since moving THD to the `torch.distributed.deprecated` namespace (see https://github.com/pytorch/pytorch/issues/11405) and we haven't seen issues related to it, so it's time to remove it.

Closes https://github.com/pytorch/pytorch/issues/18967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22065

Reviewed By: mrshenli

Differential Revision: D15983669

Pulled By: pietern

fbshipit-source-id: 2a2f5866f9a63040bc7cef3956d5fd215aba7165
2019-06-25 12:19:13 -07:00
bcb5fd8f06 Port symeig to ATen and enable batching of inputs (#21858)
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
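A minimal sketch (pure Python; assumption: 2x2 symmetric matrices only, far simpler than the LAPACK/MAGMA path) of what "batching" symeig means — the same eigendecomposition applied along a leading batch dimension:

```python
import math

def symeig_2x2(m):
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]].
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    r = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean - r, mean + r)  # ascending order

def batched_symeig_2x2(batch):
    # Batched version: apply the decomposition over the leading dimension.
    return [symeig_2x2(m) for m in batch]

batch = [[[2.0, 0.0], [0.0, 3.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
print(batched_symeig_2x2(batch))  # [(2.0, 3.0), (-1.0, 1.0)]
```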
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858

Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.

Differential Revision: D15981789

Pulled By: soumith

fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
2019-06-25 12:13:27 -07:00
4ec6fbefa6 Show deprecation warning when stateful lambdas are used as kernels (#21885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21885

If a kernel is defined as a stateful lambda

    static auto registry = torch::RegisterOperators().op("my::op", [some_closure] (Tensor a) {...});

this can have very unexpected behavior when kernels are instantiated. There is no guarantee that the state is kept.

In the options based API, state is already disallowed:

    // this is a compiler error
    static auto registry = torch::RegisterOperators().op("my::op", torch::RegisterOperators::options().kernel([some_closure] (Tensor a) {...}));

but we can't disallow it in the non-options-based API for backwards compatibility reasons.

We can, however, show a deprecation warning. This is what this diff introduces.
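A rough Python analogy (assumption: `Counter` is a made-up callable, not the C++ kernel API) for why hidden state in a kernel is unreliable — if the registry ever copies the callable, each copy carries its own state:

```python
import copy

class Counter:
    """Stateful callable, analogous to a stateful C++ lambda capture."""
    def __init__(self):
        self.calls = 0
    def __call__(self, x):
        self.calls += 1  # hidden state mutated on every call
        return x

kernel = Counter()
registry_copy = copy.deepcopy(kernel)  # e.g. a kernel re-instantiation
kernel(1); kernel(2)
print(kernel.calls, registry_copy.calls)  # 2 0 -> state was not shared
```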

Differential Revision: D15867089

fbshipit-source-id: 300fa4772fad8e7d177eb7cb910063d360537a4a
2019-06-25 11:53:18 -07:00
c68119387d serialize torch.Size object (#20952)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20952

Differential Revision: D15514274

Pulled By: ailzhang

fbshipit-source-id: 8340a40fadfd06063f7f33b0d99d693e74d5defb
2019-06-25 10:44:35 -07:00
7daa96a3ce porting convtranspose3d to ATen (#22019)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/18353

CPU and GPU porting for convolution transpose 3d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22019

Differential Revision: D15985353

Pulled By: ezyang

fbshipit-source-id: 1c579577a32db24a1ce38f5ab9b3f1cb9c8f2a6e
2019-06-25 10:22:34 -07:00
9af8ea1ce5 Not expose mkldnn reshape and transpose (#22193)
Summary:
This PR is to make mkldnn reshape and transpose not exposes as Tensor API, please see the comments in https://github.com/pytorch/pytorch/pull/21943.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22193

Differential Revision: D15983434

Pulled By: bddppq

fbshipit-source-id: ad3514dfd8a3b0d89442eef752864e5d3f3d04f0
2019-06-25 09:52:47 -07:00
mal
c8b5f1d2f8 Switch autograd to use a pool of workers for each device (#21911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21911
ghimport-source-id: 3b7d37481201aa4b4ca8f7767603d0dfd13f871f

Test Plan:
Tested on https://github.com/pytorch/pytorch/issues/6959 and ensured no Recursion Error.
Performance testing:
[word_language_model](https://gist.github.com/malvika2147/34c214871d549f9275812f2d20506990) (no significant change)
[mnist](https://gist.github.com/malvika2147/77890eef102099490a1029122fb20dd0) (no significant change)
[Comparison of performance](https://gist.github.com/malvika2147/c0a8790910b8513bd2e20b224bdd6300) on https://github.com/pytorch/pytorch/issues/6959 with smaller inputs. (slower by about ~25%, expected)

Imported from OSS

Differential Revision: D15985852

fbshipit-source-id: ca172690857fd1718462b80f3a244af9d8825d6c
2019-06-25 09:08:26 -07:00
94e83da55c Optimization of the Embedding and Embedding-Bag CUDA Kernel (#22016)
Summary:
Re-implementation of the `embedding_dense_backward_cuda()` and the `embedding_bag_backward_cuda_sum_avg()` functions.

#### Performance
Running a [Mortgage Workflow](https://github.com/EvenOldridge/MortgageWorkflowA) with a block size of 100K on a DXG-2 (single GPU), we see a 270% speedup:
```
Original version:    370,168 example/s
Optimized version: 1,034,228 example/s
```
The original version is bounded by the `EmbeddingBag_accGradParametersKernel_sum_avg`, which takes 70% of the CUDA execution time. In the optimized version, the optimized kernel now takes only 17% of the time.

#### Greater Numerical Stability
An added benefit is greater numerical stability. Instead of doing a flat sum where a single variable is used to accumulate the weights, this code uses two steps: each GPU thread computes a sub-result covering `NROWS_PER_THREAD` rows before the final result is accumulated.
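A sketch (pure Python stand-in for the CUDA kernel, with `NROWS_PER_THREAD` taken from the description above) of the two-step accumulation: each "thread" sums a fixed-size chunk of rows, then the partial sums are combined.

```python
NROWS_PER_THREAD = 4  # chunk size each "thread" is responsible for

def two_step_sum(grads):
    # Step 1: per-thread partial sums over chunks of NROWS_PER_THREAD rows.
    partials = [sum(grads[i:i + NROWS_PER_THREAD])
                for i in range(0, len(grads), NROWS_PER_THREAD)]
    # Step 2: combine the partial results into the final accumulation.
    return sum(partials)

grads = [0.1] * 10
print(abs(two_step_sum(grads) - 1.0) < 1e-9)  # True
```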
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22016

Differential Revision: D15944339

Pulled By: mrshenli

fbshipit-source-id: 398d5f48826a017fc4b31c24c3f8b56d01830bf0
2019-06-25 08:14:15 -07:00
b0bd8758fc Further remove redundant CMake option passing code for those CMake variables that are directly controlled by environment variables but with a different name. (#22154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22154
ghimport-source-id: 714b98566e70063925c4c9e10940a4fe46fb5a3d

Test Plan: Imported from OSS

Differential Revision: D15985376

Pulled By: ezyang

fbshipit-source-id: 60710125009cd8bf60b5600a3f05854d931d9844
2019-06-25 07:23:06 -07:00
ce1a9653a8 Remove more build options not needed to be explicitly set in Python build scripts. (#22153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22153
ghimport-source-id: 129d90626a8e64079477a744fbbaba58e139a852

Test Plan: Imported from OSS

Differential Revision: D15985375

Pulled By: ezyang

fbshipit-source-id: 925bb1c886633b002beb1da0754bb055aa971e21
2019-06-25 07:23:03 -07:00
839b496fbd Fixes bugs in torch.multinomial without replacement (#22183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22183
ghimport-source-id: f03c17178de115adbe983953a8f9f205e3df7721

Test Plan: Imported from OSS

Differential Revision: D15985324

Pulled By: ezyang

fbshipit-source-id: 6e9dc3b54d448f4bb374b004d7f1dd1ac5c014f6
2019-06-25 07:15:18 -07:00
b61693c0ed Optimize InstanceNormOp forward (#22130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22130

Optimize InstanceNormOp forward

For InstanceNormOp on CPU with order = NHWC, N = 128, C = 256, H = 56, W = 56: 183ms -> 115ms.

For InstanceNormOp on GPU with N = 256, C = 256, H = 112, W = 112:
NCHW: 1475ms -> 45ms
NHWC: 1597ms -> 79ms
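A reference sketch (pure Python, illustrative only and nothing like the optimized kernels above) of what InstanceNormOp computes: each (sample, channel) slice is normalized by its own mean and variance.

```python
import math

def instance_norm(x, eps=1e-5):
    # x: nested lists with shape [N][C][H*W] (spatial dims flattened).
    out = []
    for sample in x:
        norm_sample = []
        for channel in sample:
            mean = sum(channel) / len(channel)
            var = sum((v - mean) ** 2 for v in channel) / len(channel)
            inv_std = 1.0 / math.sqrt(var + eps)
            norm_sample.append([(v - mean) * inv_std for v in channel])
        out.append(norm_sample)
    return out

y = instance_norm([[[1.0, 2.0, 3.0]]])
print(abs(sum(y[0][0])) < 1e-6)  # True: normalized channel has ~zero mean
```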

Reviewed By: houseroad

Differential Revision: D15963711

fbshipit-source-id: 3fa03109326456b9f301514fecbefa7809438d3e
2019-06-25 01:04:53 -07:00
ac4913ee62 support both regularizable and softmax re-weighting on sparse features in dot product (#22176)
Summary:
In order to select more important features in dot product among a list of candidate sparse features, we can assign one learnable weight on each feature, reweight each feature by multiplying the weight onto its embedding before dot product. We finally select features based on the weight magnitude after training.

We can perform L1 and/or L2 regularization on the weights. To summarize, the weights tend to shrink their values (avoiding overfitting) due to L2 regularization, and some weights will vanish to zero as L1. To avoid sparse feature embedding being ignored due to early collapse of weights, a piece lr warm up policy is used in optimizing regularization term, such that regularization is weak at first stage and gets stronger afterwards (a small lr constant in iters less than threshold 1, a medium lr constant in stage 2, and a final reasonable large lr constant in all iters after threshold 2). The features with nonzero and relatively large weights (in absolute value) will be selected for the module.

We can also apply softmax on the original weights to make it sum to 1. We can even boosting the softmaxed weights by multiply the number of softmax components, which essentially make them sum to the number of softmax components and avergae to 1. In this idea, all the weights are positive and sum to a constant. Regularization is not a must since we can count on the competition between softmax weights themselves to achieve reasonable re-weighting. We expect those weights be more dense, comparing with sparse ones from L1 regularization and we can select features based on top K weights.

Overall, we aim to demonstrate that the selected feature set outperforms the current v0 feature set in experiments. Special acknowledgement goes to Shouyuan Chen, who initiated the work on regularizable weighting.

 ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22176

The diff exports the updates below to the GitHub repository.

The updates to the files are summarized as follows:

- adding logger messages
`caffe2/python/layer_model_helper.py`
- add ElasticNet regularizer, which combines both L1 and L2 regularization
`caffe2/python/regularizer.py`
- implement piecewarmup, specifically warm up with three constant pieces
`caffe2/sgd/learning_rate_functors.h, caffe2/sgd/learning_rate_op.cc, caffe2/sgd/learning_rate_op.h`
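For reference, the ElasticNet regularizer mentioned above combines the L1 and L2 penalties; a minimal sketch of the penalty term (a hypothetical helper, not the actual Caffe2 operator):

```python
def elastic_net_penalty(weights, l1, l2):
    # ElasticNet penalty = l1 * sum(|w|) + l2 * sum(w^2)
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)

penalty = elastic_net_penalty([0.5, -1.0], l1=0.1, l2=0.01)
```

Setting `l2=0` recovers plain L1 regularization (sparse weights), and `l1=0` recovers plain L2 (shrunken but dense weights).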

Differential Revision: D15923430

fbshipit-source-id: ee18902cb88c23b1b7b367cc727d690a21e4cda9
2019-06-24 21:27:33 -07:00
299ea84a70 Use latest stable flake8-bugbear in CI and fix B011 flake8 error. (#21944)
Summary:
- PyCQA/flake8-bugbear#53 has been fixed (but not yet closed on their side) and a new version of flake8-bugbear has been released on Mar 28, 2019. Switch CI to use the latest stable version.
- Fix the new B011 errors that flake8-bugbear catches in the current codebase.

 ---

B011: Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
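A minimal illustration of the B011 rule: the `assert False` statement is stripped when Python runs with `-O`, while an explicit raise always fires.

```python
def fail_with_assert():
    assert False, "unreachable"  # silently skipped under `python -O`

def fail_with_raise():
    raise AssertionError("unreachable")  # always raises, even under -O

try:
    fail_with_raise()
    raised = False
except AssertionError:
    raised = True
```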
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21944

Differential Revision: D15974842

Pulled By: soumith

fbshipit-source-id: de5c2c07015f7f1c50cb3904c651914b8c83bf5c
2019-06-24 20:48:15 -07:00
f5df0c9104 Don't end on inplace operators in einsum (#22111)
Summary:
Returning the result of an inplace `squeeze_` in `einsum` (which itself is traced) interacts badly with `autograd.Function`.

I must admit that I'm not 100% certain whether it should be necessary to change this, but I consider this a good change overall.

Fixes: https://github.com/pytorch/pytorch/issues/22072
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22111

Differential Revision: D15974990

Pulled By: soumith

fbshipit-source-id: 477e7f23833f02999085f665c175d062e7d32acd
2019-06-24 20:39:20 -07:00
ede08492e1 Enabled mul for bool tensors on CUDA (#21771)
Summary:
Enable mul_cuda for bool tensors.
This is a helper PR to fix a [test failure](https://circleci.com/gh/pytorch/pytorch/1992191?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link) in the other [PR](https://github.com/pytorch/pytorch/pull/21113).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21771

Differential Revision: D15883737

Pulled By: izdeby

fbshipit-source-id: 4c39644bbe8e80da4d14570862589944285d4bfe
2019-06-24 18:37:29 -07:00
3b700a43d5 Add missing whitespace in error message (#21904)
Summary:
The current error message displays as:
     `RuntimeError: index koccurs twice in output`
A whitespace is missing between the index and 'occurs'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21904

Differential Revision: D15878941

Pulled By: colesbury

fbshipit-source-id: 163dda1829bf4956978cd01fd0e751673580722d
2019-06-24 15:32:46 -07:00
88cdc16835 AveragePool: expand incomplete kernel_size for the C++ API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22075

Differential Revision: D15945260

Pulled By: mrshenli

fbshipit-source-id: 827660c19ebbdb5f0aae2f4eadb6025ae2f93674
2019-06-24 15:32:41 -07:00
2372e7ed2e DilatedMaxPool: expand incomplete kernel_size for the C++ API (#22073)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22032.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22073

Differential Revision: D15944471

Pulled By: mrshenli

fbshipit-source-id: 84b265be00d67aa7f13508ede0646763d2339f1d
2019-06-24 15:32:36 -07:00
b2a39314e7 Make Dropout.__repr__ consistent with other modules (#22110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22106.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22110

Differential Revision: D15958821

Pulled By: ezyang

fbshipit-source-id: 89381dc3bfa79544580e20fea906cef4f5101b61
2019-06-24 15:27:06 -07:00
273b6c5bae Cast return value of vector.at() to void to avoid nodiscard warning in MSVC. (#22061)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22061

Differential Revision: D15957983

Pulled By: ezyang

fbshipit-source-id: e4416c5f0db2bc6b8bfaa27be52b942148ec7b3d
2019-06-24 15:27:02 -07:00
0ac28c8966 Quick fix for #18215, the CPU case (#21910)
Summary:
The bug is that when target_length == 0, there is no preceding BLANK state, and the original implementation leads to an out-of-bounds pointer access.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21910

Differential Revision: D15960239

Pulled By: ezyang

fbshipit-source-id: 7bbbecb7bf91842735c14265612c7e5049c4d9b3
2019-06-24 15:26:58 -07:00
41d0525de3 Improve repr for IncompatibleKeys (#22119)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20128.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22119

Differential Revision: D15961965

Pulled By: ezyang

fbshipit-source-id: 9cc397726e6bea5580e79d291cfc1ee75337fa0c
2019-06-24 15:26:54 -07:00
f1775796dd Fix minor issues with #21736 (#22074)
Summary:
cc mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22074

Differential Revision: D15965376

Pulled By: mrshenli

fbshipit-source-id: 50ff96de6390817d8ea52c04322c6bee3d649b32
2019-06-24 15:18:26 -07:00
a45898931c Document the Boolean tensor type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21601

Differential Revision: D15971573

Pulled By: gchanan

fbshipit-source-id: c07c57f989980149cb1307dcca6ba64dce52d0ef
2019-06-24 14:16:36 -07:00
7c4206499e Fix in ivalue::Future (#22114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22114
ghimport-source-id: 249b76c078e7af8ebb6cab113dd48dbd3e31e8dc

Test Plan:
ran intra_inter_benchmark with PARALLEL_BACKEND=NATIVE build

Imported from OSS

Differential Revision: D15958901

Pulled By: ilia-cher

fbshipit-source-id: 1c3dedc4cf1ff8166aeb26899a06c7287a499562
2019-06-24 12:56:46 -07:00
6350dbddd1 Fix sequential MKL case (#22062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22062
ghimport-source-id: a30255d7453c4ffecf40215a785c1e06b7296368

Test Plan:
USE_CUDA=0 PARALLEL_BACKEND=OPENMP BLAS=MKL USE_MKLDNN=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ BUILD_BINARY=1 python setup.py develop --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D15938079

Pulled By: ilia-cher

fbshipit-source-id: e7ef0c5bc75ebb845ebe66bf76a4070d45305b35
2019-06-24 12:56:43 -07:00
21da33f0f9 Better trace comments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22090

Differential Revision: D15968440

Pulled By: Krovatkin

fbshipit-source-id: e55e03a4303adbaa576c4384e7a42410bd99da6e
2019-06-24 12:51:27 -07:00
f1c7fa0503 De-deprecate some warnings that hurt usability (#21999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21999
ghimport-source-id: a77b3aea3d3ed33f328e143203730f2655371837

Test Plan: Imported from OSS

Differential Revision: D15925892

Pulled By: bwasti

fbshipit-source-id: 2b4e0af40bc1c6d12c617ba8701d3a5f7a6d833d
2019-06-24 12:35:00 -07:00
2347a4032b Fix tracing docs and add more comprehensive examples (#22082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21857
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22082

Differential Revision: D15968306

Pulled By: Krovatkin

fbshipit-source-id: a76e500b0b7192bd814931ec48bbe9c37b8b92e0
2019-06-24 12:10:19 -07:00
85cbe0d825 Fix Concat Dimension Bug (#22088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22088

This diff is similar to D14163001. We need to handle the edge case when add_axis=1.

Reviewed By: jspark1105

Differential Revision: D15949003

fbshipit-source-id: 328d1e07b78b69bde81eee78c9ff5a8fb81f629b
2019-06-24 10:32:48 -07:00
322261a4de Fix dispatching of backwards kernel for ROCm. (#22125)
Summary:
Use WARP_SIZE consistently also for the dispatch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22125

Differential Revision: D15966661

Pulled By: bddppq

fbshipit-source-id: 93eb663e01aff3b49474504a2f96f060919edf0c
2019-06-24 10:32:44 -07:00
e016a424ef Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter
Differential Revision:
D15944971

Original commit changeset: 53473c370813

fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10
2019-06-24 09:41:33 -07:00
6edaa11e5a fix broken link
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22064

Differential Revision: D15951107

Pulled By: mrshenli

fbshipit-source-id: 0b8f97bd2bbac26855cd2889e1fc619770974ee2
2019-06-24 07:34:16 -07:00
77eda8de8e Support sparse gradients in DistributedDataParallel (#22037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22037

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
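A pure-Python sketch of the bucketing rule described above (a hypothetical helper, not the actual C++ routine; the real implementation also caps bucket sizes): parameters expecting sparse gradients each get their own bucket, while dense parameters can share one.

```python
def assign_buckets(param_names, expect_sparse_gradient):
    # `expect_sparse_gradient` is the per-parameter vector of booleans
    # mentioned above. Sparse tensors cannot be flattened together, so
    # each one gets its own bucket; dense parameters are grouped.
    buckets = []
    dense_bucket = []
    for name, is_sparse in zip(param_names, expect_sparse_gradient):
        if is_sparse:
            buckets.append([name])
        else:
            dense_bucket.append(name)
    if dense_bucket:
        buckets.append(dense_bucket)
    return buckets

buckets = assign_buckets(["emb.weight", "fc.weight", "fc.bias"], [True, False, False])
```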

Reviewed By: mrshenli

Differential Revision: D15926383

fbshipit-source-id: 39c0d5dbd95bf0534314fdf4d44b2385d5321aaf
2019-06-24 07:34:12 -07:00
a7ec889de4 Add sparse tensor allreduce (#22036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22036

Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.

This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.

It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.

This is a resubmission of #19146.
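The final local reduction over the gathered sparse tensors can be sketched in plain Python, representing each COO tensor as a coordinate-to-value mapping (an illustration of the idea only, not the ProcessGroupGloo code):

```python
def reduce_sparse(gathered):
    # Sum the sparse tensors gathered from all ranks: values at
    # overlapping coordinates are added, as in a sparse allreduce.
    result = {}
    for tensor in gathered:
        for coord, value in tensor.items():
            result[coord] = result.get(coord, 0.0) + value
    return result

summed = reduce_sparse([{(0,): 1.0, (2,): 3.0}, {(2,): 4.0}])
```

Since the output's index set is only known after the allgather, the result is exposed via the `result` function on `c10d::ProcessGroup::Work` rather than written into the input tensors.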

Reviewed By: mrshenli

Differential Revision: D15926384

fbshipit-source-id: b6ee5d81606bfa8ed63c3d63a9e307613491e0ae
2019-06-24 07:34:09 -07:00
313960d52e Use at::detail::* instead of detail::* to avoid ambiguity in windows (#22029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22029
ghimport-source-id: d1a26a07faf101c644775267a141ba56cbd3f1c9

Test Plan: Imported from OSS

Differential Revision: D15965039

Pulled By: ezyang

fbshipit-source-id: 31baf405da6f7c6d9e31f5954ec827889dadf769
2019-06-24 07:18:02 -07:00
142361a7e4 merge interfaces that have an optional scalartype parameter (#21088)
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but specifying both the dim and dtype now requires passing the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```

[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088

Reviewed By: ailzhang

Differential Revision: D15944971

Pulled By: nairbv

fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
2019-06-24 07:17:58 -07:00
cd0d8480d3 Remove many build options redundantly specified in Python build scripts. (#21877)
Summary:
Currently many build options are explicitly passed from the Python build scripts to CMake, but this is unnecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and in environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are currently lost.

For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877

Differential Revision: D15964996

Pulled By: ezyang

fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007
2019-06-24 07:17:54 -07:00
1b34ccfc78 Porting SpatialDilatedConvolution and VolumetricDilatedConvolution to ATen (#20983)
Summary:
This PR tackles issue https://github.com/pytorch/pytorch/issues/18352 .

Progress:
- [x] conv_dilated2d CPU
- [x] conv_dilated3d CPU
- [x] conv_dilated2d CUDA
- [x] conv_dilated3d CUDA
- [x] RocM port
- [x] Port of CUDA gemm and gemv
- [x] Refactored 2d and 3d functions as well as output and gradient computations into a single C++ template function
- [x] Cleanup
  + [x] eliminate forward functions
  + [x] eliminate buffers `columns` and `ones` from functions API
  + [x] eliminate out functions
  + [x] eliminate using `ones`

Note that col2im, im2col, col2vol, vol2col implementations are exposed in `ATen/native/im2col.h` and `ATen/native/vol2col.h`. The corresponding operators (not ported in this PR) should use these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20983

Differential Revision: D15958088

Pulled By: ezyang

fbshipit-source-id: 1897f6e15abbf5710e9413cd1e443c2e1dc7d705
2019-06-24 07:12:54 -07:00
3ba654e6d5 Add finding thnvrtc_library into torchconfig.cmake (#22126)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/21861#issuecomment-504805368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22126

Differential Revision: D15964930

Pulled By: ezyang

fbshipit-source-id: 0fb749784bec9af5a8ccbcf775fa7d9d4d34a4c6
2019-06-24 07:04:44 -07:00
08060e898b Revert D15435461: [pytorch][PR] PyTorch ThroughputBenchmark
Differential Revision:
D15435461

Original commit changeset: db08829dc3f4

fbshipit-source-id: 72a0eac1658b2d3f885bc9a21c49fcc23030ae3e
2019-06-23 22:55:05 -07:00
d96ce9b9fe add for in dict support (#22006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22006
ghimport-source-id: d9686c0b61b0eea3787f48adce567249e4e8faf0

Test Plan: Imported from OSS

Differential Revision: D15948548

Pulled By: wanchaol

fbshipit-source-id: 4227502ca050099085ad481aef725ac2cab06d74
2019-06-23 20:49:35 -07:00
c9344fc9c4 add for in string support (#21990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21990
ghimport-source-id: 69b4882f8602c4088e7a833c43fd3cd37501a3c0

Test Plan: Imported from OSS

Differential Revision: D15948547

Pulled By: wanchaol

fbshipit-source-id: 057e7f4fb67c6dca98458ceb14414368e1a86260
2019-06-23 20:49:30 -07:00
eab35756d8 support iteration tuple unpacking (#21985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21985
ghimport-source-id: 1f20a8db7b6bad23b18ac1caefcb46b3fa141697

Test Plan: Imported from OSS

Differential Revision: D15948549

Pulled By: wanchaol

fbshipit-source-id: 758c9c3dfad40c4158aee21ddebcd25b711111d7
2019-06-23 20:49:26 -07:00
9b45237618 PyTorch ThroughputBenchmark (#20766)
Summary:
This is useful for measuring the inference performance of your
models. This is a very basic benchmark for now: we don't support
batching on the benchmark side, and no inter- or intra-op parallelism is
supported yet, just caller-based parallelism.

The main philosophy here is that the user should be able to provide inputs
from Python and just stack them within the benchmark. The API should be
exactly the same as passing inputs to module.forward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20766

Test Plan: Added a new unit test

Differential Revision: D15435461

Pulled By: salexspb

fbshipit-source-id: db08829dc3f4398bb1d8aa16cc4a58b6c72f16c6
2019-06-23 13:03:18 -07:00
c0f96aaf01 Restore default values on premature test exit (#22115)
Summary:
Previously any assert failures would leave the updated setting, making
the test suite semantics dependent on the order in which the tests are run.

The diff is large only due to the indentation change (might be good to review without whitespace changes).

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22115

Differential Revision: D15960875

Pulled By: soumith

fbshipit-source-id: 9313695277fc2d968786f13371719e03fff18519
2019-06-23 12:55:00 -07:00
887ecf797c Fix DictType isSubtypeOf (#22104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22104
ghimport-source-id: 9db14020f424cf2e021d63e9c0fe4017ac7cd6c8

Test Plan: Imported from OSS

Differential Revision: D15956726

Pulled By: jamesr66a

fbshipit-source-id: 85448deab70c5e5b7ab1132652836ed575581868
2019-06-22 16:36:34 -07:00
45b91bd326 refactor all for in range/tensor tests to be together with other for loop tests (#21950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21950
ghimport-source-id: b2491313bc2e0fcc10f77167c261cbae4d884ebb

Test Plan: Imported from OSS

Differential Revision: D15948546

Pulled By: wanchaol

fbshipit-source-id: 34dde28902ae5b8affbf6e4deaaffdb1d8ddd6ec
2019-06-22 01:38:14 -07:00
e0f5ab2c2e Tree based Iterator infrastructure: for in range/list/tensor/zip/enumerate (#21801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21801
ghimport-source-id: b019d3e9a6f9bf152991a01b40e424dff176ffaa

Test Plan: Imported from OSS

Differential Revision: D15948545

Pulled By: wanchaol

fbshipit-source-id: 6110a0f3ab08cbbb398441e8330f56083ecd2d99
2019-06-22 01:00:42 -07:00
a256b09ce9 Backout Liveness Tests again :-(
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22100

Differential Revision: D15956214

Pulled By: Krovatkin

fbshipit-source-id: 9b0c8ecf5b479bf878ffc31acc416bd8dbfe4b50
2019-06-22 00:18:21 -07:00
7b1d6c8912 Update intra_inter_benchmark (#22051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22051
ghimport-source-id: 70710b3866b1a5e21656b77d2695ada74d00254e

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15933951

Pulled By: ilia-cher

fbshipit-source-id: 88ad8f7a1634c1612ffaa68f22721ffc73d9b2ba
2019-06-21 23:06:27 -07:00
91bf0a9f9d Move quantized tensor tests in test_torch.py to test_quantized_tensor.py (#22089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22089

att

Reviewed By: jianyuh

Differential Revision: D15950101

fbshipit-source-id: 70acdeeef3a05201d72f986d5a0005832efd75ff
2019-06-21 22:48:34 -07:00
b19b20efef fix minor comment (#21576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21576

Fix comment regarding original_tensor

Reviewed By: jianyuh

Differential Revision: D15733294

fbshipit-source-id: e2957f32dcf90859b77e61c931b64abdd066aabb
2019-06-21 22:23:53 -07:00
f7b2778cb1 s/uniqueName/debugName/ (#22096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22096
ghimport-source-id: 8f1d994b98432942b5beeb10bf6d30e447d51997

Test Plan: Imported from OSS

Differential Revision: D15956004

Pulled By: jamesr66a

fbshipit-source-id: 319d2d20ef0863249a8a2bdd228b4f792d37bfab
2019-06-21 20:54:53 -07:00
7d637de771 Reduce excessive CI printing in TestHub (#22043)
Summary:
https://github.com/pytorch/pytorch/pull/21132 reverted https://github.com/pytorch/pytorch/pull/19606.

Now these tests again print roughly 40% of the lines of CI output (e.g., https://circleci.com/gh/pytorch/pytorch/2041825?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link)

This PR now uses the functionality introduced in https://github.com/pytorch/vision/issues/862.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22043

Differential Revision: D15947268

Pulled By: ailzhang

fbshipit-source-id: f84f4d6b86203dbe8687e04ae3ed8c99df0bdff8
2019-06-21 20:08:44 -07:00
63ca908026 Updating submodules
Reviewed By: yns88

fbshipit-source-id: d3374d2ee514cc0526559ffbac6dc11918ea71cf
2019-06-21 18:51:07 -07:00
856268c716 Revert D15947873: [JIT] s/uniqueName/debugName
Differential Revision:
D15947873

Original commit changeset: 31a2b30d0ce9

fbshipit-source-id: ef1c0f120c1835184d8106d176cea58ec6ad40b7
2019-06-21 18:51:03 -07:00
36e4b54420 s/uniqueName/debugName (#22048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22048
ghimport-source-id: a82d80ceec1d8055ce4cf62df10ade4a224109f8

Test Plan: Imported from OSS

Differential Revision: D15947873

Pulled By: jamesr66a

fbshipit-source-id: 31a2b30d0ce911edf5791ca10040a1e968750b06
2019-06-21 17:59:38 -07:00
4bc89bd5a6 Implement tensor.select(Dimname,int) (#21795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21795
ghimport-source-id: d13af6078a47de1d6045cfbb7d278c378fe734fe

Test Plan: Imported from OSS

Differential Revision: D15833457

Pulled By: zou3519

fbshipit-source-id: fa52aff25ce0e12f31da3eef83ea948b4f7a5d9f
2019-06-21 16:16:45 -07:00
18a904c12e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 494a8fe00cbdb782bbdb05eefb17e9166d117599
2019-06-21 15:14:46 -07:00
f164c01f9c Adding liveness test cases back
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21762

Differential Revision: D15943509

Pulled By: Krovatkin

fbshipit-source-id: 4b65bf63ab15a2347da5f7269cc0f2dbb226b330
2019-06-21 15:09:09 -07:00
38aa5a519e Experimental option to use single thread pool (#22047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22047
ghimport-source-id: 8731538a091997fd31d6aff59152dc9241de2ba4

Test Plan:
EXPERIMENTAL_SINGLE_THREAD_POOL=1 PARALLEL_BACKEND=NATIVE_TBB
USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1 MKLDNN_THREADING=SEQ USE_CUDA=0
BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1 python setup.py develop --cmake
./build/bin/parallel_info
./build/bin/thread_init_test
./build/bin/test_parallel
./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15931188

Pulled By: ilia-cher

fbshipit-source-id: 1ca1b190b6e16ce5398f2dad72deaf3cb083a43b
2019-06-21 14:54:16 -07:00
5ff06a7b0b more complete tuple assignments (#21949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21949
ghimport-source-id: 458793d74af3728bf0338867b081157905a7635a

Test Plan: Imported from OSS

Differential Revision: D15948550

Pulled By: wanchaol

fbshipit-source-id: 9ed69e0859e052816f06fc9c288b905551b2e48c
2019-06-21 14:49:38 -07:00
4009089d1f Sparse BLAS: Remove workaround to check zero length inputs. (#22080)
Summary:
Fix was released with ROCm 2.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22080

Differential Revision: D15947427

Pulled By: bddppq

fbshipit-source-id: b6b66f4cfc334ddc6140d1d519792d4783ba0efa
2019-06-21 14:45:06 -07:00
04e9278306 First round of optimizations for segment_reduction_op kernels. (#22081)
Summary:
Apply launch bounds annotations for ROCm as the maximum threads per
block (1024) is higher than the ROCm internal default (256).

Reduce the minBlocksPerMultiprocessor for ROCm to 8 from 16 as this
improves performance in some microbenchmarks by (statistically
significant) 4%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22081

Differential Revision: D15947426

Pulled By: bddppq

fbshipit-source-id: b4b7015417f99e14dfdedb62639e4d837c38e4fd
2019-06-21 14:33:12 -07:00
1c5fe2e8c4 Add support for Python 3.8 Constant node (#22007)
Summary:
We can't really test these until we get Python 3.8 in the CI, but these all work locally and won't be invoked at all for Python 3.7 and lower so this should be pretty safe.

Fixes #21710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22007

Pulled By: driazati

Differential Revision: D15914735

fbshipit-source-id: 83833cebe7e38b162719a4f53cbe52c3fc638edd
2019-06-21 14:22:06 -07:00
f9b3989206 handle slice with negative indices and indices exceeding tensor dimen… (#21811)
Summary:
handle slice with negative indices and indices exceeding tensor dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21811
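The index handling described above can be sketched as a small helper (hypothetical; the actual fix lives in the ONNX export code): negative indices are mapped to their positive equivalents and out-of-range indices are clamped, matching Python slicing semantics.

```python
def normalize_slice_index(idx, dim_size):
    # Map a negative index to its positive equivalent, then clamp it to
    # [0, dim_size], as Python slicing does for out-of-range indices.
    if idx < 0:
        idx += dim_size
    return max(0, min(idx, dim_size))

examples = [normalize_slice_index(i, 5) for i in (-1, -10, 3, 10)]
```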

Reviewed By: zrphercule

Differential Revision: D15944243

Pulled By: houseroad

fbshipit-source-id: f7d987e9d8d704ade9d489599df14afbf1333428
2019-06-21 13:37:54 -07:00
38c9bb8261 Remove most usages of THCHalfAutoNumerics. (#21878)
Summary:
This was originally introduced before at::Half overloaded a number of operators; since it isn't necessary anymore, get rid of it.

Note in many cases, these files still need THCNumerics.cuh (which was included by THCHalfAutoNumerics); I was not careful about isolating these usages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21878

Differential Revision: D15941236

Pulled By: gchanan

fbshipit-source-id: 65f30a20089fcd618e8f3e9646cf03147a15ccba
2019-06-21 12:40:38 -07:00
06c3bd0302 Improve ListPtr::extract() (#21753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21753

- it accidentally didn't move non-IValue-based lists before. This is fixed now.
- it only needs to recreate a T() for IValue-based lists

Reviewed By: resistor

Differential Revision: D15809220

fbshipit-source-id: 944badf1920ee05f0969fff0d03284a641dae4a9
2019-06-21 12:26:01 -07:00
fe580e850e Rewrite lerp operator to use TensorIterator and support compile-time vectorization. (#22038)
Summary:
This benefits from compile-time vectorization and multi-threading.

Before:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After, with multi-threading:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
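For reference, `lerp` computes the elementwise linear interpolation `x + w * (y - x)`; a scalar Python sketch of the loop that TensorIterator vectorizes and parallelizes:

```python
def lerp(xs, ys, w):
    # out[i] = xs[i] + w * (ys[i] - xs[i])
    return [x + w * (y - x) for x, y in zip(xs, ys)]

out = lerp([0.0, 10.0], [10.0, 20.0], 0.5)
```

Because the kernel is a single fused multiply-add per element, expressing it through TensorIterator lets the compiler emit SIMD code and split the work across threads.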
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22038

Differential Revision: D15941468

Pulled By: VitalyFedyunin

fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
2019-06-21 11:39:27 -07:00
28630529ac Limit overall number of threads used by TBB (#22045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22045
ghimport-source-id: ea49ae04d86677f7a73a07968ce454eb1128fb84

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/parallel_info
./build/bin/thread_init_test
./build/bin/test_parallel

Imported from OSS

Differential Revision: D15930319

Pulled By: ilia-cher

fbshipit-source-id: 4c33ae395965e5708f8d7ceb67495b303fc4d22c
2019-06-21 11:39:18 -07:00
82dd69326b Split nn.Module._save_to_state_dict to make it overridable (#21933)
Summary:
# Motivation

We allow to override JIT module serialization with `__getstate__/__setstate__` in order to cover cases where parameters are not serializable. Use cases include: MKLDNN integration: a388c78350/torch/utils/mkldnn.py (L18-L26)
and also fbgemm prepacked format integration for quantized tensors.

However many Eager scripts use `torch.save(module.state_dict())` form of serialization. There are several ways to make it work:

* make packed_weight itself pickleable (e.g. by binding `__getstate__/__setstate__` on C++ UDT level)
    * change: we’d need to allow module buffers to be of arbitrary, non-Tensor types
    * pro: no change to state_dict behavior
    * cons: might not be directly inspectable by user calling .state_dict(), especially if packed weights represent several tensors fused together
* make packed_weight being proper Tensor layout
    * pro: no change to state_dict or buffers behavior
    * cons: adding new tensor layouts is pretty costly today
    * cons: doesn’t work if multiple tensors are packed in one interleaved representation
* *[this approach]* allow Modules to override state_dict and return regular tensors
    * pro: most flexible and hackable
    * pro: maintains the semantic meaning of state_dict as all the data necessary to represent the module's state
    * cons: complicates state_dict logic
    * cons: potential code duplication between `__getstate__/__setstate__`

Based on discussions with zdevito and gchanan we decided to pick the latter approach. Rationale: this behavior is fully opt-in and will impact only the modules that need it. For those modules the requirement listed above won't be true, but we do preserve the requirement that all elements of state_dict are tensors. (https://fburl.com/qgybrug4 for internal discussion)

In the future we might also implement one of the approaches above but those are more involved.
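A pure-Python sketch of the chosen design (hypothetical classes, not the torch.nn code): a module keeps a packed, non-serializable weight internally but overrides its save-to-state-dict hook to emit a regular, unpacked value.

```python
class PackedLinear:
    """Stores weights in a 'packed' format but serializes them unpacked."""

    def __init__(self, weights):
        # Hypothetical packed representation: a reversed copy of the weights.
        self._packed = list(reversed(weights))

    def _save_to_state_dict(self, destination, prefix):
        # Emit a regular (unpacked) value so plain state_dict serialization
        # keeps working even though the internal buffer is packed.
        destination[prefix + "weight"] = list(reversed(self._packed))

    def state_dict(self):
        destination = {}
        self._save_to_state_dict(destination, prefix="")
        return destination

m = PackedLinear([1.0, 2.0, 3.0])
sd = m.state_dict()
```

The key property is that `sd` contains only ordinary values, preserving the semantics of the state dict regardless of the internal representation.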
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21933

Differential Revision: D15937678

Pulled By: dzhulgakov

fbshipit-source-id: 3cb5d1a8304d04def7aabc0969d0a2e7be182367
2019-06-21 09:55:22 -07:00
b2197ef2b0 Adding support for JIT Fusion on Windows for CUDA (#21861)
Summary:
This pull request adds the necessary Windows DLL code to be able to support JIT fusion for CUDA. CPU JIT Fusion isn't supported. This also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861

Differential Revision: D15940939

Pulled By: soumith

fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
2019-06-21 09:44:17 -07:00
edb5a1662e Remove getDeviceFromPtr and allocator from Type (#21940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21940
ghimport-source-id: 0a618878ae030f663b05662f83ac4b549a90ba29

Test Plan: Imported from OSS

Differential Revision: D15893330

Pulled By: li-roy

fbshipit-source-id: a3dfb6b4ed0c72f7f3efd00192fb63aabc9c5967
2019-06-21 01:05:33 -07:00
b36a041d6f Move UnsafeTensorFromTH and UnsafeStorageFromTH off Type (#21923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21923
ghimport-source-id: f015c8521ef9071eaa982cbf73c13aa925035956

Test Plan: Imported from OSS

Differential Revision: D15883390

Pulled By: li-roy

fbshipit-source-id: 6a7a7ffbe6000199d41cdca5efb97371f46dd8fe
2019-06-21 01:05:29 -07:00
5d7cf66862 add Int8SpatialBNRelu (#22014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22014

Add Int8SpatialBN + Relu fused operator.

Reviewed By: dskhudia

Differential Revision: D15916551

fbshipit-source-id: a938e0f0e105ab5f823a3cb6144f50aa2ab944c1
2019-06-20 23:23:04 -07:00
7d81e62562 Add mkldnn tests for running end to end resnet models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22041

Differential Revision: D15928786

Pulled By: bddppq

fbshipit-source-id: 4b12e5bda2da13aba2d63d357a0a854d59317362
2019-06-20 22:42:49 -07:00
71741ba115 rename test to be more consistent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22057

Differential Revision: D15936870

Pulled By: soumith

fbshipit-source-id: ab6194219da2582efdf324b89b2bc87dfe4e5d69
2019-06-20 22:02:36 -07:00
a3fc6ed046 Hook up liveness into profiling pipeline.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21881

Differential Revision: D15931627

Pulled By: Krovatkin

fbshipit-source-id: dc825a563c7aceb5f66a2ed2a600d550b70941b2
2019-06-20 21:23:16 -07:00
3838324539 Add max/min/argmax/argmin/sort/argsort for quantized Tensor (#21546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21546

Added following methods for QTensor:
- max, min
- argmax, argmin
- sort, argsort

Reviewed By: dzhulgakov

Differential Revision: D15718117

fbshipit-source-id: 746b978d5722cb75e216fc65585bf206d45a7969
2019-06-20 21:00:03 -07:00
95aee81dd7 more general fusion logic (#22015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22015

The previous fusion logic only worked for operators that appear back-to-back in the linear order of the protobuf file.
This diff generalizes it to work for any predecessor-successor pair of operators in the graph, as long as there is no "interfering" use/def of the related blobs.
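A minimal sketch of the interference check described above (the op/blob representation here is illustrative, not the actual Caffe2 graph structures): a producer/consumer pair is fusable only when no other operator reads or writes the intermediate blob.

```python
def can_fuse(producer, consumer, all_ops):
    """Fuse producer into consumer if the intermediate blob has no other use/def."""
    inter = producer["out"][0]
    if inter not in consumer["in"]:
        return False
    for op in all_ops:
        if op is producer or op is consumer:
            continue
        # Any other use or def of the intermediate blob "interferes".
        if inter in op["in"] or inter in op["out"]:
            return False
    return True

ops = [
    {"name": "Conv", "in": ["x"], "out": ["y"]},
    {"name": "Relu", "in": ["y"], "out": ["z"]},
    {"name": "Sum",  "in": ["z"], "out": ["w"]},
]
print(can_fuse(ops[0], ops[1], ops))  # True: "y" only feeds Relu
```

Note the ops need not be adjacent in the serialized order; only the use/def chain of the intermediate blob matters.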

Reviewed By: csummersea

Differential Revision: D15916709

fbshipit-source-id: 82fe4911a8250845a8bea3427d1b77ce2442c495
2019-06-20 20:44:26 -07:00
88921feafd change return type for q_scale and q_zero_point (#21709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709

Change the return type from Scalar to double/int64_t so we don't need to do a conversion when we call other quantize-related aten functions.

Differential Revision: D15793003

fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
2019-06-20 20:30:39 -07:00
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705 since the commit structure for that PR is quite messy.

1. Add `IterableDataset`, so we have two data loading modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

2. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
3. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
4. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
5. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
6. Import torch.utils.data in `torch/__init__.py`.
7. Add data loader examples and documentation.
8. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
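A stdlib-only sketch of the per-worker configuration pattern that `get_worker_info` enables (the function name below is illustrative, not the torch API): each worker iterates the same stream and keeps every `num_workers`-th item so the workers shard the data instead of duplicating it.

```python
import itertools

def worker_shard(stream, worker_id, num_workers):
    """Keep every num_workers-th item, starting at this worker's offset."""
    return itertools.islice(stream, worker_id, None, num_workers)

data = range(10)
print(list(worker_shard(iter(data), 0, 2)))  # [0, 2, 4, 6, 8]
print(list(worker_shard(iter(data), 1, 2)))  # [1, 3, 5, 7, 9]
```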

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
d4119f8fcb Automatic update of fbcode/onnx to 355a4954ea4e5836a5e943589509951c44feb6b4 (#22030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22030

Previous import was dd599b05f424eb161a31f3e059566a33310dbe5e

Included changes:
- **[355a4954](https://github.com/onnx/onnx/commit/355a4954)**: Update codeowners to have community folder changes assigned to steering committee (#2104) <Prasanth Pulavarthi>
- **[ceaa5da7](https://github.com/onnx/onnx/commit/ceaa5da7)**: Fix Resize/Upsample Shape inference function (#2085) <Raymond Yang>
- **[4de8dc0d](https://github.com/onnx/onnx/commit/4de8dc0d)**: Clarify shape inference requirements for new operators (#2088) <Hariharan Seshadri>
- **[52aa1fad](https://github.com/onnx/onnx/commit/52aa1fad)**: Fix NN defs file (#2083) <Hariharan Seshadri>

Reviewed By: bddppq

Differential Revision: D15924221

fbshipit-source-id: 91ba64ef3e1a2de4a7dd0b02ee6393508cc44a73
2019-06-20 15:52:45 -07:00
84a2d5d7aa Add hashing to bucket-weighted pooling (#20673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20673

Add option to bucket-weighted pooling to hash the bucket so that any cardinality score can be used.
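The idea, sketched in plain Python (the real operator's hash is internal; `crc32` here is just a stand-in for a stable hash): instead of requiring bucket ids to fall in a fixed range, hash them into `num_buckets` slots so any cardinality works.

```python
import zlib

def hashed_bucket(raw_bucket, num_buckets):
    # crc32 as a stand-in for a stable hash of the bucket id
    return zlib.crc32(str(raw_bucket).encode()) % num_buckets

# Arbitrary-cardinality ids all land in [0, num_buckets)
for raw in [0, 7, 10**9, -3]:
    assert 0 <= hashed_bucket(raw, 8) < 8
```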

Reviewed By: huginhuangfb

Differential Revision: D15003509

fbshipit-source-id: 575a149de395f18fd7759f3edb485619f8aa5363
2019-06-20 15:12:36 -07:00
1aae4b02df Fix 'error : detail is ambiguous' on Windows (#22025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22025
ghimport-source-id: 0fb408ad185a989507f7509a2a3574e1a7e60ab2

Test Plan: Imported from OSS

Differential Revision: D15926651

Pulled By: ezyang

fbshipit-source-id: 298340bfbfe44dcd81cde8f0d56f8dbde92fb7bd
2019-06-20 13:23:21 -07:00
19ef15709f Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0be0694d6adf1ae9baa408a4b372101a26a14ba4
2019-06-20 12:59:31 -07:00
4cd7d78718 correct arange docs (#21992)
Summary:
https://github.com/pytorch/pytorch/issues/21579 correctly points out an inaccuracy in the docs for `arange`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21992

Differential Revision: D15914411

Pulled By: umanwizard

fbshipit-source-id: 3eb1734b29af3f3858f0f4d54c71e28dbda5c75b
2019-06-20 12:36:00 -07:00
08facca1a1 Support accumulating DDP grads using a context manager (#21736)
Summary:
The first attempt and more discussions are available in https://github.com/pytorch/pytorch/issues/19577

#### Goal

Allow toggling DDP gradient synchronization across iterations. With this feature, users may accumulate grads in module variables, and only kick off the expensive grad synchronization every few iterations.

#### Concerns

Our first attempt in https://github.com/pytorch/pytorch/issues/19577 tried to do it using a variable or a function. But apaszke made a good point that it would be error-prone, and favored a context manager instead.

#### Proposed Solution

Instead of providing a `accumulate_grads` variable/function/context, we provide a `DistributedDataParallel.no_sync()` context manager. And it does exactly what the name suggests, i.e., disable DDP grad synchronization within the context. Note that `accumulate_grads` means `no_sync` + no optimizer step, where the latter is not controlled by DDP.

It is true that users need to call another `model(input).backward()` after exiting the context, and this is indeed more verbose. But I think it is OK, as one major concern in the previous discussion was to prevent users from running into errors without knowing it. This API should reaffirm the expected behavior, and does not interfere with other use cases if accumulating grads is not required.

The application would then look like:

```python
with ddp.no_sync():
  for input in inputs:
    ddp(input).backward()

ddp(one_more_input).backward()
optimizer.step()
```
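A minimal sketch of how such a toggle can be built as a context manager (assumed structure for illustration, not DDP's real internals): the flag is flipped off on entry and restored on exit, even if an exception is raised.

```python
from contextlib import contextmanager

class ToyDDP:
    def __init__(self):
        self.require_sync = True  # normally sync grads in backward()

    @contextmanager
    def no_sync(self):
        old = self.require_sync
        self.require_sync = False
        try:
            yield
        finally:
            self.require_sync = old  # restore even on exception

ddp = ToyDDP()
with ddp.no_sync():
    assert not ddp.require_sync  # grads would accumulate locally here
assert ddp.require_sync          # synchronization restored on exit
```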

chenyangyu1988 myleott
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21736

Differential Revision: D15805215

Pulled By: mrshenli

fbshipit-source-id: 73405797d1e39965c52016af5cf45b15525ce21c
2019-06-20 12:23:52 -07:00
40b9f8f0a0 Added more descriptive error message for index out of range (#21758)
Summary:
https://github.com/pytorch/pytorch/issues/21535
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21758

Differential Revision: D15922915

Pulled By: Chillee

fbshipit-source-id: dcb301a661c359f27869200ee241ec272ef50d3a
2019-06-20 12:12:03 -07:00
6bd58b7548 Move list / dict tests to TestList and TestDict (#22000)
Summary:
There aren't any substantive changes aside from some test renames (e.g. `TestScript.test_dict_membership` -> `TestDict.test_membership`) and the addition of `TestDict.dict()`.

Adding the rest of the dict ops was making the tests a mess and `TestScript` is already > 10000 lines by itself, so breaking them up should make things cleaner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22000

Pulled By: driazati

Differential Revision: D15911383

fbshipit-source-id: 614428e03fbc14252f0e9cde74ab9a707169a860
2019-06-20 11:17:35 -07:00
0702b5f345 Partially parallelize randperm on CPU. (#21529)
Summary:
This commit parallelizes the variable initialization (from 1 to n) step on CPU.
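The initialization step can be sketched with a stdlib thread pool (illustrative only; the ATen implementation uses its own parallel primitives): fill the identity array in independent chunks before shuffling.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_iota(n, num_threads=4):
    """Fill out[i] = i in parallel chunks."""
    out = [0] * n

    def fill(lo, hi):
        for i in range(lo, hi):
            out[i] = i

    chunk = (n + num_threads - 1) // num_threads
    with ThreadPoolExecutor(num_threads) as ex:  # waits for all tasks on exit
        for lo in range(0, n, chunk):
            ex.submit(fill, lo, min(lo + chunk, n))
    return out

print(parallel_iota(10))
```

The chunks touch disjoint index ranges, so no locking is needed.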
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21529

Differential Revision: D15855402

Pulled By: VitalyFedyunin

fbshipit-source-id: f1ba54587451f9cb0eb5e542c3c5b458b48e1a3d
2019-06-20 10:44:01 -07:00
e388f70499 Move cppdocs build to CircleCI (#19768)
Summary:
The cppdocs build job (originally run on Chronos as a cron job) was frequently broken because it was not run on every PR. This PR moves it to CircleCI and enables it on every PR, so that we can get the build failure signal much earlier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19768

Differential Revision: D15922289

Pulled By: yf225

fbshipit-source-id: e36ef59a2e42f78b7d759ee02f2d94dc90f88fff
2019-06-20 10:24:21 -07:00
76fe91bb2f Revert D14889547: Add sparse tensor allreduce
Differential Revision:
D14889547

Original commit changeset: 34f3de4d6a2e

fbshipit-source-id: 24d2239da0b865280af88dce3d8fb25883fc0174
2019-06-20 10:07:27 -07:00
cb4c213f55 Revert D15007365: Support sparse gradients in DistributedDataParallel
Differential Revision:
D15007365

Original commit changeset: f298e83fd3ca

fbshipit-source-id: ef5e556d2df37f0c64652bd3563956afd8d9fd7f
2019-06-20 10:07:22 -07:00
f8f583cbae Port convtranspose2d (#20994)
Summary:
this PR will resolve partially https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20994

Differential Revision: D15876052

Pulled By: ezyang

fbshipit-source-id: 5896e0cbb656d0530e39fd681808adc685841b37
2019-06-20 07:11:38 -07:00
365de7bda1 Support sparse gradients in DistributedDataParallel (#19443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19443

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out of band signal
is needed whether or not a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient or not. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.

Reviewed By: mrshenli

Differential Revision: D15007365

fbshipit-source-id: f298e83fd3ca828fae9e80739e1db89d045c99ac
2019-06-20 07:06:28 -07:00
aee6a412e9 Add sparse tensor allreduce (#19146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19146

Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.
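A pure-Python sketch of the scheme described above (ranks are simulated as list entries; summing values per index stands in for the sparse reduction): allgather every rank's (index, value) pairs, then each rank reduces locally.

```python
def sparse_allreduce(rank_tensors):
    """rank_tensors: one list of (index, value) pairs per rank."""
    # "allgather": every rank sees all (index, value) pairs
    gathered = [pair for t in rank_tensors for pair in t]
    # local reduction on each rank: sum values per index
    reduced = {}
    for idx, val in gathered:
        reduced[idx] = reduced.get(idx, 0) + val
    # every rank ends up with the same reduced result
    return [dict(sorted(reduced.items()))] * len(rank_tensors)

out = sparse_allreduce([[(0, 1.0), (2, 3.0)], [(2, 1.0), (5, 4.0)]])
print(out[0])  # {0: 1.0, 2: 4.0, 5: 4.0}
```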

This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.

It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.

Reviewed By: mrshenli

Differential Revision: D14889547

fbshipit-source-id: 34f3de4d6a2e09c9eba368df47daad0dc11b333e
2019-06-20 07:06:24 -07:00
97ea44b34a Fix issue in quantization error measurement when followed by Relu (#21890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21890

As title

Reviewed By: jspark1105

Differential Revision: D15739808

fbshipit-source-id: 8fbcca04f0711fd9f994d67e1f4a604ef9fa42c6
2019-06-19 22:29:54 -07:00
b6f542f8a1 Add aten mkldnn transpose (#21943)
Summary:
This PR is about:

1.  Make mkldnn reshape share the same memory for plain-format tensors.

2.  Add mkldnn transpose operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21943

Differential Revision: D15916063

Pulled By: bddppq

fbshipit-source-id: d1971c67341f277c1e80c1fa34e213b6c27f4062
2019-06-19 22:20:46 -07:00
3d44cd6d19 Replace Type dispatch with ATenDispatch (#22008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22008
ghimport-source-id: b0a5cc3da283b195f88636e2a61939d2facd11d9

Test Plan: Imported from OSS

Differential Revision: D15914756

Pulled By: li-roy

fbshipit-source-id: 5bc300ec525a3ee9e6491dd4c55e78bbd977d691
2019-06-19 21:42:54 -07:00
5d67c606ea Added error for classes that don't have an init function (#21880)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21761
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21880

Differential Revision: D15879205

Pulled By: Chillee

fbshipit-source-id: 8b614970196b381357b6032a73eeaab0b7a4f667
2019-06-19 21:33:37 -07:00
4fee532de6 Pass loop_over optional parameter for cached reader properly. (#21929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21929

Just need to pass `loop_over` argument properly.

Reviewed By: noname01

Differential Revision: D15885401

fbshipit-source-id: f1928277262a80e5b41f4c4f3945c2f378a4e233
2019-06-19 18:15:32 -07:00
96c0bd3722 ListPtr->List DictPtr->Dict step 3 (#21938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21938

After having changed all call sites, we can now remove the old naming scheme.

Reviewed By: zdevito

Differential Revision: D15892402

fbshipit-source-id: 1f5b53a12fa657f6307811e8657c2e14f6285d2f
2019-06-19 18:02:08 -07:00
275087383b ListPtr->List DictPtr->Dict step 2 (#21937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937

This changes call sites to use the new naming scheme

Reviewed By: zdevito

Differential Revision: D15892404

fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
2019-06-19 18:02:05 -07:00
093c78f854 ListPtr->List DictPtr->Dict step 1 (#21936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21936

This introduces torch::List and torch::Dict as aliases to ListPtr/DictPtr.
After this lands, we can step by step change the call sites to the new naming
and finally remove the old spellings.

Reviewed By: zdevito

Differential Revision: D15892405

fbshipit-source-id: 67b38a6253c42364ff349a0d4049f90f03ca0d44
2019-06-19 18:02:01 -07:00
cba79f4872 Revert D15637222: [wip] Replace Type dispatch with ATenDispatch
Differential Revision:
D15637222

Original commit changeset: fcfaea0b5480

fbshipit-source-id: 9bca7ebb91d7a3609b86663089140d7c5a33f58d
2019-06-19 17:36:52 -07:00
15be5483c0 Move NamedType method definitions into cpp file (#21983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21983
ghimport-source-id: fe9c1eba5f4c737e1442b877d396b9e8e5298cfb

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D15907633

Pulled By: jamesr66a

fbshipit-source-id: bd2dfdca117cdc3ae35fdd9d29cf521d82636069
2019-06-19 16:43:11 -07:00
f6aac41391 Defining object destructor in c10 (#21984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21984
ghimport-source-id: 5767592e37ed388422eed5639f8ba0722aec66e2

Test Plan: Imported from OSS

Differential Revision: D15906530

Pulled By: zdevito

fbshipit-source-id: bec8c8b0b5b9dcc2e8fc69b5031fcfa6bb22d54e
2019-06-19 16:27:40 -07:00
24a6c32407 Replace Type dispatch with ATenDispatch (#21320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21320
ghimport-source-id: cc18f746a1c74df858cb0f6d8b7d4de4315683c7

Test Plan: Imported from OSS

Differential Revision: D15637222

Pulled By: li-roy

fbshipit-source-id: fcfaea0b5480ab966175341cce92e3aa0be7e3cb
2019-06-19 15:46:45 -07:00
00fdb2cf95 Enable XLA by default on pull requests. (#21991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21991
ghimport-source-id: 5077e62a613c36256d2b5a2427aa9c3887c4a797

Test Plan: Imported from OSS

Differential Revision: D15907913

Pulled By: ezyang

fbshipit-source-id: c67bb999f02760836d1568c1a3911add3f1538f0
2019-06-19 15:01:49 -07:00
effcc398c4 Refactor Random Number Generators in ATen (#21555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21555
ghimport-source-id: dd900a8c3e1ef9ef1e011b8bb5476626d18cc462

Test Plan: Imported from OSS

Differential Revision: D15875780

Pulled By: ezyang

fbshipit-source-id: 6e04e90af62ab9c9593d74f344a3a084aaaf6f43
2019-06-19 13:54:09 -07:00
34aee933f9 ONNX Export Interpolate (Resize) for opset version 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21434

Reviewed By: zrphercule

Differential Revision: D15777197

Pulled By: houseroad

fbshipit-source-id: 517b06a54a234ffdb762401e83f5a732023ed259
2019-06-19 13:40:27 -07:00
44128e09f0 Speed up op lookup and registration (#21806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806

Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.

This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle)
and it also speeds up op registration, since registration needs to check whether an op with the same name already exists.
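The idea, sketched in Python (names here are illustrative, not the c10 API): a name-to-handle map makes both `findSchema` and the duplicate check at registration O(1) instead of a linear scan.

```python
class ToyDispatcher:
    def __init__(self):
        self._by_name = {}  # name -> operator handle

    def register(self, name, handle):
        if name in self._by_name:  # O(1) duplicate check
            raise RuntimeError(f"operator {name} already registered")
        self._by_name[name] = handle

    def find_schema(self, name):  # O(1) lookup instead of a list scan
        return self._by_name.get(name)

d = ToyDispatcher()
d.register("aten::add", object())
print(d.find_schema("aten::add") is not None)  # True
```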

Differential Revision: D15834256

fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
2019-06-19 12:05:14 -07:00
d1c80300ce Better stringification of dispatch keys in error messages (#21809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21809

Many error messages show dispatch keys, for example when the dispatcher didn't find a kernel to dispatch to.
Previously, this was a string like "CPU" or "CUDA" for known backends and just an arbitrary number for other backends.

Now, tensor type id registration also registers a name for the dispatch key and shows that in the error messages.

There is no API change; only the error messages are better now.
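The mechanism can be sketched as a side-table of names filled in at registration time (illustrative, not the c10 registration API): unknown keys still fall back to printing the raw number.

```python
_key_names = {}  # dispatch key id -> human-readable name

def register_key(key_id, name):
    _key_names[key_id] = name

def key_to_string(key_id):
    # Fall back to the raw number for keys registered without a name.
    return _key_names.get(key_id, str(key_id))

register_key(1, "CPU")
print(key_to_string(1))   # CPU
print(key_to_string(42))  # 42
```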

Differential Revision: D15835809

fbshipit-source-id: 4f0c9d0925c6708b02d79c653a2fae75b6623bb9
2019-06-19 11:44:24 -07:00
dd046bef8d NamedTuple serialization (#21839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21839
ghimport-source-id: b9d82018fbf26b22d58cad3a033cbfe4e879a8fe

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860002

Pulled By: jamesr66a

fbshipit-source-id: 0fc97c4adefa9ae4937f21179c7afa817f4099e5
2019-06-19 10:43:55 -07:00
5a37f8c63f Refactor TupleType to take a NamedTupleSpec (#21836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21836
ghimport-source-id: 91cab735765ff875046b42864188e86b8487b0ae

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860003

Pulled By: jamesr66a

fbshipit-source-id: 62a99a212ae6f9af83a90305e443f2dd05588292
2019-06-19 10:43:51 -07:00
c0be6e6290 Introduce SerializableType (#21835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21835
ghimport-source-id: e674048a56b9a573ba89e484f4b41818d3f08234

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860004

Pulled By: jamesr66a

fbshipit-source-id: 2d2905296939903ed4586932bea0a504b542bbdb
2019-06-19 10:43:47 -07:00
74104f383e Some small fixes for NamedTuple (#21813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21813
ghimport-source-id: a1edca8ad0384a9e493ef2f3b0aa5005a668a8f3

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860005

Pulled By: jamesr66a

fbshipit-source-id: 4a43432d2dacebde1a676a93ac57f675db857154
2019-06-19 10:43:43 -07:00
6b972795e4 Add torch.__future__._overwrite_module_params_on_conversion global flag, and check it in nn.Module._apply() (#21613)
Summary:
https://github.com/pytorch/pytorch/pull/17072 breaks `model.to(xla_device)`, because moving `model` to XLA device involves changing its parameters' TensorImpl type, and the current implementation of `nn.Module.to()` doesn't support changing module parameters' TensorImpl type:
```python
# 6dc445e1a8/torch/nn/modules/module.py (L192-L208)
def _apply(self, fn):
    ...
    for param in self._parameters.values():
        if param is not None:
            # Tensors stored in modules are graph leaves, and we don't
            # want to create copy nodes, so we have to unpack the data.
            param.data = fn(param.data)  # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
            if param._grad is not None:
                param._grad.data = fn(param._grad.data)  # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
   ...
```

yf225 TODO: fix the description here when we finish the implementation

To fix this problem, we introduce a new API `model.to_()` that always assign new tensors to the parameters (thus supporting changing the parameters to any TensorImpl type), and also bump the version counter of the original parameters correctly so that they are invalidated in any autograd graph they participate in.

We also add warning to the current `model.to()` API to inform users about the upcoming behavior change of `model.to()`: in future releases, it would create and return a new model instead of in-place updating the current model.

This unblocks adding XLA to our CI test suite, which also allows XLA to catch up with other changes in our codebase, notably the c10 dispatcher.

[xla ci]

cc. resistor ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21613

Differential Revision: D15895387

Pulled By: yf225

fbshipit-source-id: b79f230fb06019122a37fdf0711bf2130a016fe6
2019-06-19 10:30:02 -07:00
Jie
056a033cdc updating upsampling bilinear2d kernel: (#21879)
Summary:
1. faster atomicAdd trick for fp16 backward kernel
2. better launch configs for backward kernel
3. removed unnecessary buffer initialization for forward kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21879

Differential Revision: D15898680

Pulled By: ezyang

fbshipit-source-id: 1fc81e6c078f1538d82e4f36921b630499eb504f
2019-06-19 07:42:21 -07:00
34536e207a Fix: convert Onnx DynamicSlice operator with 4 inputs to caffe2 fa… (#20846)
Summary:
I reported an issue (https://github.com/pytorch/pytorch/issues/20743) and made this pull request to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20846

Reviewed By: zrphercule

Differential Revision: D15569135

Pulled By: houseroad

fbshipit-source-id: 96a2c818ef666a7d79b96decfa347d7154b34d5c
2019-06-19 00:09:15 -07:00
4b1df5c1f5 Use fn(param) instead of fn(param.data) in nn.Module._apply (#21865)
Summary:
When we pass `fn` to `nn.Module._apply()` and `fn` is an in-place operation, the correct behavior should also include bumping the parameters' and their gradients' version counters. This PR fixes the old incorrect behavior and makes sure the new behavior is right.

Note that this PR is BC-breaking in the following way:

Previously, passing an in-place operation to `nn.Module._apply()` does not bump the module's parameters' and their gradients' version counters. After this PR, the module's parameters' and their gradients' version counters will be correctly bumped by the in-place operation, which will invalidate them in any autograd graph they previously participate in.
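An illustrative model of the version-counter contract this change enforces (not torch internals): every in-place mutation bumps the counter, which is how autograd detects that a saved tensor was modified after being recorded.

```python
class ToyTensor:
    def __init__(self, value):
        self.value = value
        self._version = 0

    def mul_(self, k):        # in-place op: bumps the version counter
        self.value *= k
        self._version += 1
        return self

t = ToyTensor(3)
saved_version = t._version   # autograd "saves" the tensor here
t.mul_(2)                    # fn(param) applied in-place
print(t._version != saved_version)  # True: saved uses are invalidated
```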
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21865

Differential Revision: D15881952

Pulled By: yf225

fbshipit-source-id: 62f9244a4283a110147e9f20145ff232a5579fbd
2019-06-18 20:45:40 -07:00
abd6cffe55 Added some extra tests for std_mean and var_mean for multiple dims. (#20650)
Summary:
Added some extra tests for std_mean and var_mean for multiple dims.
Some refactoring of previously created tests based on PR comments: https://github.com/pytorch/pytorch/pull/18731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20650

Differential Revision: D15396101

Pulled By: ifedan

fbshipit-source-id: d15c3c2c7084a24d6cfea4018173552fcc9c03a9
2019-06-18 20:36:32 -07:00
fa5263af2c Add set_quantizer_ for QTensor (#21852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852

To enable change of q_scale and q_zero_point in `copy_`

Differential Revision: D15793427

fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
2019-06-18 19:50:12 -07:00
e239e31da6 Fix lint error (#21932)
Summary:
https://github.com/pytorch/pytorch/issues/21916 has broken python lint on master
https://travis-ci.org/pytorch/pytorch/jobs/547354937
```
./tools/build_variables.py:167:39: E261 at least two spaces before inline comment
./tools/build_variables.py:379:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:379:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:380:17: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:380:19: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:381:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:381:20: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:382:23: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:382:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:387:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:387:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:388:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:388:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:389:16: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:389:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:390:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:390:27: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:391:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:391:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:395:22: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:395:24: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:402:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:402:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:403:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:403:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:404:16: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:404:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:405:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:405:27: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:406:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:406:15: E251 unexpected spaces around keyword / parameter equals
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21932

Differential Revision: D15892041

Pulled By: bddppq

fbshipit-source-id: f62949a7617f8ea9c036ea9b48ab1e340a7af83e
2019-06-18 19:08:24 -07:00
3bdde56907 Fix incorrect usage of __HIP_PLATFORM_HCC__ (#21757)
Summary:
This avoid using `__HIP_PLATFORM_HCC__` in case it changes in the future.

Following up https://github.com/pytorch/pytorch/issues/21718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21757

Reviewed By: xw285cornell

Differential Revision: D15891867

Pulled By: bddppq

fbshipit-source-id: 5de55687ab1c86eddf6b4d8d25fee48d96ec72ad
2019-06-18 18:56:32 -07:00
a388c78350 fix bug in CompilationUnit::define (#21886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21886
ghimport-source-id: fefbd758bbe2fbcaaad84a376ac5f69c40bccb80

Test Plan: Imported from OSS

Differential Revision: D15867647

Pulled By: suo

fbshipit-source-id: 3e0f5bbc98ec93ccf26442c4c574626e45e53888
2019-06-18 15:41:55 -07:00
52e1cea057 Fix recusive method compilation (#21862)
Summary:
The code in `python_sugared_value.cpp` to recursively compile methods
was not being tested, so this adds a test for it and fixes some errors
in it

It was necessary to disable any hooks that were set since (at least in our tests) they would try to export
a half-finished graph, because they were being called on recursively
compiled methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21862

Differential Revision: D15860314

Pulled By: driazati

fbshipit-source-id: e8afe9d4c75c345b6e1471072d67c5e335b61337
2019-06-18 14:08:56 -07:00
eda08b0aae script::Module as a view. (#21814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21814
ghimport-source-id: 49cfea6101ad9ca438600c465762e23252e05ff3

Test Plan: Imported from OSS

Differential Revision: D15839583

Pulled By: zdevito

fbshipit-source-id: ab4ef31a523b3ac1477aa7e6d4d9513e7408c560
2019-06-18 13:58:49 -07:00
94c61d4f32 Fix infinite loop in del_post_hook (#21914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21914

https://github.com/pytorch/pytorch/pull/21591 added a needed feature to clean up grad accumulator post hooks when the DistributedDataParallel model object is cleaned up. There's a minor typo that causes it to loop infinitely over the first element.

Differential Revision: D15878884

fbshipit-source-id: b7fd0bbd51eb187579d639b1709c6f7b62b85e7a
2019-06-18 13:43:59 -07:00
c0f51142cd Added a test case for an index error for the index_copy_ (#21912)
Summary:
Follow up PR with a test for the [fixed](4b45f08f87) [bug](https://github.com/pytorch/pytorch/issues/20322).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21912

Differential Revision: D15878674

Pulled By: izdeby

fbshipit-source-id: c8fef2214606c796d174d0faaaf633531a7bea88
2019-06-18 13:43:56 -07:00
ad00c12379 Clean up //caffe2/torch-cpp to avoid duplicated symbols (#21916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21916

Hopefully fixes https://fb.workplace.com/groups/1405155842844877/permalink/2832659000094547/

Reviewed By: rutyrinott

Differential Revision: D15862128

fbshipit-source-id: 77c01a57bddc39b267e307b50942e029a381711b
2019-06-18 13:05:22 -07:00
081cd3a293 Change AT_CHECK to TORCH_CHECK in python_arg_parser.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21887

Differential Revision: D15869483

Pulled By: jerryzh168

fbshipit-source-id: f3d9d73078e7c1c08ec79694105e18084e7f9caf
2019-06-18 10:48:38 -07:00
28ecc104f4 Fix WeakIValueEq (#21891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21891
ghimport-source-id: a037850c96fe803540412db9a88548fa41f2d4f0

Test Plan: Imported from OSS

Differential Revision: D15871588

Pulled By: jamesr66a

fbshipit-source-id: ecfdece1285c0737d0b1dc2afe959c43d9413001
2019-06-18 10:35:35 -07:00
010f238d17 Retry pip install to make pytorch rocm CI more stable (#21895)
Summary:
pip install randomly core dumps, examples:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/25720//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/25723//console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21895

Differential Revision: D15873197

Pulled By: bddppq

fbshipit-source-id: 2c967bc0a47bef9d3f7af83e99514c93b54e353f
2019-06-18 10:10:56 -07:00
5eb25c3704 Support in membership checks (#21527)
Summary:
This PR adds support for `in` checks like `key in my_dict`

For now it leaves lists as a follow-up, due to the changes around `IValue` lists and the need for an `IValue` equality op.

For objects it uses the magic method `__contains__(self, key)`
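As a plain-Python illustration of the dispatch described above (a toy sketch, not PyTorch code — the class and values here are invented), `key in obj` calls the object's `__contains__`:

```python
class SquareSet:
    """Toy container: `key in obj` dispatches to __contains__."""

    def __init__(self, n):
        self.vals = {i * i for i in range(n)}

    def __contains__(self, key):
        return key in self.vals


s = SquareSet(10)
print(16 in s)  # True  (4 * 4)
print(15 in s)  # False
```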
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21527

Pulled By: driazati

Differential Revision: D15811203

fbshipit-source-id: 95745060394f8a9450efaaf8ab09d9af83bea01e
2019-06-18 09:49:12 -07:00
afad3e4954 Add support for class annotations (#21379)
Summary:
This adds support for inferred attributes (everything except empty lists, dicts, and tuples) as well as PEP 526-style annotations on a class, which eliminates the need for `torch.jit.Attribute`.
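For reference, PEP 526-style class annotations look like the following (a generic Python sketch, not TorchScript itself; the class and fields are invented):

```python
from typing import Dict, List


class Counter:
    # PEP 526 variable annotations at class scope declare attribute types
    name: str
    counts: Dict[str, int]
    history: List[int]

    def __init__(self) -> None:
        self.name = "demo"
        self.counts = {}
        self.history = []


c = Counter()
c.counts["a"] = 1
print(c.counts)  # {'a': 1}
```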
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21379

Differential Revision: D15718537

Pulled By: driazati

fbshipit-source-id: b7481ae3d7ee421613e931b7dc3427ef2a99757f
2019-06-18 09:49:09 -07:00
85528feb40 Mark test_snli as a slow test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21908

Pulled By: driazati

Differential Revision: D15875846

fbshipit-source-id: 98b79e7beee5ffd72e1f41d22e07e618547b23e9
2019-06-18 09:44:12 -07:00
0998a32588 Backward function will set a flag if it released variables (#21533)
Summary:
This is a fix for https://github.com/pytorch/pytorch/issues/21469
Currently there is no way to tell whether a backward function has released its variables once they were added to a vector. This change sets a flag when a function with saved variables releases them, so we can prevent the function from being called again with already-released variables.
Functions that do not have saved variables can still be called multiple times, for backward compatibility.
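The mechanism can be sketched in Python as follows (a hand-rolled illustration, not the actual autograd C++; all names here are invented):

```python
class BackwardFn:
    def __init__(self, saved_variables):
        self.saved = saved_variables
        self.has_saved = bool(saved_variables)
        self.released = False

    def apply(self):
        # fail fast on a second call once saved variables are gone
        if self.has_saved and self.released:
            raise RuntimeError("saved variables have already been released")
        out = sum(self.saved) if self.has_saved else 0
        if self.has_saved:
            self.saved = None      # release the saved variables...
            self.released = True   # ...and record that we did
        return out


fn = BackwardFn([1, 2])
print(fn.apply())  # 3 — first call succeeds, then releases
```

A function constructed with no saved variables never sets the flag, so it stays callable any number of times.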
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21533

Differential Revision: D15810481

Pulled By: ifedan

fbshipit-source-id: 5663e0c14f1b65727abc0d078aef348078d6a543
2019-06-18 09:21:17 -07:00
f363a33e10 Set __file__ for torch.ops (#21888)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19351 https://github.com/pytorch/lockdown/issues/93
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21888

Differential Revision: D15871142

Pulled By: ailzhang

fbshipit-source-id: 339e9d493e2e13f09e118814bdd1d7a5942804b8
2019-06-18 08:46:23 -07:00
31e1e63bc2 Port avg_pool3d() to ATen (#21732)
Summary:
This will need a conflict resolution once avg_pool2d() has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21732

Differential Revision: D15824923

Pulled By: ezyang

fbshipit-source-id: 83341e0209b660aecf788272079d8135d78b6ff1
2019-06-18 08:33:30 -07:00
Jie
c471a63a39 UpSample-nearest cuda kernel update (#21694)
Summary:
updating upsampling kernel:
1. avoids atomicAdd for better fp16 performance.
2. better launch configures for 2D input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21694

Differential Revision: D15875791

Pulled By: ezyang

fbshipit-source-id: 426fc5d5f0c0cdf58bfa1a2b564f17a9ea286fa4
2019-06-18 08:24:25 -07:00
998efb48c3 Add at::dimname_to_position helper. (#21789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21789
ghimport-source-id: 42c0a58280f3645dd38ea11d39311a0c53f90488

Test Plan:
- `build/bin/NamedTensor_test` [namedtensor ci]

Imported from OSS

Differential Revision: D15833455

Pulled By: zou3519

fbshipit-source-id: 8dd51a7b785972668984a7c161b94b92039a1cb1
2019-06-18 07:44:04 -07:00
8f9e0f77dd Turn off non-default stream testing. (#21793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21793
ghimport-source-id: 5264fa90ca77fbc79898cfa2f0ee02f47dec27d4

Test Plan: Imported from OSS

Differential Revision: D15874814

Pulled By: ezyang

fbshipit-source-id: 5c51ab9ae431faf2db549b88b07ba00783acab25
2019-06-18 07:00:08 -07:00
08a0ac84d7 Removed unused variable from closure in range (#21897)
Summary:
This was some code I added :^)

Time for me to remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21897

Differential Revision: D15873213

Pulled By: Chillee

fbshipit-source-id: 769c3bd71c542be4afddc02dc2f65aa5c751b10d
2019-06-18 02:21:50 -07:00
6042012a93 Fixed "tried to access to nonexistent attribute" -> "tried to access nonexistent attribute"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21863

Differential Revision: D15873204

Pulled By: Chillee

fbshipit-source-id: c5d85487b287ee9dd8318161ef9399ffd1ee0b68
2019-06-18 02:13:09 -07:00
df787cf079 Fixed a warning in test_jit.py (#21898)
Summary:
What's the point of having warnings if we never fix them :^)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21898

Differential Revision: D15873280

Pulled By: Chillee

fbshipit-source-id: a8274bab2badd840d36a9d2e1354677a6114ae1d
2019-06-18 01:15:15 -07:00
f1c1d1a964 Export the cosine_similarity op as an ATenOp correctly (#21884)
Summary:
cosine_similarity has two non-tensor parameters and needs some special handling. This diff adds support for its export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21884

Reviewed By: zrphercule

Differential Revision: D15866807

Pulled By: houseroad

fbshipit-source-id: a165fbc00c65c44b276df89ae705ca8960349d48
2019-06-17 23:34:59 -07:00
3ed8acdf59 Fixes lint error in py3 (#21883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21883
ghimport-source-id: c4330d71033929178ef10f2a0fcd8b0b2b468cb5

Test Plan: Imported from OSS

Differential Revision: D15866746

Pulled By: bwasti

fbshipit-source-id: c3d23f3396a95d5b1d689a07662e82e48cb3ab7a
2019-06-17 22:20:06 -07:00
2ba164b943 Future interface for ATen/Parallel (#21764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21764
ghimport-source-id: fca083c09d814a0411020871f49429509fc0e8b5

Test Plan:
Imported from OSS
see https://github.com/pytorch/pytorch/pull/21764

Differential Revision: D15816658

Pulled By: ilia-cher

fbshipit-source-id: 0e25ca6ff66a837d4f69f37a47e59927ab10e216
2019-06-17 22:05:59 -07:00
d8314a6260 Replace nullary/unary/binary loops with generic implementation (#21475)
Summary:
```
This replaces the kernel helpers in Loops.h/cuh with the following:

  cpu_kernel
  cpu_kernel_vec

  gpu_kernel
  gpu_kernel_with_scalars

These work with functions with any number of input arguments, with the
exception of 'gpu_kernel_with_scalars' which is limited to binary
operations. Previously, we only supported functions of 0, 1, or 2 input
arguments. Adding support for 3 or 4 input argument functions required
significant amount of additional code.

This makes a few other changes:

Remove 'ntensors' from the for_each/serial_for_each loop. Most loops
assume a fixed number of tensors, and the value is accessible from
TensorIterator::ntensors()

Only lift CPU scalars to parameters in 'gpu_kernel_with_scalars'.
Previously, we performed this recursively in gpu_unary_kernel and
gpu_binary_kernel, so something like `torch.add(3, 4, out=cuda_tensor)`
would specialize to a "nullary" kernel. Now, only the first
scalar input is lifted to a kernel parameter. Any additional scalar
inputs are copied to CUDA tensors. Note that operations like `x + 5`
and `5 + x` still work efficiently. This avoids generating an exponential
number of specializations in the number of input arguments.
```

**Performance measurements**
Timing numbers are unchanged for basic elementwise operations. Linked below is a script to measure torch.add perf on PR vs. master CPU+GPU (GCC 7.3):
[miniperf.py](https://gist.github.com/colesbury/4a61893a22809cb0931f08cd37127be4)

**Generated assembly**
cpu_kernel and cpu_kernel_vec still generate good vectorized code with
both GCC 7.3 and GCC 4.8.5. Below is the assembly for the "hot" inner loop of
torch.add as well as an auto-vectorized torch.mul implementation using cpu_kernel/
binary_kernel. (The real torch.mul uses cpu_kernel_vec but I wanted to check that
auto vectorization still works well):

[torch.add GCC 7.3](https://gist.github.com/colesbury/927ddbc71dc46899602589e85aef1331)
[torch.add GCC 4.8](https://gist.github.com/colesbury/f00e0aafd3d1c54e874e9718253dae16)
[torch.mul auto vectorized GCC 7.3](https://gist.github.com/colesbury/3077bfc65db9b4be4532c447bc0f8628)
[torch.mul auto vectorized GCC 4.8](https://gist.github.com/colesbury/1b38e158b3f0aaf8aad3a76963fcde86)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21475

Differential Revision: D15745116

Pulled By: colesbury

fbshipit-source-id: 914277d7930dc16e94f15bf87484a4ef82890f91
2019-06-17 19:08:33 -07:00
7f057f00cc Update mkldnn-bridge to fix MKLDNN grouped conv issue (#21854)
Summary:
1. Fix grouped conv issue in https://github.com/pytorch/pytorch/issues/21597
2. Fix build error in 731670f40a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21854

Test Plan: buck run experimental/jbai/pt_issue_21597_mkldnn_conv_2d_repro:run

Reviewed By: yinghai

Differential Revision: D15861105

Pulled By: bddppq

fbshipit-source-id: fe3e2943a15aab4294f8e6bb15db15829a94420f
2019-06-17 18:21:26 -07:00
5237835a17 Make script::Method a value type (#21675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21675
ghimport-source-id: 90ee7ba00e58b0151ca4c17e91fd17303c9d5d08

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D15777725

Pulled By: zdevito

fbshipit-source-id: 8482cd2e1dcd7dd77a9cacbb76743bd190c7c4cf
2019-06-17 18:14:50 -07:00
cc4498a54a Always enable P2P access for GPU copies (#21872)
Summary:
PR https://github.com/pytorch/pytorch/issues/20685 incorrectly only enabled P2P access for non-contiguous copies.
This can make cudaMemcpy slow for inter-gpu copies, especially on ROCm
devices.  I didn't notice a difference on CUDA 10, but ngimel says it's
important for CUDA too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21872

Differential Revision: D15863965

Pulled By: colesbury

fbshipit-source-id: 0a858f3c338fa2a5d05949d7f65fc05a70a9dfe1
2019-06-17 17:48:28 -07:00
76a250d590 Add new regression loss function type to FBLearner (#21080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21080

Add Huber loss as a new option for regression training (refer to TensorFlow implementation: https://fburl.com/9va71wwo)
  # huber loss
  def huber(true, pred, delta):
    error = abs(true-pred)
    loss = 0.5 * min(error, delta)^2 + delta * max(error - delta, 0)
    return mean(loss)

As a combination of MSE loss (`x < delta`) and MAE loss (`x >= delta`), the advantage of Huber loss is to reduce the training dependence on outliers.

One thing worth noting is that the Huber loss is not twice differentiable at `x = delta`. To address this, one could consider adopting a `log(cosh(x))` loss instead.
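The pseudocode above can be written as runnable plain Python (a sketch of the same formula, not the FBLearner implementation):

```python
def huber(true, pred, delta):
    """Mean Huber loss: quadratic for error <= delta, linear beyond."""
    losses = []
    for t, p in zip(true, pred):
        error = abs(t - p)
        losses.append(0.5 * min(error, delta) ** 2
                      + delta * max(error - delta, 0.0))
    return sum(losses) / len(losses)


# small errors behave like 0.5 * MSE, large errors grow linearly (MAE-like)
print(huber([0.0], [0.5], delta=1.0))  # 0.125
print(huber([0.0], [3.0], delta=1.0))  # 2.5
```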

Reviewed By: chintak

Differential Revision: D15524377

fbshipit-source-id: 73acbe2728ce160c075f9acc65a1c21e3eb64e84
2019-06-17 17:43:00 -07:00
8aeb4ef4bf Add python string standard lib (#21807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21807
ghimport-source-id: dcb2c78b8facb90a323ab9212b7703e553354273

Test Plan: Imported from OSS

Differential Revision: D15835509

Pulled By: bwasti

fbshipit-source-id: bc8bc5ae5a4fb4a1581aa94485973ed87af4eaaf
2019-06-17 15:48:36 -07:00
d329dffd92 improve error message on recursive class defs (#21842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21842
ghimport-source-id: 33569714b18fc476c4e6b3bc976b53b1f107273d

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15857568

Pulled By: suo

fbshipit-source-id: 6307597b9741cfdccd5c55216ebdc7c4391a5e23
2019-06-17 15:23:21 -07:00
cdae8b93a7 improve recursive scripting error message (#21841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21841
ghimport-source-id: fbca813d12ca4bfad7967e12c8dafe5eaba77cab

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15857569

Pulled By: suo

fbshipit-source-id: 152eba10565cf7119508079e98512f116eb3a5a8
2019-06-17 15:23:17 -07:00
c0420d9618 Attempt to fix TRT build after library merge (#21775)
Summary:
After fixing https://github.com/pytorch/pytorch/issues/20774 the TRT build was broken

Because of missing annotations, pybind_state_gpu.so was missing symbols while pybind_state.so was not. This caused a weird combination: trying to import pybind_state_gpu first left the system in a semi-initialized state and led to a segfault.

Minimal repro:
```
>>> import ctypes

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so: undefined symbol: _ZN6caffe219TensorRTTransformer9TransformEPNS_9WorkspaceEPNS_6NetDefERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_11TensorShapeESt4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SC_EEE

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state.so')
Segmentation fault (core dumped)
```

Too lazy to repro locally, let's see if CI passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21775

Differential Revision: D15829605

Pulled By: dzhulgakov

fbshipit-source-id: 1adb2bde56b0cd68f84cfca67bc050adcf787cd9
2019-06-17 14:16:45 -07:00
0408697317 Followup cleanup in cmake.py and add a comment in setup.py (#21792)
Summary:
Following up b811b6d5c03596d789a33d7891b606842e01f7d2

* Use property instead of __setattr__ in CMake.
* Add a comment clarifying when built_ext.run is called.

 ---

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21792

Differential Revision: D15860606

Pulled By: umanwizard

fbshipit-source-id: ba1fa07f58d4eac81ac27fa9dc7115d1cdd3dec0
2019-06-17 13:46:25 -07:00
7279e07c8b Don't use anonymous namespace in header. (#21790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21790
ghimport-source-id: ff648a1e9a1b8627f0742307e2e7810d6445d597

Test Plan: Imported from OSS

Differential Revision: D15827311

Pulled By: ezyang

fbshipit-source-id: 996bfd3a93fcda5934dcc523adae0648cba1c4fa
2019-06-17 13:26:02 -07:00
1aa16d356e named inference rule for tensor.select (#21752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21752
ghimport-source-id: 95e17087b8c29c9bd88003ae225cb7329d0b67e6

Test Plan:
- `python test/test_namedtensor.py` [namedtensor ci]

gh-metadata: pytorch pytorch 21752 gh/zou3519/50/head

Imported from OSS

Differential Revision: D15833453

Pulled By: zou3519

fbshipit-source-id: 7b51e4137e54712aa9c6274a9e6bb48ab7191b8d
2019-06-17 13:12:49 -07:00
b403b10ff9 Fix #11752: fix numerical issue in log_softmax (#21672)
Summary:
https://github.com/pytorch/pytorch/issues/11866 corrected this issue in the function `host_softmax` (aten/src/ATen/native/SoftMax.cpp). But when I tried the example proposed in https://github.com/pytorch/pytorch/issues/11752, `log_softmax` was still not working for big logits.

Looking into the source code, I found that the example had called `vec_host_softmax_lastdim`, not `host_softmax`.

This code fixes the issue in `_vec_log_softmax_lastdim` and has a test for `log_softmax`.
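The underlying numerical issue is the classic log-sum-exp overflow; a stable implementation subtracts the maximum first (a plain-Python sketch of the general technique, not the ATen code):

```python
import math


def log_softmax(xs):
    # subtracting the max keeps every exp() argument <= 0,
    # so nothing overflows (the log-sum-exp trick)
    m = max(xs)
    log_sum = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_sum for x in xs]


big = [1000.0, 1001.0, 1002.0]
print(log_softmax(big))  # finite values; a naive math.exp(1000.0) overflows
```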
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21672

Differential Revision: D15856327

Pulled By: VitalyFedyunin

fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
2019-06-17 12:59:08 -07:00
0f675f9cbc Port im2col and vol2col (#21769)
Summary:
resolves partially https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21769

Differential Revision: D15854530

Pulled By: ezyang

fbshipit-source-id: 574853c068010d1b7588047d2ab7450077471447
2019-06-17 10:06:26 -07:00
2b23fac8da Disallow creation of tensors with duplicate names (#21781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21781
ghimport-source-id: d77e0c97fe0104b4b29571fd5828967399d34fb1

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 21781 gh/zou3519/51/head

Imported from OSS

Differential Revision: D15833454

Pulled By: zou3519

fbshipit-source-id: fca4de83fba4bced615ec3cbd4ce4c441ddfcaf2
2019-06-17 09:59:50 -07:00
44707dd3ca Rename Dimname::name to Dimname::full_name (#21803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21803
ghimport-source-id: e0bc5a746e745e18f19215c6551d79cb0cd5f9c5

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D15833452

Pulled By: zou3519

fbshipit-source-id: 7aa4d78ff436bd6a622a5ea235b75135d9798d33
2019-06-17 08:32:32 -07:00
7c1528bab6 Copy NamedTensorMeta in TensorImpl::copy_tensor_data() (#21735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21735
ghimport-source-id: 4a4289693e372880e3d36e579c83d9e8745e70ed

Test Plan:
- I'm not sure how to test this other than making sure it compiles.
- [namedtensor ci]

gh-metadata: pytorch pytorch 21735 gh/zou3519/49/head

Imported from OSS

Differential Revision: D15833456

Pulled By: zou3519

fbshipit-source-id: ea2fa6d5c5f1eb2d7970d47189d6e4fcd947146d
2019-06-17 08:32:28 -07:00
da4e60226c Keep Reducer hooks in a vector instead of an unordered_map (#21783)
Summary:
kuttas pointed out that the DDP Reducer only needs to remember `uintptr, Function` pairs, and hence does not need an unordered map as added by https://github.com/pytorch/pytorch/issues/21591. Using a vector should speed it up a bit.
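The data-structure change can be illustrated generically (a toy Python sketch; the real Reducer stores C++ `std::pair`s, and these names are made up):

```python
# Instead of a map keyed by pointer address, keep (key, hook) pairs in a
# flat list: the code only ever appends during setup and iterates once
# during teardown, so hashing and bucket overhead buy nothing.
hooks = []


def register(key, hook):
    hooks.append((key, hook))


calls = []
register(0x1, lambda: calls.append("a"))
register(0x2, lambda: calls.append("b"))

for _key, hook in hooks:
    hook()
print(calls)  # ['a', 'b'] — insertion order is preserved for free
```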
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21783

Differential Revision: D15854312

Pulled By: mrshenli

fbshipit-source-id: 153ba035b8d658c7878a613f16a42de977d89c43
2019-06-17 08:24:19 -07:00
76713fb564 Fix remote build + clean up disable feature hack (#21816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21816

Clean up disable feature hack.

Reviewed By: bddppq

Differential Revision: D15833285

fbshipit-source-id: a2ae5d0f15e47b835dbd3997bbaa0add7e868f20
2019-06-17 08:08:34 -07:00
4a6aa1d806 Populate producer_info.json in any PyTorch model at FB (#21662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21662

Use hook added in https://github.com/pytorch/pytorch/pull/20863 to auto-write a file with environment information (including user, machine, Flow, etc).

Reviewed By: natalialunova

Differential Revision: D15690185

fbshipit-source-id: ccaaeda9562db32925041d18f394fb98fab8db99
2019-06-16 20:12:23 -07:00
c9ba3f699d Bag of documentation fixes (#21846)
Summary:
Thanks henon for raising the issues.

Fixes https://github.com/pytorch/pytorch/issues/21830
Fixes https://github.com/pytorch/pytorch/issues/21831
Fixes https://github.com/pytorch/pytorch/issues/21832
Fixes https://github.com/pytorch/pytorch/issues/21827
Fixes https://github.com/pytorch/pytorch/issues/21822
Fixes https://github.com/pytorch/pytorch/issues/21820
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21846

Differential Revision: D15847389

Pulled By: soumith

fbshipit-source-id: 421cc48af646a2618af731697de7d4de83d3eabe
2019-06-16 19:35:27 -07:00
972ec676b2 Remove lowered execution (#21674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21674
ghimport-source-id: b8e27f0ce9b8b362daf73556ee67457fb5355062

Reviewed By: eellison

Differential Revision: D15777726

Pulled By: zdevito

fbshipit-source-id: 718ac676c9a1bcf99b856862fd29631d825645da
2019-06-16 14:29:18 -07:00
ff1172d705 high pri Jit builtins (#21451)
Summary:
bin/hex/oct/round/chr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21451

Differential Revision: D15702863

Pulled By: ailzhang

fbshipit-source-id: 9f69896b79e7584f12353e9f2ee2969dbe1ec6d6
2019-06-16 09:48:38 -07:00
4f75da3b41 change ClassType::compilation_unit to return owning ptr (#21787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21787
ghimport-source-id: eed7b98b0f02745066164b8ef3906291931e2ecb

Test Plan: Imported from OSS

Differential Revision: D15831353

Pulled By: suo

fbshipit-source-id: 50695c35dba8ffea710cbc9aca8aba6a75512fa0
2019-06-16 02:37:07 -07:00
263b1985a8 Revert D15833924: [jit] Fix stdout capturing, remove some expect files
Differential Revision:
D15833924

Original commit changeset: 152972b4c240

fbshipit-source-id: 1d5a2258bc134fdc7bd2cb557bcc05f2289443b6
2019-06-15 20:39:11 -07:00
04f09d4235 Move unwrap logic from c10 to caffe2 (#21620)
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.

Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620

Differential Revision: D15763560

Pulled By: yf225

fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
2019-06-14 22:02:43 -07:00
794ee6d00c Switch to out-source builds for LibTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21772

Differential Revision: D15839332

Pulled By: yf225

fbshipit-source-id: 017cf61c5682c6a8ffeaf2ca952e1418c27be30e
2019-06-14 21:00:18 -07:00
4a2fc00db0 Revert D15830704: [jit] Add Python string standard lib
Differential Revision:
D15830704

Original commit changeset: e55a8c6bf910

fbshipit-source-id: 1ec953bfaabab0288e953f48cde0a32370ac3fc6
2019-06-14 20:52:58 -07:00
97b92eede1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 979417253fab9142059bdfb6e834f44bb1cc8d0d
2019-06-14 17:41:30 -07:00
220efdbdc4 Refactor pybind_utils.h (#21550)
Summary:
This refactors pybind_utils so we can have all our type-inferring stuff in
1 place (e.g. for #21379)

There is some follow up work to make the error messages better, but I think that's fine to save for another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21550

Pulled By: driazati

Differential Revision: D15727002

fbshipit-source-id: a6974f2e1e5879f0503a18efc138da31cda7afa2
2019-06-14 17:27:45 -07:00
a85305fdea Hook up profiled execution in the interpreter (#21799)
Summary:
Rebasing https://github.com/pytorch/pytorch/pull/21616 onto master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21799

Differential Revision: D15832854

Pulled By: Krovatkin

fbshipit-source-id: 88d754446df2abc25ea86e46764848d48ee3a5fc
2019-06-14 16:56:13 -07:00
4bcc72fe95 Support for NamedTuple (#21428)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/18

This implements NamedTuple by taking advantage of the existing `names` field in `TupleType`.

TODO: This currently doesn't retain the NamedTuple-ness through serialization. Discussed with suo offline, we can probably make a way to define an anonymous NamedTuple in script (e.g. `NamedTuple('Foo', [('a', int), ('b', float), ('c', List[float])])` and serialize that
TODO: implement support for calling the constructor with kwargs
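The anonymous form mentioned in the TODO corresponds to the standard `typing.NamedTuple` functional syntax:

```python
from typing import List, NamedTuple

# functional (anonymous) form, as suggested in the TODO above
Foo = NamedTuple('Foo', [('a', int), ('b', float), ('c', List[float])])

f = Foo(1, 2.0, [3.0, 4.0])
print(f.a, f.b, f.c)  # 1 2.0 [3.0, 4.0]
print(f._fields)      # ('a', 'b', 'c')
```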
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21428

Differential Revision: D15741564

Pulled By: jamesr66a

fbshipit-source-id: c077cbcea1880675ca6deb340a9ec78f824a136c
2019-06-14 16:45:56 -07:00
ac8d1a1f76 fix some issues found by enabling -Wshorten-64-to-32 (#18187)
Summary:
When enabling this flag there were a lot of warnings; this PR focuses on the warnings where the comparison could affect array indices, which are the ones most prone to fail.

The good news is that I didn't find anything obviously concerning.

One degenerate case that could run into issues is when the matrices we work with are too skinny (dim1 = 1 and dim2 needs to hold a big number).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18187

Differential Revision: D14527182

Pulled By: hyuen

fbshipit-source-id: b9f46b6f68ab912c55368961758a7a5af1805555
2019-06-14 16:29:32 -07:00
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
d0021b3ac7 Fix stdout capturing, remove some expect files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21805

Pulled By: driazati

Differential Revision: D15833924

fbshipit-source-id: 152972b4c24041b8a459d5b8ef8789543a6b8153
2019-06-14 16:05:06 -07:00
07fea3f5b6 Add new get_batch() method to ChunkDataset API (#21797)
Summary:
We plan on generating Python bindings for the C++ ChunkDataset API using the current PyTorch DataLoader class, which must call get_batch() instead of get_batch(size).

This change doesn't break the current API; it just adds one more method that will make future extensions easier (WIP).
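The overload pattern can be sketched in Python (hypothetical names and behavior; the real API is C++):

```python
class ChunkDataset:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._cursor = 0

    def get_batch(self, size=None):
        # new zero-argument form falls back to the configured batch size,
        # so callers like a DataLoader need not pass one explicitly
        if size is None:
            size = self.batch_size
        batch = list(range(self._cursor, self._cursor + size))
        self._cursor += size
        return batch


ds = ChunkDataset(batch_size=3)
print(ds.get_batch())   # [0, 1, 2]
print(ds.get_batch(2))  # [3, 4]
```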
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21797

Differential Revision: D15830522

Pulled By: soumith

fbshipit-source-id: 7208f305b48bf65d2783eaff43ff57a05e62c255
2019-06-14 13:39:54 -07:00
dddc65db9e Add Python string standard lib (#21059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21059
ghimport-source-id: f813585cde1b275c134b19009a2f5c0b3d70fc6e

Reviewed By: jamesr66a

Differential Revision: D15830704

Pulled By: bwasti

fbshipit-source-id: e55a8c6bf910a163b9a5260235e315af9532b129
2019-06-14 13:34:42 -07:00
65a3dbdfb0 Remove hip device sync in miopen Conv implementation (#21791)
Summary:
xw285cornell bddppq
Note there are other optimizations coming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21791

Differential Revision: D15829238

Pulled By: bddppq

fbshipit-source-id: 66c62c646f315d65b4e432ca20890faded843db4
2019-06-14 12:32:50 -07:00
1fc240e59a add tests for add_custom_scalars and others (#20987)
Summary:
Originally, the tests for the tensorboard writer were smoke tests only. This PR lets CI compare the output with expected results at a low level. The randomness of the tensors in the tests has also been removed.
P.S. I found that how protobuf serializes data differs between Python environments. One way to solve this is to write the data and then read it back immediately (i.e., compare the data at a higher level).

For `add_custom_scalars`, the data to be written is a dictionary, and the serialized result might differ (it is not an `OrderedDict`), so there is only a smoke test for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20987

Reviewed By: NarineK, lanpa

Differential Revision: D15804871

Pulled By: orionr

fbshipit-source-id: 69324c11ff823b19960d50def73adff36eb4a2ac
2019-06-14 12:27:07 -07:00
0d6eb209e6 Expose torch.empty(sizes, *, names, ...) to Python (#21648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648
ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694

Differential Revision: D15804482

Pulled By: zou3519

fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83
2019-06-14 11:52:47 -07:00
710821875a Fix flaky nuclear_norm() test (#21638)
Summary:
Try to fix a sporadic failure on some CIs.

I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638

Differential Revision: D15827779

Pulled By: ezyang

fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02
2019-06-14 11:40:03 -07:00
zaf
ff8c3fd54e Adding the quantized namespace to torch.nn and importing it from torch (#21600)
Summary:
Stack:
* **https://github.com/pytorch/pytorch/issues/21600 Adding the quantized namespace to torch** (https://our.intern.facebook.com/intern/diff/D15742149/)

Add nn.quantized name space to torch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21600

Differential Revision: D15742149

Pulled By: zafartahirov

fbshipit-source-id: 60dede12c81861f369d208b06f5b68e9384312f6
2019-06-14 11:05:45 -07:00
9a1dc43f34 Deprecate unordered_map and vector in IValues (#21712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21712

Warn when people use unordered_map or vector with IValues. These APIs are deprecated.
The unordered_map API is slow because it requires copying the whole map.
The vector API is slow for some types (e.g. std::string) because for them it also requires copying the whole vector.
Also, the vector API would get slow for all types if we decide to switch to SmallVector.

Differential Revision: D15792428

fbshipit-source-id: 1b72406b3a8d56521c862858c9f0ed01e56f2757
2019-06-14 11:05:41 -07:00
029a968212 Define __setstate__ on _ConvNd to handle pre-padding_mode pickles. (#21687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21687
ghimport-source-id: df49530d25239ac4d62eae83c5d7b0d8f00f836a

Differential Revision: D15807402

Pulled By: ezyang

fbshipit-source-id: f51b221444afc4e017db7544642a9c0a7d2a3efb
2019-06-14 11:00:21 -07:00
7284f448ba Fix handling of kwargs from common method invocations (#21499)
Summary:
When kwargs are specified in a test defined via common_method_invocations, it doesn't work if there isn't also a positional argument (`{'foo':'foo'}` without a positional arg generates a python call like: `self.method(, foo=foo)`, erroring on the `,`). I wanted to test something in a different PR and noticed I couldn't.

Also fixed some flake8 warnings I was seeing locally.

I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but happy to revert that if others don't agree?
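The failure mode is easy to reproduce when the call string is built by concatenating around a fixed comma; one fix is to join the argument pieces instead (a hypothetical helper illustrating the pattern, not the actual test-harness code):

```python
def format_call(method, args, kwargs):
    # joining avoids emitting "self.method(, foo=foo)" when args is empty
    parts = [repr(a) for a in args]
    parts += ["%s=%r" % (k, v) for k, v in sorted(kwargs.items())]
    return "self.%s(%s)" % (method, ", ".join(parts))


print(format_call("method", [], {"foo": "foo"}))  # self.method(foo='foo')
print(format_call("add", [1], {"alpha": 2}))      # self.add(1, alpha=2)
```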
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499

Differential Revision: D15826974

Pulled By: nairbv

fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
2019-06-14 10:47:33 -07:00
c1744a6c39 Add ONNX py3 CI cases (#21715)
Summary:
So far, we only have py2 CI for ONNX. I think py3 support is important, and we have a plan to add onnxruntime backend tests, which only support py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21715

Reviewed By: bddppq

Differential Revision: D15796885

Pulled By: houseroad

fbshipit-source-id: 8554dbb75d13c57b67ca054446a13a016983326c
2019-06-14 10:20:14 -07:00
c06ccbe663 Add aten mkldnn zero_ operator (#20573)
Summary:
### mkldnn backward ops list:
 - [ ] \(https://github.com/pytorch/pytorch/pull/20567) Add aten mkldnn conv2d backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_poo2d 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20572) Add aten mkldnn batchnorm backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20573) Add aten mkldnn zero_ operator💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20575) Add mkldnn mul operator 💚
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20573

Differential Revision: D15820477

Pulled By: bddppq

fbshipit-source-id: 35d95f5b4e013c8db1911f52148550a2e40a2e68
2019-06-14 09:48:49 -07:00
bc6281028c rebuild_storage_fd retry on EINTR (#21723)
Summary:
Some data loader tests are flaky on py 2 with the following error
```
Jun 12 22:17:31 Traceback (most recent call last):
Jun 12 22:17:31   File "test_dataloader.py", line 798, in test_iterable_dataset
Jun 12 22:17:31     fetched = sorted([d.item() for d in dataloader_iter])
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__
Jun 12 22:17:31     idx, data = self._get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data
Jun 12 22:17:31     success, data = self._try_get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data
Jun 12 22:17:31     data = self.data_queue.get(timeout=timeout)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Jun 12 22:17:31     res = self._recv()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Jun 12 22:17:31     return pickle.loads(buf)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Jun 12 22:17:31     return Unpickler(file).load()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Jun 12 22:17:31     dispatch[key](self)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Jun 12 22:17:31     value = func(*args)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Jun 12 22:17:31     fd = multiprocessing.reduction.rebuild_handle(df)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
Jun 12 22:17:31     new_handle = recv_handle(conn)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
Jun 12 22:17:31     return _multiprocessing.recvfd(conn.fileno())
Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call
```

Apparently, Python 2.7's `recvfd` calls `recvmsg` without an EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174
So we should wrap the call in an outer retry loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723

Differential Revision: D15806247

Pulled By: ezyang

fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7
2019-06-14 09:10:00 -07:00
deb2140c6e Throwing errors for min and max reductions in empty CUDA tensors (#19612)
Summary:
Related to https://github.com/pytorch/pytorch/issues/17750.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19612

Differential Revision: D15813649

Pulled By: gchanan

fbshipit-source-id: aa3dc34dd1e6d8bb24fa4c18891204108759bb35
2019-06-14 08:34:30 -07:00
b811b6d5c0 When building extensions, honor options set in CMake. (#21653)
Summary:
Currently when building extensions, variables such as USE_CUDA, USE_CUDNN are used to determine what libraries should be linked. But we should use what CMake has detected, because:

1. If CMake found them unavailable but the variables say some libraries should be linked, the build would fail.
2. If the first build is made using a set of non-default build options, rebuilds must have these options passed to setup.py again; otherwise the extension build process becomes inconsistent with CMake. For example,

```bash
# First build
USE_CUDA=0 python setup.py install
# Subsequent builds like this would fail, unless "build/" is deleted
python setup.py install
```

This commit addresses the above issues by using variables from CMakeCache.txt when building the extensions.

 ---

The changes in `setup.py` may look lengthy, but the biggest changed block is mostly moving them into a function `configure_extension_build` (along with some variable names changed to `cmake_cache_vars['variable name']` and other minor changes), because it must be called after CMake has been called (and thus the options used and system environment detected by CMake become available).
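A rough sketch of reading options back from `CMakeCache.txt` (the parsing helper and variable names here are illustrative, not the exact code added to setup.py):

```python
import re

def get_cmake_cache_vars(cache_text):
    # CMakeCache.txt entries look like: VARIABLE:TYPE=VALUE
    # Comment lines start with '#' or '//'.
    cache_vars = {}
    for line in cache_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "//")):
            continue
        m = re.match(r"([^:]+):([A-Z]+)=(.*)", line)
        if m:
            name, var_type, value = m.groups()
            if var_type == "BOOL":
                value = value.upper() in ("ON", "TRUE", "YES", "1")
            cache_vars[name] = value
    return cache_vars

cache = """\
# This is the CMakeCache file.
USE_CUDA:BOOL=OFF
USE_CUDNN:BOOL=ON
CMAKE_BUILD_TYPE:STRING=Release
"""
cache_vars = get_cmake_cache_vars(cache)
```

With this, the extension build links against exactly what CMake detected, rather than what the current environment variables happen to say.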
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21653

Differential Revision: D15824506

Pulled By: ezyang

fbshipit-source-id: 1e1eb7eec7debba30738f65472ccad966ee74028
2019-06-14 08:13:40 -07:00
4001e71547 When converting to NumPy, throw TypeError when type is not supported (#21608)
Summary:
This makes the error thrown in aten_to_numpy_dtype consistent with that in numpy_dtype_to_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21608

Differential Revision: D15816035

Pulled By: gchanan

fbshipit-source-id: 392e8b9ea37003a859e7ed459911a1700fcbd695
2019-06-14 07:35:03 -07:00
2d5ce519f2 Fix with emit_nvtx, also allow shape information to appear in nvtx ranges. (#21691)
Summary:
This PR is intended as a fix for https://github.com/pytorch/pytorch/issues/21644.

It allows the `with emit_nvtx` context manager to take an additional `record_shapes` argument. `record_shapes` is False by default, but if True, the nvtx ranges generated for each autograd op will append additional information about the sizes of Tensors received by that op.

The format of shape information is equivalent to what the CPU-side profiler spits out.  For example,
```
M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

with torch.cuda.profiler.profile():
    with torch.autograd.profiler.emit_nvtx(record_shapes=True):
        torch.addmm(M, mat1, mat2)
```
produces the following nvtx range label for addmm:
![Screenshot from 2019-06-12 10-48-01](https://user-images.githubusercontent.com/7799218/59374008-b7d13100-8cff-11e9-9245-58410073d965.png)
(cf the "Input Shapes" shown in 864cfbc216 (diff-115b6d48fa8c0ff33fa94b8fce8877b6))

I also took the opportunity to do some minor docstring cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21691

Differential Revision: D15816226

Pulled By: gchanan

fbshipit-source-id: b2b01ea10fea61a6409a32b41e85b6c8b4851bed
2019-06-14 07:35:00 -07:00
b9675efb5a Fix the issue of sizes vs size for tensor creation ops (#21686)
Summary:
Related to [pytorch#20921](https://github.com/pytorch/pytorch/issues/20921)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21686

Differential Revision: D15816109

Pulled By: gchanan

fbshipit-source-id: 4428b8e77b6c8b297ddb77e58fc1cb916c9cc46e
2019-06-14 07:34:56 -07:00
1e7bd7586d Query caffe2 operator stats for detailed execution info (#20924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20924

I found a Python 3 bug when deserializing caffe2 code. The exception thrown is a Unicode-related error instead of just a decode error, and we need to catch that as well.

Reviewed By: ipiszy

Differential Revision: D15293221

fbshipit-source-id: 29820800d1b4cbe5bf3f5a189fe2023e655d0508
2019-06-13 23:41:04 -07:00
d9eec4ef0d backend.py: _getattr__ must raise AttributeError (#21763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21763

Custom __getattr__ functions can only raise AttributeError. This code threw NotImplementedError, which caused upstream trouble when hasattr() was called.

Differential Revision: D15815176

fbshipit-source-id: 0982e2382de4578d3fc05c5d2a63f624d6b4765e
2019-06-13 23:17:57 -07:00
044809f1f3 Handling numel() == 0 in convTranspose (#21652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21652

This diff fixes the issue of empty ROIs for convTranspose.

Issue StackTrace: P65374505

Reviewed By: jerryzh168

Differential Revision: D15766739

fbshipit-source-id: 39cf8feca66b6aae22ff4ec5c1b6a4e3f20f378d
2019-06-13 23:02:26 -07:00
5c0e058950 Implement at::empty(IntArrayRef, DimnameList?, TensorOptions) in aten (#21647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21647
ghimport-source-id: 1db4ec31f047f7854a39c28e2b38918dc6b44f42

Differential Revision: D15804425

Pulled By: zou3519

fbshipit-source-id: 575cc3de09287efe75e7052df129626748208d0d
2019-06-13 20:38:19 -07:00
3e79036382 Make it possible to trigger all tests by pushing to ci-all/ branch (#21750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21750
ghimport-source-id: 4792aa5ccab7e4b54c21f23d0b78802f85bbeb8d

Differential Revision: D15819367

Pulled By: ezyang

fbshipit-source-id: db91ee727c66469ac78e59b3662f29db53a916bc
2019-06-13 19:53:35 -07:00
16b4a12ed8 better example for local weights (#21685)
Summary:
fixes https://github.com/pytorch/hub/issues/29
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21685

Differential Revision: D15817774

Pulled By: ailzhang

fbshipit-source-id: d2f615e5d431186d45a21d8300fb9ba3c37b246c
2019-06-13 17:56:25 -07:00
adc99efb46 Add batch id to tracer event (#21446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446

This is used for easier tracing of the iteration id when looking at the trace diagram.

Reviewed By: ilia-cher

Differential Revision: D15628950

fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
2019-06-13 17:13:42 -07:00
fbecb4621f schema_matching.cpp: improve error messages.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21141

Differential Revision: D15808354

Pulled By: ZolotukhinM

fbshipit-source-id: 16d938fd5acafb445a0c433cabc9a55cab563165
2019-06-13 17:04:38 -07:00
cfd8c58b45 Tune elementwise ops for ROCm (#21754)
Summary:
```
The stride calculation using OffsetCalculator performs poorly with
MAX_DIMS=25. This reduces MAX_DIMS (after coalescing) to 16 on ROCm.
I think it's unlikely that anyone will exceed this limit. If they do,
we can add additional specializations for ROCm with more dimensions.
```

I'm not sure about the underlying cause. With MAX_DIM=25, the add kernel's parameter struct is ~648 bytes vs. ~424 bytes with MAX_DIM=16. The kernel's instruction footprint is bigger too, but most of these instructions are never executed, and most kernel parameters are never loaded, because the typical dimensionality is much smaller.

Mini benchmark here:
https://gist.github.com/colesbury/1e917ae6a0ca9d24712121b92fed4c8f

(broadcasting operations are much faster)

cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21754

Reviewed By: bddppq

Differential Revision: D15811906

Pulled By: colesbury

fbshipit-source-id: 063f92c083d26e2ef2edc98df7ff0400f9432b9d
2019-06-13 16:25:26 -07:00
f59581218f Fix spelling errors (#21665)
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665

Differential Revision: D15806155

Pulled By: soumith

fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
efd20de276 fix multihead attention for half (#21658)
Summary:
Currently, multihead attention is broken for the half type
```
  File "/home/ngimel/pytorch/torch/nn/functional.py", line 3279, in multi_head_attention_forward
    attn_output = torch.bmm(attn_output_weights, v)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2'
```
because softmax converts half inputs into fp32 outputs. This is unnecessary: all the computations in softmax are done in fp32 internally anyway, and the results need to be converted back to fp16 for the subsequent batch matrix multiply, so nothing is gained by writing them out in fp32. This PR removes the type casting in softmax so that half works.
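The idea can be sketched outside PyTorch with NumPy float16 as a stand-in for the half dtype (a simplified model, not the actual softmax kernel): do the reduction in fp32 internally but hand back the caller's dtype, so the subsequent batch matmul sees consistent types.

```python
import numpy as np

def softmax_keep_dtype(x):
    # Do the numerically sensitive work in fp32...
    x32 = x.astype(np.float32)
    x32 = x32 - x32.max(axis=-1, keepdims=True)
    e = np.exp(x32)
    out = e / e.sum(axis=-1, keepdims=True)
    # ...but return in the caller's dtype so a subsequent
    # half-precision matmul doesn't see a dtype mismatch.
    return out.astype(x.dtype)

attn = softmax_keep_dtype(np.array([[1.0, 2.0, 3.0]], dtype=np.float16))
```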
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21658

Differential Revision: D15807487

Pulled By: zhangguanheng66

fbshipit-source-id: 4709ec71a36383d0d35a8f01021e12e22b94992d
2019-06-13 15:17:04 -07:00
4716409f30 Use expect to fill in pytorchbot token (#20459)
Summary:
In this PR, we use `expect` to fill in the token for pytorchbot when doing `git push`, so that we don't need to save the token in the git remote URL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20459

Differential Revision: D15811676

Pulled By: yf225

fbshipit-source-id: cd3b780da05d202305f76878e55c3435590f15a8
2019-06-13 14:56:38 -07:00
b858f42e16 Document that no_grad is thread local. (#21755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21755
ghimport-source-id: dfb53759024d9ba9d104fdb2a8151ab996e55234

Differential Revision: D15811172

Pulled By: ezyang

fbshipit-source-id: c8c7c1c15277d8fe8cc513e20af449257d7ff15c
2019-06-13 13:47:09 -07:00
3e8dc565bd Bug fix: ONNX export full operator (#21669)
Summary:
Fix an obvious bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21669

Reviewed By: zrphercule

Differential Revision: D15806614

Pulled By: houseroad

fbshipit-source-id: d0f6e934252e0057f3dbcc7f160236ee6f4497ac
2019-06-13 13:20:21 -07:00
4b45f08f87 Added dim check for index_copy_ (#21617)
Summary:
Fixing reported [bug](https://github.com/pytorch/pytorch/issues/20322)
The issue was related to not checking the dimensions of source vs destination tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21617

Differential Revision: D15749963

Pulled By: izdeby

fbshipit-source-id: acff114c729fd9c0a9a51325e0ebd8b42e1f2fc1
2019-06-13 13:15:23 -07:00
aa6887e6ef add error message to missing function backend (#21742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21742

Add an error message to NotImplementedError so we know which function the error is about.

Reviewed By: bddppq

Differential Revision: D15806379

fbshipit-source-id: 14eab9d03aa5b44ab95c5caeadc0e01d51f22188
2019-06-13 13:10:48 -07:00
756a20de93 Add/edit docs for nn.transformer (#21746)
Summary:
Add docs for TransformerEncoder and TransformerDecoder, plus minor edits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21746

Differential Revision: D15807498

Pulled By: zhangguanheng66

fbshipit-source-id: 388efb5821c4c3d25865cecea70902e9b2bf5d15
2019-06-13 12:27:26 -07:00
7c7d5be033 Skip failing test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21751

Pulled By: driazati

Differential Revision: D15809091

fbshipit-source-id: 3cc96e632a7b89b4d86d68d2a76021d971447e12
2019-06-13 12:21:56 -07:00
51ee048709 improve torch.load & torch.save doc formatting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21747

Differential Revision: D15808189

Pulled By: ezyang

fbshipit-source-id: 5413eaaa901be098c6bad135f702ba103bc79d6c
2019-06-13 12:13:04 -07:00
63a7c7bb2a Add event and event_counter columns to caffe2_usage_tracer table (#21739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21739

Added event and event_counter columns for PyTorch/Caffe2 API usage metrics

Reviewed By: dzhulgakov

Differential Revision: D15119119

fbshipit-source-id: a71010bd659109a8e4f3a8bad84b22c1d15dc528
2019-06-13 12:06:02 -07:00
f87d5cc191 Fix first reshape in pixel_shuffle conversion (#21486)
Summary:
When converting pixel_shuffle to reshape + transpose + reshape, the first reshape should
be:
[N, C * r^2, H, W] => [N, C, r, r, H, W]
in order to match pytorch's implementation (see ATen PixelShuffle.cpp).

This previously wasn't caught by the test case, since it uses C = r = 4. Updated test case to
have C = 2, r = 4.
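The corrected conversion can be sketched in NumPy, mirroring the reshape → transpose → reshape sequence the converter emits (shapes chosen with C != r so the regression would be caught):

```python
import numpy as np

def pixel_shuffle_ref(x, r):
    # x: [N, C*r^2, H, W] -> [N, C, H*r, W*r]
    n, crr, h, w = x.shape
    c = crr // (r * r)
    # First reshape must split channels as [N, C, r, r, H, W]
    # (not [N, r, r, C, H, W]) to match ATen's PixelShuffle.cpp.
    y = x.reshape(n, c, r, r, h, w)
    y = y.transpose(0, 1, 4, 2, 5, 3)      # [N, C, H, r, W, r]
    return y.reshape(n, c, h * r, w * r)

# C = 2, r = 4, as in the updated test case.
x = np.arange(2 * 32 * 3 * 3).reshape(2, 32, 3, 3).astype(np.float32)
out = pixel_shuffle_ref(x, 4)
# out[n, c, h*r + i, w*r + j] == x[n, c*r*r + i*r + j, h, w]
```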
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21486

Reviewed By: houseroad

Differential Revision: D15700945

Pulled By: houseroad

fbshipit-source-id: 47019691fdc20e152e867c7f6fd57da104a12948
2019-06-13 11:44:54 -07:00
fc3f702ba8 at::launch benchmark (#21581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21581
ghimport-source-id: 6a65d73694b17611d6ad45db0b39b86c318a68c7

Differential Revision: D15736495

Pulled By: ilia-cher

fbshipit-source-id: 6b9109ad3611ff3c8b1a37796e9149bef0c2ad36
2019-06-13 10:46:35 -07:00
eca42a5122 Fix failing test for Final annotations (#21725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21725

Pulled By: driazati

Differential Revision: D15800009

fbshipit-source-id: 5409c213161e3f2031710933897b85872aad2a83
2019-06-13 10:41:44 -07:00
5485f09f18 Native TBB parallel backend (#20480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20480
ghimport-source-id: c710f897c4c9b9616fc3dd76d80b4845aea43a1f

Differential Revision: D15333692

Pulled By: ilia-cher

fbshipit-source-id: 61e476dd5c737fe144e3aec000d8ebb11fbc0547
2019-06-13 10:11:16 -07:00
ab0d5ab99d Port avg_pool2d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21635

Differential Revision: D15768487

Pulled By: ezyang

fbshipit-source-id: 85e1d883aded0f4d3ac5100719df335f5a337fc5
2019-06-13 10:03:58 -07:00
5a7e2ccc0b Add use_rocm flag to detect AMD build in the runtime (#21718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21718

Adds a detection method for whether the package is built for AMD.

Reviewed By: bddppq

Differential Revision: D15795893

fbshipit-source-id: 91a21ee76b2273b1032507bdebe57e016717181d
2019-06-13 09:30:49 -07:00
556af7c19d ROCm 2.5
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21724

Differential Revision: D15799149

Pulled By: bddppq

fbshipit-source-id: c72689e73470f2ca145556a2ac8cb34e36e341ef
2019-06-13 01:32:46 -07:00
42770e1370 Improving Categorical Distribution Docs' (#16291) (#21707)
Summary:
**Closes:** Confusing documentation with distributions.Categorical about logits https://github.com/pytorch/pytorch/issues/16291

**Solution**: Changes the documentation on the Categorical distribution from `log probabilities` to `event log-odds`. This should reduce the confusion raised in the issue, and is consistent with other distributions such as `torch.Binomial`.

More than happy to make any other changes if they fit :).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21707

Differential Revision: D15799181

Pulled By: soumith

fbshipit-source-id: f11acca7a5c130102a3ff6674640235ee5aa69bf
2019-06-12 23:54:02 -07:00
a3db2844e1 Support tuples in ScriptModule inputs/outputs (#20784)
Summary:
- [x] Add tests after https://github.com/pytorch/pytorch/pull/20256 is merged

- Support exporting ScriptModule with inputs/outputs of arbitrarily constructed tuples.

- Moved the assigning of output shapes to after graph conversion to ONNX is completed. By then, all tuples in the IR have already been lowered by the pass ```_jit_pass_lower_all_tuples```. If assigning output shapes were required to happen before that, we'd need to hand-parse the tuple structures in the graph and repeat the same logic as ```_jit_pass_lower_all_tuples```. Handling inputs is easier because all tuple information is encoded within the input tensor type.

- Swapped the order of ```_jit_pass_lower_all_tuples``` and ```_jit_pass_erase_number_types```. Ops like ```prim::TupleIndex``` rely on the index being a scalar, and ```_jit_pass_erase_number_types``` converts these kinds of scalars to tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20784

Reviewed By: zrphercule

Differential Revision: D15484171

Pulled By: houseroad

fbshipit-source-id: 4767a84038244c929f5662758047af6cb92228d3
2019-06-12 23:37:28 -07:00
4c03ac7ac4 Allow batch sizes > 65535 for inverse, solve, cholesky_solve and triangular_solve (#21689)
Summary:

Changelog:
- Iterate over mini batches of 65535 matrices (maximum)
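The chunking scheme can be sketched in plain Python (`MAX_BATCH` mirrors the MAGMA-side limit on batch count; the helper name is illustrative):

```python
MAX_BATCH = 65535  # maximum batch count accepted per kernel launch

def chunked_batches(n):
    # Yield (start, length) pairs covering n matrices in chunks
    # of at most MAX_BATCH, so each launch stays under the limit.
    for start in range(0, n, MAX_BATCH):
        yield start, min(MAX_BATCH, n - start)

chunks = list(chunked_batches(150000))
# -> [(0, 65535), (65535, 65535), (131070, 18930)]
```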
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21689

Differential Revision: D15800254

Pulled By: soumith

fbshipit-source-id: c743ff13f1ba25d26874429d44e41a3c0ed21d6a
2019-06-12 23:30:19 -07:00
b599bb3836 Add mkldnn mul operator (#20575)
Summary:
### mkldnn backward ops list:
 - [ ] \(https://github.com/pytorch/pytorch/pull/20567) Add aten mkldnn conv2d backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_poo2d 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20572) Add aten mkldnn batchnorm backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20573) Add aten mkldnn zero_ operator💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20575) Add mkldnn mul operator 💛
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20575

Differential Revision: D15799529

Pulled By: bddppq

fbshipit-source-id: 4887d8ef1a0e316ad9db199b657d9481fc13e486
2019-06-12 22:41:51 -07:00
d3b3cbe26e Revert D15769066: [pytorch][PR] schema_matching.cpp: improve error messages.
Differential Revision: D15769066

Original commit changeset: 5853e0360581

fbshipit-source-id: ac6fa8429136abf4c7835919009f936eea11ea7b
2019-06-12 20:17:38 -07:00
49481d576d Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
2019-06-12 20:12:34 -07:00
e9121e27ce remove liveness tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21719

Differential Revision: D15797628

Pulled By: Krovatkin

fbshipit-source-id: 87742bdde0b05aff4341ababb1f55c51991768ec
2019-06-12 19:04:41 -07:00
f5c00345b3 Profiling Programs section in README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21695

Differential Revision: D15795716

Pulled By: Krovatkin

fbshipit-source-id: e14a44210ea4312a247157a6681fce449e40f779
2019-06-12 17:52:05 -07:00
8dd670657b Liveness for BailOut graphs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21615

Differential Revision: D15793434

Pulled By: Krovatkin

fbshipit-source-id: d89f1bf61ea57a1e3b75f8e2b200c27beb8b46cf
2019-06-12 17:22:33 -07:00
8c57ce87b0 make tests pass with enable_first_class_module() enabled. (#21565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21565
ghimport-source-id: d1fe735fb7821eadc59116fb921d8fe39a49f818

Reviewed By: driazati

Differential Revision: D15729503

Pulled By: zdevito

fbshipit-source-id: fabb678f040d21fae7545e3b2be1d098e24c544e
2019-06-12 17:13:00 -07:00
d8056cb832 Update quantization to work with first-class modules. (#21660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21660
ghimport-source-id: f9a11b2748f49042ee636755358d79c547aa249e

Reviewed By: suo

Differential Revision: D15770237

Pulled By: zdevito

fbshipit-source-id: 41fa8577028eef247bc545635cd93192a0b19db4
2019-06-12 17:12:57 -07:00
56f4602630 Add WeakIValue, use in tracer. (#21515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21515
ghimport-source-id: 7898a68791db2b5050164ab01d6ca6991e05746d

Reviewed By: suo

Differential Revision: D15719981

Pulled By: zdevito

fbshipit-source-id: 42cf26cf6541bcdf95f1343da3b9228fe2c229da
2019-06-12 17:12:53 -07:00
0293cf5bb6 Add Final[T] annotated members to __constants__ (#21603)
Summary:
Class member annotations can be marked with `Final[T]` instead of adding them to `__constants__`. `Final` comes from the `typing_extensions` module (which will be used if it is present). If not, the polyfill from `_jit_internal` is exposed as `torch.jit.Final` for users that don't want to install `typing_extensions`.

This keeps around `__constants__` since a lot of code is still using it, but in documentation follow-ups we should change the examples to all use `Final`.

TODO: install typing_extensions on CI, move tests to a Python3 only file when #21489 lands
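For illustration, the annotation style looks like this in plain Python, using the standard-library `typing.Final` (available since 3.8) in place of `torch.jit.Final`; only the annotation shape is the point here:

```python
from typing import Final

class ModelConfig:
    # Previously spelled via __constants__ = ['hidden_size'];
    # with this change the Final annotation alone marks it constant.
    hidden_size: Final[int] = 256

    def doubled(self):
        return self.hidden_size * 2

cfg = ModelConfig()
```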
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21603

Pulled By: driazati

Differential Revision: D15746274

fbshipit-source-id: d2c9b5643b4abba069b130c26fd42714c906ffac
2019-06-12 16:40:40 -07:00
0481a7710d Support for type annotations instead of torch.jit.annotate() (#21390)
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so

```python
a = torch.jit.annotate(List[int], [])
```

turns into

```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390

Differential Revision: D15790937

Pulled By: driazati

fbshipit-source-id: 0cc204f7209a79839d330663cc6ba8320d3a4120
2019-06-12 15:51:46 -07:00
699de487db numerical integration "trapz" function. (#21610)
Summary:
This is intended to match [numpy.trapz](https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html): numerical integration based on the trapezoid rule.
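A pure-Python sketch of the rule being implemented (uniform spacing only; the actual function also supports an explicit `x` argument like NumPy's):

```python
def trapz(y, dx=1.0):
    # Composite trapezoid rule over uniformly spaced samples:
    # integral ≈ dx * (y[0]/2 + y[1] + ... + y[n-2] + y[n-1]/2)
    if len(y) < 2:
        return 0.0
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))

area = trapz([1.0, 2.0, 3.0])  # (1+2)/2 + (2+3)/2 = 4.0
```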
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21610

Differential Revision: D15747618

Pulled By: umanwizard

fbshipit-source-id: 8eadb2e75c9877b07592d875ca0b2cca6cb72297
2019-06-12 15:30:13 -07:00
b527e48588 Use c10::List (#21177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177

- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optionals of Dicts of ... now work as expected.

Differential Revision: D15476433

fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
2019-06-12 13:58:24 -07:00
ae342fd076 Refactor Random Number Generators in ATen (#21364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21364
ghimport-source-id: ca7d37e10190ba46dc8512f437404ca9216d3369

Differential Revision: D15696497

Pulled By: ezyang

fbshipit-source-id: 2e713b8566ae915e175b5a79ac1dd9b86cc2a23d
2019-06-12 13:01:30 -07:00
96910251e0 schema_matching.cpp: improve error messages.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21141

Differential Revision: D15769066

Pulled By: ZolotukhinM

fbshipit-source-id: 5853e0360581c44e42b068add3bf2bc68e671b2b
2019-06-12 12:43:12 -07:00
28adca82ea Add some named tensor helper functions (#21636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21636
ghimport-source-id: 5eff5744cd3c80f75bdb02576be1407a64e0434d

Differential Revision: D15780269

Pulled By: zou3519

fbshipit-source-id: 87ff40ffbe0ebd5fc4d105709c9f6f8dda5f9952
2019-06-12 12:34:44 -07:00
20b0acf057 Add some more namedtensor builds to the CI (#21632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21632
ghimport-source-id: 6a8da97ce153c6d279017af920edd0d20765c32c

Differential Revision: D15760331

Pulled By: zou3519

fbshipit-source-id: b2f4c65df5f6f9322d47da995c76851387e5df47
2019-06-12 12:34:40 -07:00
3e6eb3dcab Add virtual dtor to NamedTensorMetaInterface (#21633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21633
ghimport-source-id: 6cdf0b1559e696a19e282ff6d5ba79c6b119e8c0

Differential Revision: D15760589

Pulled By: zou3519

fbshipit-source-id: 537882c05ab7b19889a31c648c5efeb1949831a8
2019-06-12 12:34:37 -07:00
83cec5f3ee nn.Transformer (#20170)
Summary:
Accidentally rebased the old PR and make it too messy. Find it here (https://github.com/pytorch/pytorch/pull/19274)

This PR is created for comments. The model is still WIP, but I want to get some feedback before moving too far. The transformer model depends on several modules, like MultiheadAttention (landed).

Transformer is implemented based on the paper (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). Users have the flexibility to build a transformer with self-defined and/or built-in components (i.e. encoder, decoder, encoder_layer, decoder_layer). Users can use the Transformer class to build a standard transformer model and modify sub-layers as needed.

Add a few unit tests for the transformer module, as follow:
TestNN.test_Transformer_cell
TestNN.test_transformerencoderlayer
TestNN.test_transformerdecoderlayer
TestNN.test_transformer_args_check
TestScript.test_scriptmodule_transformer_cuda

There is another demonstration example for applying transformer module on the word language problem. https://github.com/pytorch/examples/pull/555
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20170

Differential Revision: D15417983

Pulled By: zhangguanheng66

fbshipit-source-id: 7ce771a7e27715acd9a23d60bf44917a90d1d572
2019-06-12 12:22:12 -07:00
180aa234fc Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5cbf562652b9d7cf3877b5f819141f88c9b857d3
2019-06-12 12:17:42 -07:00
8f40164517 Add libtorch Linux CPU binary build to PR CI (#21671)
Summary:
Currently we don't have any Linux libtorch binary builds in the PR CI, which has led to nightly build failures such as https://circleci.com/gh/pytorch/pytorch/1939687. This PR adds a Linux libtorch CPU binary build to prevent such breakage from happening in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21671

Differential Revision: D15785003

Pulled By: yf225

fbshipit-source-id: d1f2e4235e48296ddecb3367f8e5a0df16f4ea49
2019-06-12 12:07:31 -07:00
39d412194f Fix ProcessGroupGloo allgather for tensors with shared storage (#21490)
Summary:
Fix https://github.com/pytorch/pytorch/issues/20421

`ProcessGroupGloo` only requires input/output tensors to be contiguous. Contiguous tensors might not start from the beginning of the underlying storage, e.g., `chunk(..., dim=0)[1]`. The current implementation passes `tensor.storage().data()` ptr to gloo buffer. This leads to wrong results if the tensor has a non-zero storage offset.

The proposed solution is to use `tensor.data_ptr()` instead. Let's see if this breaks any tests.
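The pitfall can be illustrated with a plain `memoryview` standing in for tensor storage: a fully contiguous view need not begin at offset 0 of its backing buffer, so "start of storage" and "start of data" are different pointers.

```python
buf = bytearray(range(8))          # backing "storage": bytes 0..7
second_half = memoryview(buf)[4:]  # contiguous view with storage offset 4

# Reading from the start of the storage (the storage().data() analogue)
# yields the wrong bytes...
wrong = bytes(buf[:4])
# ...while reading from the view's own start (the data_ptr() analogue)
# yields the right ones.
right = bytes(second_half)
```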

cc qijianan777
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21490

Differential Revision: D15768907

Pulled By: mrshenli

fbshipit-source-id: 9d7d1e9baf0461b31187c7d21a4a53b1fbb07397
2019-06-12 11:59:17 -07:00
ad73ea22f7 Add strong Wolfe line search for lbfgs (#8824)
Summary:
This pull request adds a line search for lbfgs. "strong Wolfe" is the default line search method in [minFunc](https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html) and it is also recommended in the [Numerical Optimization](https://www.springer.com/gp/book/9780387303031) book.

The implementation is based on four sources:
+ https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
+ https://www.springer.com/gp/book/9780387303031 Algorithms 3.5, 3.6, formula 3.59
+ https://github.com/torch/optim/blob/master/lswolfe.lua
+ https://github.com/torch/optim/blob/master/polyinterp.lua

The 'lua' version is based on an old version of `minFunc`, which was updated in 2012. I made a couple of small changes based on the updated version. Because of that, the test comparing against the `.lua` version is not fully consistent (that is the reason I changed a learning rate in the test).
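As a reference point, the two conditions the search enforces can be written down directly (a condition checker only, not the interpolating search itself; the `c1`/`c2` defaults follow Numerical Optimization):

```python
def satisfies_strong_wolfe(f0, g0, ft, gt, t, c1=1e-4, c2=0.9):
    # f0, g0: objective value and directional derivative at step size 0
    # ft, gt: objective value and directional derivative at step size t
    sufficient_decrease = ft <= f0 + c1 * t * g0   # Armijo condition
    curvature = abs(gt) <= c2 * abs(g0)            # strong curvature condition
    return sufficient_decrease and curvature

# Quadratic f(t) = (t - 1)^2 along the descent direction from t = 0:
# f0 = 1, g0 = -2; at t = 1 we sit at the minimum, where ft = 0, gt = 0.
ok = satisfies_strong_wolfe(1.0, -2.0, 0.0, 0.0, 1.0)  # True
```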
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8824

Differential Revision: D15783067

Pulled By: vincentqb

fbshipit-source-id: 5316d9088233981120376d79c7869d5f97e51b69
2019-06-12 11:32:41 -07:00
2c91ba3bbc Add div hashing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21422

Reviewed By: xianjiec

Differential Revision: D15589181

fbshipit-source-id: f6ff0726164f88da45e4b090b4d5ad05305b3225
2019-06-12 11:27:37 -07:00
76e01542ed Fix the shape of PReLU weight (#21330)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/21271
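For reference, the expected shape after the fix — a sketch assuming current `torch.nn.PReLU` behavior, where the weight is a 1-D tensor of length `num_parameters` that broadcasts over the channel dimension:

```python
import torch

# One learnable slope per channel.
m = torch.nn.PReLU(num_parameters=3)
assert list(m.weight.shape) == [3]

# Default: a single slope shared across all channels.
m1 = torch.nn.PReLU()
assert list(m1.weight.shape) == [1]

# The weight broadcasts against the channel dimension of the input.
y = m(torch.randn(4, 3, 8, 8))
assert y.shape == (4, 3, 8, 8)
```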
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21330

Reviewed By: zrphercule

Differential Revision: D15776459

Pulled By: houseroad

fbshipit-source-id: 4e0aef88e9c91c79faa3da6fa66f7466dee52018
2019-06-12 11:03:40 -07:00
7123c6ca04 Enable groupwise for qconv (#21592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21592

We now support groupwise convolutions for qconv2d

Reviewed By: zafartahirov

Differential Revision: D15739239

fbshipit-source-id: 80b9b4fef5b9ee3d22ebecbaf205b970ab3d4250
2019-06-12 11:03:36 -07:00
8cc8e15473 Back out "[pytorch][PR] [Re-landing] Fix caffe2 windows CI for new Windows AMI" (#21670)
Summary:
Original commit changeset: e65c1d6bfcc9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21670

Differential Revision: D15776087

Pulled By: yf225

fbshipit-source-id: cbb55cbbcb133cae1aeb2fe75cc52e7350cc6c88
2019-06-12 10:37:19 -07:00
cbcb2b5ad7 Delete DDP hooks in Reducer destructor (#21591)
Summary:
Closes https://github.com/pytorch/pytorch/issues/21344

DDP assigns the original module to the first module replica instead of creating a new one. Then, it creates a new Reducer to add post hooks to sync gradients. However, because every reconstructed DDP instance wraps the same original module, all their reducers will add hooks to the same set of variables. This PR deletes DDP hooks from variables when destructing Reducer, trying to make DDP failure recoverable.

pietern kuttas and I discussed the following solutions:

#### Solution 1

Keep `add_post_hook` API intact, and do a `dynamic_cast` in `del_post_hook` to check hook type. If the type matches Reducer's hook, delete it. As pietern mentioned, this will not work if we create multiple DDP instances from the same original model.

#### Solution 2

Use a counter to generate a unique key for every hook in `Function`, and keep them in a map. Return the key to the caller of `add_post_hook`, and ask the caller to provide the key if it needs to delete the hook.

Con: this would add extra overhead to `add_post_hook` and every `Function` object.

#### Solution 3 [Current implementation]

kuttas suggests that, instead of generating a unique key, directly using the address of the pointer would be better. In order to avoid messing up dereferencing, let `add_post_hook` return a `uintptr_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21591

Differential Revision: D15745706

Pulled By: mrshenli

fbshipit-source-id: e56d2d48de0c65f6667790ab16337eac7f7d8b76
2019-06-12 07:08:28 -07:00
1e4af2b969 Pin torchvision version. (#20811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20811
ghimport-source-id: 52e043453272d8441a2c0efd7f005b71ded024d6

Differential Revision: D15779416

Pulled By: ezyang

fbshipit-source-id: 1b3c2d9aeab57e580038f0c2a8bfbfcae9d7b62a
2019-06-12 06:16:20 -07:00
1ffa9d3d3b correct measure quantization error when followed_by=Relu and dequantize_output=1 (#21664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21664

As title

Reviewed By: csummersea

Differential Revision: D15770947

fbshipit-source-id: 57f5842e1a250300703b02134c314e4f06b767b8
2019-06-11 23:36:15 -07:00
c2a18a6702 Override print when python is present (#21625)
Summary:
This makes it so we can see the output of prim::Print in environments like IPython notebooks, which override sys.stdout
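A rough way to check the intended behavior, assuming `torch.jit.script` is available: redirect `sys.stdout` and confirm the scripted `print` output lands in the overridden stream rather than bypassing it.

```python
import io
import contextlib
import torch

@torch.jit.script
def f(x: int) -> int:
    print(x)
    return x

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    f(42)

# With this change, prim::Print writes through Python's sys.stdout.
assert '42' in buf.getvalue()
```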
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21625

Differential Revision: D15756793

Pulled By: jamesr66a

fbshipit-source-id: 7d9a14b2e229ed358e784318e9d862677db2c461
2019-06-11 22:58:22 -07:00
aa7e27fa70 Emit Loop Condition as Separate Block (#21611)
Summary:
Emit loop condition as a separate block in loops, then inline them before conversion to SSA. This is needed for breaks & continues where we will inline the condition block after the continue pass and before the break pass.

I also considered emitting a prim::For and a prim::While, but I think it's easier to just have one pathway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21611

Differential Revision: D15775820

Pulled By: eellison

fbshipit-source-id: de17c5e65f6e4a0256a660948b1eb630e41b04fb
2019-06-11 22:03:26 -07:00
341a7e4bb5 Fix issue in backward path (#21663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21663

as title

Reviewed By: hl475

Differential Revision: D15770793

fbshipit-source-id: b3d0dd030237c4d62bddc388984a273153fac4a6
2019-06-11 21:09:25 -07:00
afd202be9f StoreMatrixInMatrixMarketFormat can store both integer and float tensors (#21606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21606

StoreMatrixInMatrixMarketFormat could only dump quantized tensors, but sometimes we want to dump float tensors.

Reviewed By: csummersea

Differential Revision: D15741611

fbshipit-source-id: 95b03c2fdf1bd8407f7d925171d9dc9f25677464
2019-06-11 17:28:19 -07:00
c2a08d339b Automatic update of fbcode/onnx to dd599b05f424eb161a31f3e059566a33310dbe5e (#21641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21641

Previous import was 5160f3ac3380302224998f1c95e111cd961c4bc5

Included changes:
- **[dd599b05](https://github.com/onnx/onnx/commit/dd599b05)**: Fix type s/depracted/deprecated/ (#2092) <Takeshi Watanabe>
- **[abb1702a](https://github.com/onnx/onnx/commit/abb1702a)**: Add shape inference for Tile op (#2076) <Hariharan Seshadri>
- **[67638d9c](https://github.com/onnx/onnx/commit/67638d9c)**: [New Operator] Round (#2053) <Jeff Saremi>
- **[584e4477](https://github.com/onnx/onnx/commit/584e4477)**: Add dilations support in ConvTranspose shape inference and update docs (#2068) <daquexian>

Reviewed By: zrphercule

Differential Revision: D15762382

fbshipit-source-id: 590f25fb733e1565eb90fcdeb797b0ba34e2d2c3
2019-06-11 16:54:47 -07:00
968114ae3d Revert D15769256: [jit] Add python string standard lib
Differential Revision:
D15769256

Original commit changeset: 1af487446361

fbshipit-source-id: 96bea4a49664dad68762bef75ae28e64c673f8b1
2019-06-11 16:54:43 -07:00
039629cedd fix incorrect use of TeX in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21649

Differential Revision: D15766392

Pulled By: umanwizard

fbshipit-source-id: a362ec06e971ee12c47a45bc9c15cc773ec878e3
2019-06-11 16:19:40 -07:00
1bd21d3f14 test_jit: Remove tests checking non-guaranteed properties from 'test_insert_observers'. (#21657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21657
ghimport-source-id: e9c7e45c00db55bf3b7895d06d77f0d99bfc1afe

Differential Revision: D15769295

Pulled By: ZolotukhinM

fbshipit-source-id: cfb40bc5d7116b1d99f5e0f5c4f5577f5aa33804
2019-06-11 16:12:09 -07:00
ee33afe2b1 randomized testing for qconv (#21436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21436

Test many different options

Reviewed By: zafartahirov

Differential Revision: D15683754

fbshipit-source-id: 60d0fc697b53c7e4adadbe80995d45f28729bca4
2019-06-11 16:07:22 -07:00
cf5c3bb3fe make range functions respect current stream (#21619)
Summary:
Stream is not respected on range/linspace/logspace functions, which contributes to https://github.com/pytorch/pytorch/issues/21589 (this is not a complete solution for that issue).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21619

Differential Revision: D15769666

Pulled By: ezyang

fbshipit-source-id: 7c036f7aecb3119430c4d432775cad98a5028fa8
2019-06-11 15:46:48 -07:00
9241c4b3c6 Add python string standard lib (#21656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21656
ghimport-source-id: cc7d7f68e33e95a97f6274c50823138aa4bacabb

Differential Revision: D15769256

Pulled By: bwasti

fbshipit-source-id: 1af487446361d90d03dce004c3e2169a3e62667d
2019-06-11 15:23:23 -07:00
9737b166a4 Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes https://github.com/pytorch/pytorch/issues/21257, fixes https://github.com/pytorch/pytorch/issues/21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15761029

Pulled By: colesbury

fbshipit-source-id: 2aeb51e2d3cfdb8356806a7d5b12d4b9910e37fb
2019-06-11 15:18:17 -07:00
fb9fbc009c Fix momentum bug in CyclicLR (#20401)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003

The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).

Maybe printing a warning when switching this argument's value would suffice?
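A sketch of the workaround discussed above, assuming the current `CyclicLR` signature: when the optimizer (e.g. Adam) has no `momentum` hyperparameter, momentum cycling must be disabled explicitly.

```python
import torch

p = torch.nn.Parameter(torch.zeros(3))

# Adam has no 'momentum' param group entry, so cycle_momentum must be False.
opt = torch.optim.Adam([p], lr=0.01)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=0.001, max_lr=0.01, cycle_momentum=False)

for _ in range(10):
    opt.step()
    sched.step()

# The learning rate stays within the configured cycle bounds.
lr = opt.param_groups[0]['lr']
assert 0.001 <= lr <= 0.01
```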
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401

Differential Revision: D15765463

Pulled By: ezyang

fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
2019-06-11 15:10:28 -07:00
cdbc20677c Add len to OrderedDict types (#21651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21651
ghimport-source-id: 0bba5c1930865e2d18b18782ba8c8990b0761d4d

Differential Revision: D15767795

Pulled By: bwasti

fbshipit-source-id: 70e27176897b0f977c9034ffb3ad21091c91e12e
2019-06-11 14:53:40 -07:00
7a040f4b0b Revert D15706021: [jit] Support for type annotations instead of torch.jit.annotate()
Differential Revision:
D15706021

Original commit changeset: 8bf1459f229d

fbshipit-source-id: 7ae34578560e2dccd0f04af2220445b3999771fe
2019-06-11 14:33:28 -07:00
b46e87cd3d Fix catch block to fix 'error: catching polymorphic type' (#21637)
Summary:
Fix catch block to fix 'error: catching polymorphic type `class c10::Error` by value [-Werror=catch-value=]'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21637

Differential Revision: D15761860

Pulled By: VitalyFedyunin

fbshipit-source-id: befc18a9c217440381cdb50a1319b0b5db5710e9
2019-06-11 12:30:52 -07:00
dd439bc39e Rename hubconf.conf to hubconf.py in docs (#21631)
Summary:
It's a typo I guess. cc fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21631

Differential Revision: D15764909

Pulled By: soumith

fbshipit-source-id: 5ffc7bde181c13e151332e7de3c0da36505b495e
2019-06-11 12:22:43 -07:00
bbcd6cc782 Support for type annotations instead of torch.jit.annotate() (#21390)
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so

```python
a = torch.jit.annotate(List[int], [])
```

turns into

```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390

Pulled By: driazati

Differential Revision: D15706021

fbshipit-source-id: 8bf1459f229d5fd0e16e59953b9656e85a2207fb
2019-06-11 12:03:57 -07:00
25d1496d58 Fix Process Group for tensors shared across processes (#21449)
Summary:
Ops on a Process Group (pg) instance will hit an error when input/output tensors are created on a different process, because pg calls `recordStream` on `CUDACachingAllocator`, which only knows about tensors created within the same process.

The proposed solution is to add a `suppressError` arg (suggestions for better names?) to `recordStream`. See comments in code for arguments.

CC pichuang1984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21449

Differential Revision: D15689736

Pulled By: mrshenli

fbshipit-source-id: e7fc81b167868f8666536067eaa7ae2c8584d88e
2019-06-11 11:50:25 -07:00
50ee1f3fa7 better error msg when seeing a unsupported builtin function (#21068)
Summary:
fixes https://github.com/pytorch/lockdown/issues/39.
Hopefully it doesn't break other tests....
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21068

Differential Revision: D15761895

Pulled By: ailzhang

fbshipit-source-id: 60cbb16cfc930b377d753b81e10b7edaea9a1281
2019-06-11 11:32:44 -07:00
4610347fdf Breaks up NN module in docs so it loads faster.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21291

Differential Revision: D15760935

Pulled By: ezyang

fbshipit-source-id: 114da4c52b78949e631e9adcae4eb620546124fb
2019-06-11 09:38:41 -07:00
646a7f99bb Move management of calls of "cmake --build" to setup_helper/cmake.py and refactor as a CMake class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21493

Differential Revision: D15759279

Pulled By: ezyang

fbshipit-source-id: 157e1de36f1c5a51caf2a25b363a94369c442012
2019-06-11 07:04:05 -07:00
835a6b9da2 Fix namedtensor build (#21609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21609
ghimport-source-id: 648a0bcd28db2cdda1bf2fa6a904ca8f851088c2

Differential Revision: D15747687

Pulled By: zou3519

fbshipit-source-id: 2a972a15fa7399391617fc6e6b19879b86568c3a
2019-06-11 06:53:50 -07:00
29c849ff34 implement transpose operator for MKLDNN (#19955)
Summary:
implement transpose operator for MKLDNN
1. upgrade mkldnn-bridge to support ND transpose
2. implement transpose operator in caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19955

Differential Revision: D15701832

Pulled By: bddppq

fbshipit-source-id: e4337cd0ba6f8180a35c8c70cbb6830a0a84182f
2019-06-11 01:55:13 -07:00
731670f40a upgrade mkldnn-bridge (#20569)
Summary:
1. reduce the overhead of mkldnn-bridge itself
2. remove redundant code and useless APIs
3. provide new operators, including int8 inner_product, ND permute/transpose, elem_add/mul, etc.
4. improve inner_product to support io format weights without implicit reorder
5. add SoftMax support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20569

Reviewed By: houseroad

Differential Revision: D15558663

Pulled By: bddppq

fbshipit-source-id: 79a63aa139037924e9ffb1069f7e7f1d334efe3a
2019-06-11 00:47:11 -07:00
f2623c74a9 add PT pointwise unary ops to the benchmark suite (#21207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21207

This diff adds 80 PT pointwise unary ops to the benchmark suite. Most of the ops are added using the generate_pt_tests_from_list interface. The rest are handled separately.

Reviewed By: zheng-xq

Differential Revision: D15471597

fbshipit-source-id: 8ea36e292a38b1dc50f064a48c8cd07dbf78ae56
2019-06-10 21:35:44 -07:00
4e3c97a0be add separate path for op with JIT (#21210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210

This diff introduces a new path to run op with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step 1 is passed to `jit_forward`, which will be executed by the benchmark backend

Reviewed By: zheng-xq

Differential Revision: D15460831

fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
2019-06-10 19:53:58 -07:00
a82feee07c Method-only entries in native functions should have self as first argument (#21549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21549
ghimport-source-id: a98fd7a18b4c523d9facb328a3b80a35416834ce

Differential Revision: D15724794

Pulled By: li-roy

fbshipit-source-id: a0f218cf6fd32d9694921685fc805d868156fce3
2019-06-10 19:32:34 -07:00
fff22125a5 AT_CHECK -> TORCH_CHECK (#21547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21547
ghimport-source-id: d99d3fdcd9abde4e1126716d32ed05aaf8508c50

Differential Revision: D15747676

Pulled By: bwasti

fbshipit-source-id: ae9824436e8316e2d0002d2973df4833a18c5f23
2019-06-10 16:58:09 -07:00
f5c24fc66d Deprecate torch::jit::RegisterOperators (#21552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21552

Original commit changeset: a142c22be3fd

https://github.com/pytorch/pytorch/pull/21368 got reverted because of a MSVC issue. This commit re-introduces that change and fixes the MSVC issue.

Differential Revision: D15727526

fbshipit-source-id: 8eb0eb9a7108dc049911b79342c364ac1b8623c8
2019-06-10 16:52:24 -07:00
cab3e726df Split out Function into its own file (#21539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21539
ghimport-source-id: f1e4396a0bec6e30d3179f926ec4da68807942f7

Differential Revision: D15741979

Pulled By: suo

fbshipit-source-id: 4cd0ed36bcbf8db0b36a101dda6f58975f806889
2019-06-10 16:37:58 -07:00
512c9d8c76 add PT gather op to the benchmark suite (#21614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21614

as title

Reviewed By: kimishpatel

Differential Revision: D15525115

fbshipit-source-id: 6a17e1d791bdb432cc3d51e45c5e82b96268127d
2019-06-10 16:31:52 -07:00
32a0440209 Publish torch::Dict and torch::OperatorKernel (#20723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20723

These classes already existed but only as c10::Dict and c10::OperatorKernel.
Since they're now part of torch::RegisterOperators(), they should also live in the torch namespace.

Differential Revision: D15421575

fbshipit-source-id: d64ebd8664fadc264bbbae7eca1faa182529a32b
2019-06-10 16:19:42 -07:00
a93a1ccbb3 Run test_c10d.py in multi-gpu environment (#21598)
Summary:
yf225 helped me discovered that our CI does not run multi-gpu tests in `test_c10d.py`. There are quite a few multi-gpu c10d tests. This PR tries to enable those tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21598

Differential Revision: D15744256

Pulled By: mrshenli

fbshipit-source-id: 0a1524a862946128321f66fc8b7f331eff10e52a
2019-06-10 15:58:38 -07:00
74f6c55f0f support negative axis in concat and split operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17955

Differential Revision: D14476031

Pulled By: ezyang

fbshipit-source-id: e0e57e8595ed2005ded9e923572a40fe62aca5a7
2019-06-10 15:26:29 -07:00
3889855a5b Revert "Redefine scheduler to set learning rate using recursive formula" #14010 (#21463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21463
ghimport-source-id: 1b0ea4a282b41388d5c6f6a5d18d37c14ae874ad

Differential Revision: D15747426

Pulled By: ezyang

fbshipit-source-id: 0708394f907b98a9f45bcfa26e5cc450fda8cf76
2019-06-10 15:26:25 -07:00
8b9b215dc5 Add a 'dim' argument to nuclear norm (#21022)
Summary:
Addresses #18275.
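A usage sketch, assuming the `dim` argument is exposed through `torch.norm` with `p='nuc'`: the nuclear norm is computed over the two specified dimensions, batching over the rest.

```python
import torch

x = torch.randn(5, 3, 4)

# Nuclear norm of each 3x4 slice, reducing over dims (1, 2).
n = torch.norm(x, p='nuc', dim=(1, 2))
assert n.shape == (5,)

# Without 'dim', the input must itself be a 2-D matrix.
m = torch.norm(torch.randn(3, 4), p='nuc')
assert m.dim() == 0
```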
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21022

Differential Revision: D15743515

Pulled By: ezyang

fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
2019-06-10 15:18:34 -07:00
2378c120e6 Implements divmod function (#20979)
Summary:
This PR refers to issue #18627
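For reference, Python's `divmod` returns the floor quotient and remainder, with the remainder taking the sign of the divisor — the semantics the scripted version needs to match:

```python
# divmod(a, b) == (a // b, a % b), using floor-division semantics.
assert divmod(7, 3) == (2, 1)
assert divmod(-7, 3) == (-3, 2)   # remainder has the sign of the divisor
assert divmod(7, -3) == (-3, -2)
assert divmod(7.5, 2.0) == (3.0, 1.5)
```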
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20979

Differential Revision: D15743929

Pulled By: wanchaol

fbshipit-source-id: 967fc3fd519501e427176e10b112c8be1390540b
2019-06-10 15:00:56 -07:00
8a88d33103 Uninitialized Ivalue (#21387)
Summary:
Create an uninitialized IValue. This will be needed for breaks & continues, so that if-block outputs can match up for values that are guaranteed not to be used but need to escape the block scope. It is not exposed to users.

Was previously part of final returns but I was asked to make a separate PR for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21387

Differential Revision: D15745124

Pulled By: eellison

fbshipit-source-id: ae6a6f766b4a70a71b9033987a630cfbf044e296
2019-06-10 14:51:24 -07:00
dd0ffd6864 Use schema string specification in derivatives.yaml. (#20916)
Summary:
For consistency, derivatives.yaml now uses the same schema specification as native_functions.yaml.

Note that there are some small downsides, e.g. changing the default values or return parameter names in native_functions.yaml also now requires updating derivatives.yaml as well.  But this has a few nice properties:
1) Able to copy-paste definitions from native_functions to derivatives.
2) Makes it impossible to write derivatives for operators without schemas (e.g. old TH operators).
3) Moves us closer to the ideal situation of co-locating forward and backwards declarations.

Note that this doesn't change any generated code; in particular, this has the same behavior of mapping in-place and out-of-place definitions together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20916

Differential Revision: D15497800

Pulled By: gchanan

fbshipit-source-id: baee5caf56b675ce78dda4aaf6ce6a34575a6432
2019-06-10 13:47:55 -07:00
5f25a252d6 Allow tensors with requires_grad=True in c10 ops (#21599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21599

We prevented this because c10 ops can't have a backward yet, and calling them with requires_grad=True would do the wrong thing
if the c10 op is not purely implemented by calling other autograd-able ops.

However, it is a valid use case to have c10 ops that just call other autograd-aware ops, and these ops should be callable with requires_grad=True.

This should fix https://github.com/pytorch/pytorch/issues/21584.

Differential Revision: D15744692

fbshipit-source-id: ba665365c850ef63fc9c51498fd69afe49e5d7ec
2019-06-10 13:37:06 -07:00
5a48642fde Revert D15717575: [pytorch][PR] Fix bug in multinomial_alias_draw
Differential Revision:
D15717575

Original commit changeset: b1154e226d42

fbshipit-source-id: 305ca010bfda88c9295c52e0626d867452c72f84
2019-06-10 13:28:11 -07:00
4fb302eb34 fix optional type promotion for classes (#21593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21593
ghimport-source-id: f68730618bccf2326218e08d0a2a70171fdd8921

Differential Revision: D15741471

Pulled By: suo

fbshipit-source-id: 7ac1a0f6d9d2ff4bc819caff43a7a5b6d37cbc98
2019-06-10 12:51:00 -07:00
a436822c40 Consider contained types in alias analysis (#21431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21431
ghimport-source-id: d86ce974a065ec572e71cfa14a8f6bdf48216da7

Reviewed By: jamesr66a

Differential Revision: D15718560

Pulled By: suo

fbshipit-source-id: a36ce907ab26be22f12bab6175797fe8b34721f1
2019-06-10 12:42:10 -07:00
bb4aff2680 cleanups to memory_dag (#21430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21430
ghimport-source-id: 2dc5a0df8512e796c12d65d3ecc5981638122ce6

Reviewed By: jamesr66a

Differential Revision: D15718561

Pulled By: suo

fbshipit-source-id: 1ef31c08c8a757b632451eb07a47a8227e76c67f
2019-06-10 12:42:06 -07:00
ae144032aa cleanups to alias analysis interfaces (#21397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21397
ghimport-source-id: 8733e1af2fe66a3f4494a2c24c82a039375a982e

Reviewed By: jamesr66a

Differential Revision: D15642662

Pulled By: suo

fbshipit-source-id: ae66b7b4f19f255d6fe0e7e804bd0df6d86cb8d1
2019-06-10 12:42:02 -07:00
ddac8da813 avoid calling front() on empty working set (#21396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21396
ghimport-source-id: 7e57282099d2fd57c58c990b51ae933e427aecb2

Reviewed By: jamesr66a

Differential Revision: D15642663

Pulled By: suo

fbshipit-source-id: f9b467ba53f03438879bf3929da522aabaff2343
2019-06-10 12:41:58 -07:00
bb1dbdb99b Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes #21257, fixes #21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15717575

Pulled By: ezyang

fbshipit-source-id: b1154e226d426c0d412d360c15f7c64aec95d101
2019-06-10 12:05:48 -07:00
30d6933016 BailOut Graphs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21381

Differential Revision: D15724412

Pulled By: Krovatkin

fbshipit-source-id: 18e4a1916c7cd1baea76953d0087d6257e58c55b
2019-06-10 11:49:38 -07:00
3df5a46a99 Skip triangular_solve CUDA test on non-default stream
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21590

Differential Revision: D15742549

Pulled By: ezyang

fbshipit-source-id: fd5b2cbce86e5f229c2ffba114ef362934296d07
2019-06-10 11:38:42 -07:00
6f99bcda8a fix test (#21594)
Summary:
Fixes a test that wasn't run on CI but is tested internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21594

Differential Revision: D15742157

Pulled By: eellison

fbshipit-source-id: 11fc82d1fc0281ffedd674ed96100e0c783c0599
2019-06-10 11:23:18 -07:00
91ea2cd5a7 clip sigmoid to prevent transforms return inf/nan values (#20288)
Summary:
This PR addresses some numerical issues of Sigmoid/StickBreakingTransform, where these transforms give +-inf when the unconstrained values move to +-20 areas.

For example, with
```
t = torch.distributions.SigmoidTransform()
x = torch.tensor(20.)
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
current behaviour the inverse will return `inf` and logdet return `-inf` while this PR makes it to `15.9424` and `-15.9424`.

And for
```
t = torch.distributions.StickBreakingTransform()
x = torch.tensor([20., 20.])
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
current value is `(inf, nan)` and `-inf` for logdet, while this PR makes it `[16.6355, 71.3942]` and `-47.8272` for logdet.

Although these finite values are wrong and seem unavoidable, it is better than returning `inf` or `nan`, in my opinion. This is useful in HMC: even though the gradient will be zero when the unconstrained parameter moves into an unstable area (due to clipping), the velocity variable will push the parameter elsewhere, which by chance can move it out of the unstable region. On the other hand, inf/nan can be useful for stopping inference early, so the changes in this PR might be inappropriate.

I also fix some small issues of `_Simplex` and `_RealVector` constraints where batch shape of the input is not respected when checking validation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20288

Differential Revision: D15742047

Pulled By: ezyang

fbshipit-source-id: b427ed1752c41327abb3957f98d4b289307a7d17
2019-06-10 11:16:31 -07:00
4bdbd30b96 Add python binding to deserialize blob (#21532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21532

Add python binding to deserialize blob

Reviewed By: yinghai

Differential Revision: D15706816

fbshipit-source-id: f498c7e0f7392f055b13810bbf81cba59f25e1d2
2019-06-10 10:49:21 -07:00
e4fae884f6 Change compiler to use Load/Stores, then transform to SSA (#21101)
Summary:
This changes our compiler so it first emits Loads & Stores, and then transforms the graph to SSA in a follow up pass. When a variable is set, we emit a prim::Store, and when a variable is referenced, we emit a prim::Load.
```
a = 1
print(a)
```
becomes:
```
%a.1 : int = prim::Constant[value=1]()
prim::Store[name="a"](%a.1)
%a : int = prim::Load[name="a"]()
prim::Print(%a)
```
In the follow up pass, convertToSSA, the values are turned into SSA form with the Loads & Stores removed. This change will enable breaks and continues because you can transform the graph with the variable naming information still intact.

There are still some remaining jitter and edge-case issues that I have to look through, but I think it is still ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21101

Differential Revision: D15723353

Pulled By: eellison

fbshipit-source-id: 3269934d4bc24ddaf3a87fdd20620b0f954d83d0
2019-06-10 10:26:43 -07:00
1e6c99a6e0 update hub doc (#21568)
Summary:
update doc as pointed out in https://github.com/pytorch/hub/pull/22
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21568

Differential Revision: D15732927

Pulled By: ailzhang

fbshipit-source-id: 78ab026539e5ee59e7c3a8144e2c9fcbbc225733
2019-06-10 09:39:35 -07:00
mal
f308b07e8c Don't leak threads on exit (#21438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21438
ghimport-source-id: 33f145f5b3508163365442c22a223c4a44e677d8

Differential Revision: D15738856

fbshipit-source-id: 656e8d0e3d0d22f116e3ab66bf0282608d6f1a76
2019-06-10 09:14:13 -07:00
c294d64eff fix concat and split tensor inference function (#21382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21382

The concat tensor inference function was not correctly handling the case where the axis argument points to the last dimension, so input tensors don't need to have the same number of dimensions.
The split tensor inference function was not correctly handling the case where split information is provided as a second input tensor rather than as an argument.

Reviewed By: mdschatz

Differential Revision: D15633148

fbshipit-source-id: d566af44dc882457ee9efe83d2461b28408c2c5d
2019-06-10 08:23:53 -07:00
9deab0cf0e Documentation for locking discipline in engine.cpp/.h (#21548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21548

Added documentation as titled.

Reviewed By: ezyang

Differential Revision: D15723146

fbshipit-source-id: fab4a35c62f07256673318c0874701f7628b2f7a
2019-06-10 07:50:01 -07:00
547fcaa977 Add named_guard to native_functions options (#21373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21373
ghimport-source-id: acab6d3ab0b287d504afa98eaefa2aed6fe99453

Differential Revision: D15717925

Pulled By: zou3519

fbshipit-source-id: 8515c448b368be79f71681833b5edf960da44fe8
2019-06-10 07:29:41 -07:00
8ffcbfb7d4 Add unique_ptr<NamedTensorMeta> field to TensorImpl (#21341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21341
ghimport-source-id: 06021b06864746571a904a1cfc0aaea5f8a12325

Differential Revision: D15717907

Pulled By: zou3519

fbshipit-source-id: 48ee76cf2f11a8b092be75ecac8d5faee68ca0d9
2019-06-10 07:29:36 -07:00
f9c4d0d7a9 Fix NVTX path on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21580

Differential Revision: D15738060

Pulled By: ezyang

fbshipit-source-id: 05a2e97279816753d574678252bf9b35913c99b1
2019-06-10 06:05:44 -07:00
c4e0d61646 Regularization is not supported in FP16 (#21319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21319

Add an assertion to raise an exception when regularization is applied in FP16.

Reviewed By: bddppq

Differential Revision: D15528486

fbshipit-source-id: c887c90d1d9ccfdaded3b5fa16816c6f29910e2e
2019-06-09 23:59:48 -07:00
b1bf16eeab Enabled _th_ixor_ and _th_equal for bool (#21538)
Summary:
Following up on the feedback in this [PR](https://github.com/pytorch/pytorch/pull/21113/files?file-filters%5B%5D=.cwrap&owned-by%5B%5D=)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21538

Differential Revision: D15721390

Pulled By: izdeby

fbshipit-source-id: 1b5265bf8726c1051f306f7674d731e25a6c7d03
2019-06-09 15:28:38 -07:00
e447a733a1 Update module.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21570

Differential Revision: D15732665

Pulled By: ezyang

fbshipit-source-id: caa12a8619ad1396540f787b5c849d29cc5b03bd
2019-06-09 15:28:35 -07:00
04e6564f0c clean up the TracingState API (#21564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21564
ghimport-source-id: b6f71e2238f6f7c8de6cfbf6969a5e08e07be46c

Reviewed By: suo

Differential Revision: D15729497

Pulled By: zdevito

fbshipit-source-id: aacfea6058fadb572df692aa9ebd6cab0bcd03fc
2019-06-09 15:28:32 -07:00
69aa2b2814 Collapse tracing_state.h into tracer.h (#21563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21563
ghimport-source-id: de87e5e621da33326a9d2cb8a57d82d355166479

Reviewed By: suo

Differential Revision: D15729499

Pulled By: zdevito

fbshipit-source-id: 17b3e2e71d004f08c4413e80091388ae9ac2df2b
2019-06-09 15:28:29 -07:00
ea822d9626 Interpreter support for CallFunction/CallMethod (#21562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21562
ghimport-source-id: 17e5e183f730f50d97ef48973aafc6249d54978f

Reviewed By: suo

Differential Revision: D15729500

Pulled By: zdevito

fbshipit-source-id: efa8a133b617b1498810392a8da6b513ce00b5eb
2019-06-09 15:28:26 -07:00
ad0c08f950 Expose ExecutionPlan in prep for function calls (#21561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21561
ghimport-source-id: 4bf28d8140610a0cefef0c0a17f0a513ae855dde

Reviewed By: suo

Differential Revision: D15729498

Pulled By: zdevito

fbshipit-source-id: b26458336da1efaba71d8a577c3917c6622dae0d
2019-06-09 15:28:22 -07:00
de31f6719c Add flag to temporarily enable first class modules (#21560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21560
ghimport-source-id: a555ca33fcd3efd1147aaf90f26a8e63da1c1a67

Reviewed By: suo

Differential Revision: D15729502

Pulled By: zdevito

fbshipit-source-id: d6c11472bfc791e2ad1e9aa695b0439d72b79681
2019-06-09 15:28:19 -07:00
18996a8952 unfinished push/pop reduction (#21559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21559
ghimport-source-id: 81ba4a5638577781e1ea706599966c033c37e814

Reviewed By: suo

Differential Revision: D15729501

Pulled By: zdevito

fbshipit-source-id: 3423bff61e89617c40078d5fab726b77d21bfa27
2019-06-09 15:28:16 -07:00
13edda417d Prepare interpreter for function calling (#21558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21558
ghimport-source-id: a8a19dbefea869ca1401e5afea6c02f31f95b99a

Reviewed By: suo

Differential Revision: D15729491

Pulled By: zdevito

fbshipit-source-id: 9629664608a2379a2ddcafaf741fa8463c4fb917
2019-06-09 15:28:13 -07:00
8ae7b1c486 Update functional.py doc (#21510)
Summary:
- Fixes a typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21510

Differential Revision: D15731277

Pulled By: ezyang

fbshipit-source-id: c3f8e110f5c61e797b857477b495168ea8d63cd5
2019-06-09 15:28:09 -07:00
74828be4a7 fix segfault in cat on CPU with tensors that can't be indexed with 32-bit ints. (#21530)
Summary:
Should be self-explanatory. This `int` variable is overflowing.

Reported in #21526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21530

Differential Revision: D15719275

Pulled By: umanwizard

fbshipit-source-id: 24e917a00a5b78bc3af29ef3b8b72eea7e89d5d5
2019-06-09 15:28:06 -07:00
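The overflow described above is easy to reproduce outside the kernel: once a tensor's element count exceeds `INT_MAX`, a 32-bit offset counter wraps around. A minimal pure-Python illustration (hypothetical tensor sizes, using `ctypes` to emulate C integer widths):

```python
import ctypes

# Element count of a hypothetical large tensor: 50000 x 50000 entries.
numel = 50000 * 50000              # 2,500,000,000 elements

# A 32-bit signed counter silently wraps past 2**31 - 1 ...
wrapped = ctypes.c_int32(numel).value
print(wrapped)                     # -1794967296: the overflowed offset

# ... while a 64-bit counter holds the true value.
assert ctypes.c_int64(numel).value == numel
assert wrapped < 0
```

This is why the fix widens the loop variable rather than guarding the input size.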
406374657a Optimize batch mm op when broadcast the second input (#21556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21556

Optimize batch mm op when broadcasting the second input

Reviewed By: houseroad

Differential Revision: D15728914

fbshipit-source-id: c60441d69d4997dd32a3566780496c7ccda5e67a
2019-06-09 15:28:03 -07:00
d71501259b Revert D15572818: Prepare interpreter for function calling
Differential Revision:
D15572818

Original commit changeset: 3a9b5f053664

fbshipit-source-id: b932411e8e88c7414c8db332d6049fe4e26bd83e
2019-06-07 22:20:54 -07:00
d4bcab0dba Revert D15590900: Reduce number of stack manipulation instructions in interpreter.
Differential Revision:
D15590900

Original commit changeset: 98829979feba

fbshipit-source-id: eb7f1d396bb2b98d2852af81c69db81430eba33c
2019-06-07 22:20:50 -07:00
03641413e5 Revert D15600068: Add flag to temporarily enable first class modules
Differential Revision:
D15600068

Original commit changeset: 9b68e23d7f8b

fbshipit-source-id: 45f36b3aaa4f1c457c27490579496456cbbc680b
2019-06-07 22:20:47 -07:00
e616a5e8b8 Revert D15600067: Expose ExecutionPlan in prep for function calls
Differential Revision:
D15600067

Original commit changeset: 82b7de458dd6

fbshipit-source-id: ca26a362cd73bdb9e8c4eba15dd5c10986fa79fe
2019-06-07 22:20:44 -07:00
bfb235b8c9 Revert D15618275: Interpreter support for CallFunction/CallMethod
Differential Revision:
D15618275

Original commit changeset: 038ae27e5416

fbshipit-source-id: 8dbe0f564ba103fe445dacc471085c659171705f
2019-06-07 22:20:40 -07:00
c27cabe2d7 Revert D15719982: Collapse tracing_state.h into tracer.h
Differential Revision:
D15719982

Original commit changeset: 56bb021dd949

fbshipit-source-id: 2eb3e2c9745c35a84ebcc0fc7ac62b5f1fdd6437
2019-06-07 22:20:37 -07:00
3cfe914191 Revert D15719980: clean up the TracingState API
Differential Revision:
D15719980

Original commit changeset: 3de2746c3f3c

fbshipit-source-id: 4610e215936b2476a0271355ef3b8f1f480bdea8
2019-06-07 22:20:34 -07:00
dd0faf4366 clean up the TracingState API (#21514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21514
ghimport-source-id: 6a9b6fdd7e696ea29e8715482708efe897230e4d

Reviewed By: jamesr66a

Differential Revision: D15719980

Pulled By: zdevito

fbshipit-source-id: 3de2746c3f3c3de4111b4cb73f4c4acedbf28862
2019-06-07 20:57:05 -07:00
8c5f3acfc0 Collapse tracing_state.h into tracer.h (#21513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21513
ghimport-source-id: 86278929818a8fc65684bd8f2ffac31460772fe9

Reviewed By: jamesr66a

Differential Revision: D15719982

Pulled By: zdevito

fbshipit-source-id: 56bb021dd949668562ea481c5ff0115a9ea2b02e
2019-06-07 20:57:01 -07:00
5f6afafdef Interpreter support for CallFunction/CallMethod (#21325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21325
ghimport-source-id: eeca1176f5e00c85a69cd016acccf5105e670e02

Reviewed By: jamesr66a

Differential Revision: D15618275

Pulled By: zdevito

fbshipit-source-id: 038ae27e5416f1ce338009627c839a4d61a00658
2019-06-07 20:56:58 -07:00
1517ff66a1 Expose ExecutionPlan in prep for function calls (#21273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21273
ghimport-source-id: b92c1e07fbe4122467a21b98d29635295093e0c2

Reviewed By: jamesr66a

Differential Revision: D15600067

Pulled By: zdevito

fbshipit-source-id: 82b7de458dd65c175f55b0f383bfc3fcf4704032
2019-06-07 20:56:55 -07:00
7e08bc42d5 Add flag to temporarily enable first class modules (#21272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21272
ghimport-source-id: 43e73d1b93ccbe0dd6845eb3f7444c9d0abd444b

Reviewed By: jamesr66a

Differential Revision: D15600068

Pulled By: zdevito

fbshipit-source-id: 9b68e23d7f8b6046a5a0d6d9fd16138ac384b863
2019-06-07 20:56:52 -07:00
dde27958dd Reduce number of stack manipulation instructions in interpreter. (#21240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21240
ghimport-source-id: 5e9cbe8b3df3ac721135d2f652a420ae0b14ac55

Reviewed By: jamesr66a

Differential Revision: D15590900

Pulled By: zdevito

fbshipit-source-id: 98829979feba23685f0ba98ba3cb840157f7259a
2019-06-07 20:56:49 -07:00
c53e4d012d Prepare interpreter for function calling (#21185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21185
ghimport-source-id: 6b9cb92d1f1f59bb980dcfa0d29dfe985ee955d1

Reviewed By: jamesr66a

Differential Revision: D15572818

Pulled By: zdevito

fbshipit-source-id: 3a9b5f053664c09212b97f1391d8d006337b5550
2019-06-07 20:56:46 -07:00
c36dc35853 Revert D15576968: Turn on Werror for deprecated-declarations.
Differential Revision:
D15576968

Original commit changeset: fb73a8986a5b

fbshipit-source-id: 1ae19afc6816f764b895a47162728433a319ac0b
2019-06-07 19:15:56 -07:00
b849f101b1 Fix slow unpickling (#21542)
Summary:
This was looking at the number of elements in the memo table, not the total capacity, and was thus calling reserve() a lot more than it should have
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21542

Reviewed By: driazati

Differential Revision: D15723132

Pulled By: jamesr66a

fbshipit-source-id: 20e1f9099b6a51a33994ea9dbc3f22eb3bc0c8f9
2019-06-07 17:28:55 -07:00
66d596645a Turn on Werror for deprecated-declarations. (#21195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21195

The motivation is that, while we shouldn't break USER code for using
deprecated declarations, we should keep our internal code base
deprecation clean.

Differential Revision: D15576968

fbshipit-source-id: fb73a8986a5b60bf49ee18260653100319bb1030
2019-06-07 17:24:17 -07:00
a5cf6d5100 reorganize op bench directory (#21543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543

No code change in this diff.

Reviewed By: hl475

Differential Revision: D15721419

fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667
2019-06-07 16:06:51 -07:00
5b4a188a95 add support for steps(strides) in tensor slices
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20929

Differential Revision: D15632636

Pulled By: Krovatkin

fbshipit-source-id: 0e127bbd7b339784c4be2e0a57f28024727d5ad3
2019-06-07 15:55:26 -07:00
5744fb3007 Add mkldnn softmax operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21516

Differential Revision: D15712759

Pulled By: bddppq

fbshipit-source-id: bf515135263156bea1a2b3e53a47edf697b8b1e2
2019-06-07 15:22:18 -07:00
a947d98282 Set "scalar_check: false" for some LAPACK functions that can't return scalars. (#21498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21498
ghimport-source-id: 33ce2f3f083616f633561e4871585f439e2647c0

Differential Revision: D15715477

Pulled By: gchanan

fbshipit-source-id: 7772573ba74cdf7a5f2d86d2e581652ebd85e1c6
2019-06-07 15:00:54 -07:00
fe5ceea580 Rename caffe2<->c10 operator wrappers (#21322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21322

Naming is everything.

- Rename c10_operator.h -> export_caffe2_op_to_c10.h
- Rename operator_c10wrapper.h -> export_c10_op_to_caffe2.h
- Rename corresponding macros

This hugely improves readability and explains what these things are doing.

Reviewed By: dzhulgakov

Differential Revision: D15616816

fbshipit-source-id: d976aefcb43a0f55d85c3424fdd9aca7e71c3603
2019-06-07 13:48:10 -07:00
dad85b7e69 clang-format by line (#21531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21531
ghimport-source-id: 711867e19cc3948a5e2a6aa8c4f2cd631abb04d2

Reviewed By: zdevito

Differential Revision: D15719260

Pulled By: suo

fbshipit-source-id: e88c5d3e14e6ecc956ce30ab0246ed606f4b0a38
2019-06-07 13:42:44 -07:00
fa0c5c31d4 Turn namedtensor build back on (#21520)
Summary:
namedtensor build + test should run on PRs only if the commit message
includes [namedtensor ci].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21520

Differential Revision: D15718404

Pulled By: zou3519

fbshipit-source-id: ce8b5df2682e795e64958a9d49e2e3c091599b33
2019-06-07 13:37:48 -07:00
2b902e9738 Fix the offset numerical bug when casting (#21484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21484

cast<int32_t*> => cast<int32_t>

Also fixed reserve problem which might cause incorrect pointer.

Reviewed By: yinghai

Differential Revision: D15699866

fbshipit-source-id: 374418476bddd60f5c5306c8c57319ccf28b9990
2019-06-07 12:33:18 -07:00
ac8c3fa7b6 Revert D15717337: [pytorch][PR] [precommit hook] clang-format by line
Differential Revision:
D15717337

Original commit changeset: 57e65a679a8f

fbshipit-source-id: f73794087a23d56d03497b29d9a9e4e7d54deaad
2019-06-07 11:50:42 -07:00
a77802cf56 clang-format by line (#15657)
Summary:
This should further reduce noise by only clang-formatting the lines you actually touched in the precommit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15657

Differential Revision: D15717337

Pulled By: suo

fbshipit-source-id: 57e65a679a8fdee5c3ff28e241c74ced9398eb0c
2019-06-07 11:46:36 -07:00
b7f5d1e4c6 Fix size of histc return on CPU when input is 0-dimensional and bins=1. (#21497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21497
ghimport-source-id: bc03f27408aa772f78d5351afe404b5e91a7c4ce

Differential Revision: D15715478

Pulled By: gchanan

fbshipit-source-id: 90e1b65249b4b12f936ee8877cc0bc5a972d9ceb
2019-06-07 11:23:46 -07:00
881adb5bcd fix tuple indexing bug (#21521)
Summary:
lower tuples pass didn't check bounds for tuple index
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21521

Differential Revision: D15716813

Pulled By: eellison

fbshipit-source-id: 8eead98c2c63118e7d24a8c8bf6184b02afb7dcd
2019-06-07 11:17:59 -07:00
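The missing check is ordinary index normalization plus a bounds test. A hedged sketch (hypothetical helper, not the actual lowering-pass code) of what must happen before emitting the element access:

```python
def check_tuple_index(index, length):
    # Normalize a possibly negative index and verify it is in range,
    # mirroring Python's own tuple-indexing rules.
    norm = index + length if index < 0 else index
    if not 0 <= norm < length:
        raise IndexError(f"tuple index {index} out of range for length {length}")
    return norm

assert check_tuple_index(1, 3) == 1
assert check_tuple_index(-1, 3) == 2
try:
    check_tuple_index(3, 3)
    assert False, "expected IndexError"
except IndexError:
    pass
```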
a5cca4d342 add failback for Sign operator (#21343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21343

Needed to binarise features

Reviewed By: yinghai

Differential Revision: D15625653

fbshipit-source-id: 52f48259a040dac35a7000bb1eea9feb5c7ef1ab
2019-06-07 10:56:22 -07:00
51fb42ebcf Updating submodules
Reviewed By: yns88

fbshipit-source-id: 5778cdb5173fc16e5d5474fefa2ea89264101184
2019-06-07 10:43:12 -07:00
54413cf91e replace LegacyTracedModule with torchscript used in add_graph (#21339)
Summary:
The new implementation of tracing supports more modules, so much of the error-handling code can be removed by replacing the old one (LegacyTracedModule).
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21339

Reviewed By: natalialunova

Differential Revision: D15695154

Pulled By: orionr

fbshipit-source-id: af7d35754e9f34bd1a0ad7b72a9ebe276ff8ab98
2019-06-07 10:43:08 -07:00
b144ba66d5 Change PyTorch tests to use non-default CUDA stream (#21474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21474
ghimport-source-id: b2477765362248a80557d1a20db02a1290bdcde3

Differential Revision: D15699700

Pulled By: fbhuba

fbshipit-source-id: 1aa4309fec0982c8477cfab29ca5f42d2b171f97
2019-06-07 10:24:48 -07:00
5cc3a3e2bf Set "scalar_check: false" for TH methods that can't return scalars. (#21496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21496
ghimport-source-id: d7197bccfe9e4d807f38a66e02ca6f0bf32bdc2b

Differential Revision: D15715479

Pulled By: gchanan

fbshipit-source-id: fa59eb808d26119b33eb97bb90ef70e95e58458d
2019-06-07 10:19:23 -07:00
cc85c3dbbc ONNX Export Slice and Flip ops for opset 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20533

Reviewed By: zrphercule

Differential Revision: D15579713

Pulled By: houseroad

fbshipit-source-id: 91f3ac0cb14ef226f980362b0013b6b92cb8b8da
2019-06-07 10:03:26 -07:00
3eced796cd Make torchvision install chatty. (#21476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21476
ghimport-source-id: adfd08b818f31ebbdf3da89d6bb95d33e14a9403

Differential Revision: D15715270

Pulled By: ezyang

fbshipit-source-id: dde02579d9960ac960306d0a024b8e17846ae0ff
2019-06-07 08:41:13 -07:00
1503c734ce updating gemmlowp tp 3fb5c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21488

Differential Revision: D15715264

Pulled By: ezyang

fbshipit-source-id: 86978f294720e0ce6f60b748a71f0604d6cfa00c
2019-06-07 08:32:49 -07:00
8c9a88bdab Make test_cuda.py work on Python 2. (#21466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21466
ghimport-source-id: 0a235c8b8cf994621a5a5afe022340dd35764c91

Differential Revision: D15698096

Pulled By: ezyang

fbshipit-source-id: 1759c2681071e9c7e83de3de86daf4333c5f8f3a
2019-06-07 08:13:03 -07:00
c60465873c Fix batch norm multiplier init (#13774)
Summary:
Fixes #12259, needs to make sure tests (see #13766) don't break due to numerical precision issues. Not sure what would need to be adjusted here...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13774

Differential Revision: D15715021

Pulled By: ezyang

fbshipit-source-id: 20ce2beee1b39ebe9f023c5f2b25be53acccb5f3
2019-06-07 07:50:39 -07:00
c604847602 Implement at::match(Dimname, Dimname) and at::unify(Dimname, Dimname) (#21281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21281
ghimport-source-id: 4b241d54a60c188b8566065c90b227b40914a5ca

Differential Revision: D15699063

Pulled By: zou3519

fbshipit-source-id: c0f00c370d266a4ea5211aae943041fd899e960a
2019-06-07 06:30:45 -07:00
4727685ea1 Added at::Dimname (#21280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21280
ghimport-source-id: 921848326e4828ffd422868be26c409c6490e1ab

Differential Revision: D15698516

Pulled By: zou3519

fbshipit-source-id: 502b9b019d51dd46327e6caf2af69aa520c70cb6
2019-06-07 06:30:42 -07:00
e27c2f1437 Revert "Revert D15632268: [pytorch][PR] Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen" (#21427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21427
ghimport-source-id: 930c2fb29320f70e78f94e7eaaffe8e2ab62e7f3

Differential Revision: D15698423

Pulled By: ezyang

fbshipit-source-id: 891c94c24b6d377cd6dd94d86cc66465b582359f
2019-06-07 05:52:27 -07:00
d6af6588c2 Super resolution export to Caffe2 is broken, skip it. (#21479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21479
ghimport-source-id: 60fa97fb2dfb37a758c0e8b9c2bc0fb2819fd2f7

Differential Revision: D15713609

Pulled By: ezyang

fbshipit-source-id: a3d9c49e2db985f4373508cd44e94d43ae6e24da
2019-06-07 05:46:26 -07:00
78a376592d add cancelAsyncCallback method to OperatorBase (#21492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492

If one async operator fails, the async_scheduling net currently only marks all scheduled async operators as finished without cancelling their callbacks.

The new behavior is to cancel the callbacks first, then set event status to finished.

Reviewed By: ilia-cher

Differential Revision: D15702475

fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
2019-06-06 20:57:12 -07:00
696b2c89b4 Adding gradient to Boolean Mask operator (#21423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21423

- add gradient for boolean mask
- add test for gradient checking

Reviewed By: BIT-silence

Differential Revision: D15640036

fbshipit-source-id: 79f40c6901e805bf1b8e9b01b57903e30b00f654
2019-06-06 20:48:47 -07:00
d3d195e0b1 Updating submodules
Reviewed By: yns88

fbshipit-source-id: af5812e3d071e66f9d0272c36bf639eb04bde7e4
2019-06-06 20:42:34 -07:00
772fd79d40 Defer constructing error strings for definitions under If's until they're needed. (#21429)
Summary:
This saves ~7% DenseNet load time (4.3s -> 4.0s) on my laptop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21429

Differential Revision: D15681374

fbshipit-source-id: 9925a6154d51f2d592e26cb5ff8bf7ab3ee2519b
2019-06-06 20:32:57 -07:00
abc0d3e544 Fix unused variable warning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21444

Differential Revision: D15701786

Pulled By: ezyang

fbshipit-source-id: 8348e08f9b8f3047b30736f9a944786ab84e6b68
2019-06-06 19:37:54 -07:00
bad67015fe Add warning for Turing GPUs and CUDA <= 9000 (#21468)
Summary:
Turing GPUs (compute capability 7.5) require CUDA10 to work properly.
We've seen some issues for these GPUs using PyTorch binaries with CUDA9 or older:
[Discussion Board #1](https://discuss.pytorch.org/t/cudnn-status-execution-failed-error/38575)
[Discussion Board #2](https://discuss.pytorch.org/t/cublas-runtime-error-on-gpu-running-but-works-on-cpu/46545/6)

Tested on using CUDA9 with an RTX 2080Ti.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21468

Differential Revision: D15696170

Pulled By: ezyang

fbshipit-source-id: ed43f4e4948d3f97ec8e7d7952110cbbfeafef2a
2019-06-06 19:33:02 -07:00
63d4bbb0ec Turn XLA back on for default set (but filtered out using should_run_job) (#21470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21470
ghimport-source-id: 69800c1ce1187591b7bcdb8a63973b4fd8d0e326

Differential Revision: D15696930

Pulled By: ezyang

fbshipit-source-id: fafbcba38d9572a23ee9c1d81cdcce3a154ae4c6
2019-06-06 19:18:45 -07:00
f433913996 add more info back to BenchResult (#21502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21502

In BenchResult, we keep name, avg_fwd, std_fwd, avg_bwd, and std_bwd. There is no per-iteration information. In this diff, I am adding more info to BenchResult to include the number reported from each iteration.

Reviewed By: wanchaol

Differential Revision: D15706306

fbshipit-source-id: 3f14be4ba91f1f6da473995783bd7af1d067938d
2019-06-06 18:43:51 -07:00
d51bd2191c Revert D15629687: Deprecate torch::jit::RegisterOperators
Differential Revision:
D15629687

Original commit changeset: 2f87f18be655

fbshipit-source-id: a142c22be3fdf14a2b3c29b8766b218fb0883927
2019-06-06 18:09:01 -07:00
37ab35c8fc Move jit testing utils to their own file (#21491)
Summary:
This moves `JitTestCase` to its own file so that we can have other jit
test files (ex. `test_jit_py3.py`)

There aren't any code changes, just a move and cleaning up the imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21491

Pulled By: driazati

Differential Revision: D15703060

fbshipit-source-id: 6082e8b482100bb7b0cd9ae69738f1273e626171
2019-06-06 15:52:45 -07:00
e87f77def6 Fix typo (#21426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21426

-

Differential Revision: D15679789

fbshipit-source-id: 5fd448e66af159fd79883aa874065424ec9694ad
2019-06-06 15:44:16 -07:00
d714abf597 Deprecate torch::jit::RegisterOperators (#21368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21368

-

Differential Revision: D15629687

fbshipit-source-id: 2f87f18be65552f3eb3f4c945d7f19ba4bae0eb8
2019-06-06 15:44:12 -07:00
cb2ec07fa2 ReshapeOp supports empty tensor (#21230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21230

TSIA; this diff adds empty-tensor support to the reshape operator

Reviewed By: jerryzh168

Differential Revision: D15583356

fbshipit-source-id: 6d44c04e95ca3546509bfb12102e29c878f9a7c7
2019-06-06 15:02:11 -07:00
b161832f10 support ceil mode by padding changes (#21310)
Summary:
Modify MKLDNN pooling operation to support ceil mode by adjusting the right/bottom padding accordingly. This is done similarly as in Caffe (see discussion https://github.com/pytorch/pytorch/pull/19205#discussion_r276903751).

To make this possible, I split the padding into left and right (top/bottom). This naming is confusing but actually follows mkldnn's own naming for pooling::compute(). We increase the right/bottom paddings so that the output matches the expected ceil-mode output size.

Strengthened the test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21310

Reviewed By: bddppq

Differential Revision: D15611664

Pulled By: akyrola

fbshipit-source-id: 46b40015dafef69a8fd5e7b2c261d8dbf448cd20
2019-06-06 14:47:35 -07:00
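The padding adjustment follows from the standard pooling output-size formula. A sketch (hypothetical helpers, not the MKLDNN code) of how much extra right/bottom padding makes a floor-mode pool produce as many windows as ceil mode:

```python
import math

def pool_out(size, kernel, stride, pad_l, pad_r, ceil_mode):
    # Standard 1-D pooling output-size formula.
    span = size + pad_l + pad_r - kernel
    return (math.ceil(span / stride) if ceil_mode else span // stride) + 1

def extra_right_pad(size, kernel, stride, pad_l, pad_r):
    # Smallest additional right/bottom padding so that floor mode
    # yields the ceil-mode number of windows.
    target = pool_out(size, kernel, stride, pad_l, pad_r, ceil_mode=True)
    needed = (target - 1) * stride + kernel
    return max(0, needed - (size + pad_l + pad_r))

# 5-wide input, kernel 2, stride 2: ceil mode wants 3 windows, floor gives 2.
extra = extra_right_pad(5, 2, 2, 0, 0)
assert extra == 1
assert pool_out(5, 2, 2, 0, extra, ceil_mode=False) == \
       pool_out(5, 2, 2, 0, 0, ceil_mode=True)
```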
80a083ef92 Remove unneeded headers (#21393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21393

Result of splitting the base diff. We moved a header from src/* to include/fbgemm/*

Reviewed By: jianyuh

Differential Revision: D15635188

fbshipit-source-id: ad7d0ddba964ff1cb8b2e33f5f98e457a4d2eac9
2019-06-06 14:23:54 -07:00
8a9ea55b25 Add autograd for to_sparse. (#20458)
Summary:
https://github.com/pytorch/pytorch/issues/18111
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20458

Differential Revision: D15699732

Pulled By: nairbv

fbshipit-source-id: f7a5424c1f1d3b0e4eba0d503d75ae8a18ef7ff4
2019-06-06 14:23:51 -07:00
87d10d49f4 Bilinear Upsampling increased throughput (#19306)
Summary:
changed `UpsampleBilinearKernel` such that the throughput increased by 40-50%.

I tested locally with my local test code -- **not pytorch's provided test code** -- because I am having a build problem (which I made an issue about [here](https://github.com/pytorch/pytorch/issues/19184)). I tested with various tensor sizes, and across all the sizes it showed a significant increase in throughput.

1. added `__restrict__`
2. instead of launching as many threads as there are output elements, I launched only `output_height * output_width` threads and had each thread iterate through the channel and batch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19306

Differential Revision: D15701840

Pulled By: ezyang

fbshipit-source-id: 53c54d4f4e4a28b58ecc7d7ae6b864cbfc760e27
2019-06-06 13:58:57 -07:00
c5d5d45f40 Fix numerically instability of SigmoidTransform (#19802)
Summary:
fix #18254: numerical instability of `SigmoidTransform`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19802

Differential Revision: D15701837

Pulled By: ezyang

fbshipit-source-id: fe6c755c523487c8bbdcc3bfb8455801617c70a4
2019-06-06 13:58:53 -07:00
f8cab38578 Address precision matrix instability of MVN distribution (#21366)
Summary:
Currently, when the input of MVN is a precision matrix, we take its inverse to convert it to a covariance matrix. This, however, easily makes the covariance matrix not positive definite, which triggers a Cholesky error.

For example,
```
import torch
torch.manual_seed(0)
x = torch.randn(10)
P = torch.exp(-(x - x.unsqueeze(-1)) ** 2)
torch.distributions.MultivariateNormal(loc=torch.ones(10), precision_matrix=P)
```
will trigger `RuntimeError: cholesky_cpu: U(8,8) is zero, singular U.`

This PR uses some math tricks ([ref](https://nbviewer.jupyter.org/gist/fehiepsi/5ef8e09e61604f10607380467eb82006#Precision-to-scale_tril)) to take the inverse of only a triangular matrix, hence increasing stability.

cc fritzo, neerajprad , SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21366

Differential Revision: D15696972

Pulled By: ezyang

fbshipit-source-id: cec13f7dfdbd06dee94b8bed8ff0b3e720c7a188
2019-06-06 13:54:46 -07:00
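The linear-algebra identity behind the trick, stated as a hedged summary of the approach described above: if the precision matrix is factored once by Cholesky, the covariance follows from inverting only a triangular factor (a back-substitution), never from a full matrix inverse:

```latex
P = L_P L_P^{\top}
\quad\Longrightarrow\quad
\Sigma = P^{-1} = \bigl(L_P L_P^{\top}\bigr)^{-1} = L_P^{-\top} L_P^{-1}
```

Inverting the triangular factor is far better conditioned than forming \(\Sigma = P^{-1}\) directly, which is why the explicit-inverse path could produce a matrix that fails the Cholesky check.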
8ece538a79 Addresses bad behavior with overridden optimizer.step by #20124 (#21460)
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time lr scheduling was initialized

Now the code checks that:
- on the second call of `lr_scheduler.step`, ensure that `optimizer.step` has already been called; otherwise raise a warning (as was done in #20203)
- if the optimizer's step is overridden -> raise another warning once to make the user aware of the new pattern:
`opt.step()` -> `lrs.step()`, as we cannot check this.

Now tests check that
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` by something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)) another warning is raised.

cc ezyang

PS. honestly I would say that there is a lot of overhead introduced for simple warnings. I hope all these checks will be removed in future `1.2.0` or other versions...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460

Differential Revision: D15701776

Pulled By: ezyang

fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
2019-06-06 13:54:42 -07:00
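The ordering check can be sketched without any optimizer internals. A minimal mock (hypothetical class names, and simplified so the check fires on the first scheduler step rather than the second) of the two warnings described above:

```python
import warnings

class Optimizer:
    def step(self):
        self._step_called = True

class Scheduler:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        optimizer._step_called = False
        # If step was monkey-patched (e.g. by an AMP wrapper), ordering can
        # no longer be verified, so warn once about the expected pattern.
        if getattr(optimizer.step, "__func__", None) is not Optimizer.step:
            warnings.warn("optimizer.step() was overridden; call it "
                          "before scheduler.step()")

    def step(self):
        if not self.optimizer._step_called:
            warnings.warn("scheduler.step() called before optimizer.step()")

opt = Optimizer()
sched = Scheduler(opt)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sched.step()          # too early: the optimizer has not stepped yet
assert len(caught) == 1

opt.step()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sched.step()          # correct order: no warning
assert len(caught) == 0
```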
51d0da2802 Improve build docs and process for Windows (#21190)
Summary:
Fixes #21026.
1. Improve build docs for Windows
2. Change `BUILD_SHARED_LIBS=ON` for Caffe2 local builds
3. Change to out-source builds for LibTorch and Caffe2 (transferred to #21452)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21190

Differential Revision: D15695223

Pulled By: ezyang

fbshipit-source-id: 0ad69d7553a40fe627582c8e0dcf655f6f63bfdf
2019-06-06 13:46:52 -07:00
6fc702f384 Per-callback sampling (#21394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21394
ghimport-source-id: 2607c7b456031a1ddb19fabc3b6fe2585c276d76

Differential Revision: D15639723

Pulled By: ilia-cher

fbshipit-source-id: 938d02c1daf5bec5afa5d3cd021d2dae7e7160ce
2019-06-06 13:46:48 -07:00
bb788631ce Fix caffe2 windows CI for new Windows AMI (#21452)
Summary:
The alternative of #21410.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21452

Differential Revision: D15701767

Pulled By: ezyang

fbshipit-source-id: e65c1d6bfcc98e88460f4a57e5b99c2f395c0ceb
2019-06-06 13:46:45 -07:00
3feb40d602 pack_padded_sequence: Check for empty (zero-element) tensors (#21461)
Summary:
Fixes: #20529

Thank you, JamieCT for the bug report with reproducing script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21461

Differential Revision: D15696183

Pulled By: ezyang

fbshipit-source-id: a93cde2c924f8447563c64ce8a1cf75fcee60a01
2019-06-06 13:41:52 -07:00
3b6362d98e Remove NodeExecStats and AllocatorMemoryUsed (#21419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21419

Removed `node_stats` and unused imports

Reviewed By: orionr

Differential Revision: D15672824

fbshipit-source-id: 6167c80c081d925f02a1d279f3af0e1b8de66752
2019-06-06 13:35:52 -07:00
0a3fb45d3d allow passing Python built-in types as dtypes (#21215)
Summary:
Another simple bit of syntax that NumPy supports and we don't.

Support int, float, and bool.

```python
>>> torch.randn((2,3), dtype=float)
tensor([[-0.1752, -0.3240, -0.6148],
        [ 0.1861,  1.6472,  0.1687]], dtype=torch.float64)
```

A bit confusingly, Python's "float" actually means double, but nothing we can do about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21215

Differential Revision: D15697012

Pulled By: umanwizard

fbshipit-source-id: 9a38d960a610b8e67023486b0c9265edd3c22246
2019-06-06 13:17:23 -07:00
b647804a55 Fix embedding bag nan output when input is empty (#21400)
Summary:
```
import torch

Embed = torch.nn.EmbeddingBag(100, 10, sparse=True)

print(Embed(input=torch.LongTensor([]), offsets=torch.LongTensor([0])))
print(Embed(input=torch.LongTensor([]), offsets=torch.LongTensor([0, 0])))
```

Before this fix:
```
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```

After this fix:
```
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21400

Differential Revision: D15643357

Pulled By: bddppq

fbshipit-source-id: 119eba38129dc0a3757c331304a18044714fcca5
2019-06-06 13:03:17 -07:00
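The NaN comes from reducing an empty bag (a 0/0 in mean mode); the fix makes an empty bag produce zeros. A hypothetical scalar sketch of the guarded reduction:

```python
def bag_mean(values):
    # Mean-mode reduction over one bag; an empty bag must yield 0.0,
    # not 0/0 (which is where the NaN came from).
    if not values:
        return 0.0
    return sum(values) / len(values)

assert bag_mean([]) == 0.0
assert bag_mean([2.0, 4.0]) == 3.0
```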
f4f32cecfd numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to Numpy when indexing:

```
In [15]: x=torch.randn((1000,1000))

In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: y=x.numpy()

In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: x=x.t()

In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: y=x.numpy()

In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293

Differential Revision: D15358754

Pulled By: umanwizard

fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1
2019-06-06 12:50:59 -07:00
8a2985eb05 Support recursive ModuleList / Sequential (#21306)
Summary:
Adds support for recursively compiling `nn.Sequential` and
`nn.ModuleList`. When either is used, it is converted to a
`jit._ConstModuleList` or `jit._ConstSequential` as necessary. Due to
this, we don't need to add it to `__constants__` since it's made
constant on demand.

This PR also moves the recursive script tests out to their own class
`TestRecursiveScript` (the added test is called `test_iterable_modules`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21306

Pulled By: driazati

Differential Revision: D15611738

fbshipit-source-id: fac52993990bd2dfad71d044c463a58a3759932a
2019-06-06 12:23:59 -07:00
2e37ab85af Enable bool support for several index methods (#21435)
Summary:
Enable bool tensors for these index methods:
- index_select
- index_copy
- put
- take
- index_fill

Tested via unit tests

TODO:
Enable index_add in a separate PR as it requires more "side" changes.
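On a build that includes this change, the listed methods accept bool tensors; a minimal check:

```python
import torch

t = torch.tensor([True, False, True])
idx = torch.tensor([0, 2])
print(t.index_select(0, idx))      # the True entries at positions 0 and 2
print(t.take(torch.tensor([1])))   # tensor([False])
```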
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21435

Differential Revision: D15684964

Pulled By: izdeby

fbshipit-source-id: 48440e4d44873d70c4577e017dd0d8977e0fa15a
2019-06-06 12:14:01 -07:00
61cc03fb8d Make ScriptModule.training an attribute instead of a parameter (#21078)
Summary:
Redo of #19587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21078

Pulled By: driazati

Differential Revision: D15560540

fbshipit-source-id: f415775d87c163f93b3bbdd5f87c9ff73f58b049
2019-06-06 12:06:49 -07:00
f1adddd1c6 Updated sum() logic to properly deal with bool tensor (#21421)
Summary:
`torch.tensor([True, False, True], dtype=torch.bool).sum()` should return **2** instead of **True** as it does now.

Tested via unit tests
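On a build that includes this fix, the count is returned rather than a bool:

```python
import torch

t = torch.tensor([True, False, True], dtype=torch.bool)
print(t.sum())   # tensor(2) after the fix, not tensor(True)
```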
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21421

Differential Revision: D15674203

Pulled By: izdeby

fbshipit-source-id: b00e3d0ca809c9b92b750adc05632522dad50c74
2019-06-06 12:02:23 -07:00
b7b6b612a7 Fix C++ data parallel (#20910)
Summary:
Fixes #19540

CC nmerrill67

C++ data parallel was using Module.clone() to create module replicas on every destination device. However, clone() does not set up gradient edges to point from replicas to the original module. As a result, the gradient will not be aggregated into the original module. This commit fixes the problem by manually setting gradient edges from every parameter X in every replica to the same parameter X in the original module.

## Failed Attempt

Initially I tried implementing what we did in `replicate.py`, which
1. create module replicas
2. use Python `Broadcast` autograd function to broadcast every parameter in the original module to all destination devices.
3. assign the broadcast result params to module replicas' `_parameters` dict.

This works in Python because derived module member field params (e.g., `Linear.weight`) and base module `_parameters` (e.g., `Linear._parameters['weight']`) are referencing the same parameter instance. Assigning one of them will apply to both. However, in C++, even though I can modify Module's `parameters_` values and gradient edges to point to the broadcast source, I cannot touch the weight and bias member fields in Linear, because replicate cannot (and should not) add special-case handlers to every different module. (See `Linear` [.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/modules/linear.h), [.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/src/nn/modules/linear.cpp)) Although they initially point to the same `TensorImpl` instance, after assigning to `Module.parameters_['weight']`, it will be different from `Linear.weight`.

## Solution Options

gchanan and I had several discussions on this issue and figured two solutions to this problem.

### Option One [implemented in this PR]

Replicate the module in two steps:
1. call `Module.clone()` to create a module replica on every destination device.
2. manually setting gradient edges from every parameter in every replica to the same parameter in the original module.

* Pro: Does not need to change any existing module, and relatively easier to implement
* Con: It is a little hackish.

### Options Two

Implement a `Replicatable` class (similar to `Cloneable`), and make it a friend class of `Module`. For more details see `Note [Replicating Modules]` in the code change.

* Pro: Maybe this aligns more with our existing approach implemented in `Cloneable`?
* Con: Require changes to every existing module.

I am inclined to go with option one, because `replicate` will only be used for data parallel. It feels like overkill to change all existing module implementations for a data parallel requirement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20910

Differential Revision: D15556426

Pulled By: mrshenli

fbshipit-source-id: aa836290ec657b32742e2bea80bd0ac2404ef3b0
2019-06-06 11:57:31 -07:00
da4f3629c5 Add missing shebangs to Python files with executable permissions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21305

Differential Revision: D15613078

Pulled By: ezyang

fbshipit-source-id: 1fedf4368d65db406b617a51402ee8a20968aff7
2019-06-06 10:53:40 -07:00
52596164d4 Fix 32-bit env. model load issue (#20900)
Summary:
Fixed an issue where models could not be loaded in a 32-bit environment such as Raspbian.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20900

Differential Revision: D15696709

Pulled By: ezyang

fbshipit-source-id: 37a81f05f235d3b9fc6244e12d3320ced3d1465e
2019-06-06 10:30:29 -07:00
f891b4338a Test the exceptions raised by isfinite and isinf (#21168)
Summary:
Following up ef1fdc27a3779586efad631d698cec2d6d19503f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21168

Differential Revision: D15696615

Pulled By: ezyang

fbshipit-source-id: 46904974ef3c4cb87c7a1d06871bf01543e61ef2
2019-06-06 10:30:26 -07:00
dffff3218b Improves NVRTC Error messages (#21174)
Summary:
Current versions of NVRTC incorrectly map error code 7 to the error string "NVRTC unknown error." This update maps error code 7 to the correct string explicitly in PyTorch. See the documentation at: https://docs.nvidia.com/cuda/nvrtc/index.html#group__error.

This may give us a better idea of the source of NVRTC errors that some community members, like Uber, have reported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21174

Differential Revision: D15696593

Pulled By: ezyang

fbshipit-source-id: f5c7b5876c07b311ab5f2d7c8e375e93273912c6
2019-06-06 10:30:22 -07:00
6615797837 Add derivative for QR decomposition (#21274)
Summary:
Changelog:
- Implement derivative for QR decomposition for tall and square matrices i.e., num rows >= num cols
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21274

Differential Revision: D15696506

Pulled By: ezyang

fbshipit-source-id: 1c77bb8369818112c84139360f6e2650f92bf2fd
2019-06-06 10:11:21 -07:00
26bcadcc61 Gumbel-Softmax Arxiv Docs Link Fix (#21376)
Summary:
Links separated #20297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21376

Differential Revision: D15696413

Pulled By: ezyang

fbshipit-source-id: 513bd430e41c109aa2d0fbaa9a242acb2a12059b
2019-06-06 10:11:18 -07:00
ee15ad1bd6 "CharTensor" numpy conversion is supported now (#21458)
Summary:
Fixed #21269 by removing the expected `ValueError` when converting a tensor to a NumPy `int8` array in the Numba interoperability test.
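On a build that includes this change, the round trip works; a minimal check:

```python
import torch

t = torch.tensor([1, -2, 3], dtype=torch.int8)  # a "CharTensor"
a = t.numpy()
print(a.dtype)   # int8
```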
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21458

Differential Revision: D15696363

Pulled By: ezyang

fbshipit-source-id: f4ee9910173aab0b90a757e75c35925b026d1cc4
2019-06-06 10:06:41 -07:00
c8083e0292 Include named_any.h in modules.h (#21437)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21437

Differential Revision: D15684880

Pulled By: yf225

fbshipit-source-id: db23c7e4e0f62d22b0b6c18f15420c3bb66af366
2019-06-06 09:57:33 -07:00
856e3518c5 Parallelize eye() on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21077

Differential Revision: D15695329

Pulled By: ezyang

fbshipit-source-id: 9841777238dac7c08cde2db3cd9401853f633af3
2019-06-06 09:52:13 -07:00
ae18f8e761 Fix latex formular error about *normal (#21000)
Summary:
issue:
 https://github.com/pytorch/pytorch/issues/20903
the LaTeX for the normal distribution should be `\mathcal{N}(\text{mean}, \text{std}^2)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21000

Differential Revision: D15695335

Pulled By: ezyang

fbshipit-source-id: 34dcca0acb20c297f876287e081cd44d11a3e516
2019-06-06 08:47:42 -07:00
4e02d3c0a1 insert default parameters in binary cross entropy with logits (#21336)
Summary:
Inserted default `weight` and `reduction` parameters into the C++ `binary_cross_entropy_with_logits` function. These defaults already exist in Python and in the C++ `binary_cross_entropy` function.
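The Python-side counterpart of those defaults (`weight=None`, `reduction='mean'`), as a sketch:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4)
targets = torch.randint(0, 2, (4,)).float()
# weight and reduction are omitted: weight=None, reduction='mean' by default,
# so the result is a scalar mean over the batch
loss = F.binary_cross_entropy_with_logits(logits, targets)
```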
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21336

Differential Revision: D15628917

Pulled By: ezyang

fbshipit-source-id: 38e5f53851125238842df1bd71cb6149c8603be1
2019-06-06 08:47:39 -07:00
d50dca4075 fix two typos: "a the" => "the"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20437

Differential Revision: D15321243

Pulled By: zou3519

fbshipit-source-id: 6e1690132769b8ef2fd679cb5898c378efac2112
2019-06-06 08:42:57 -07:00
63a55d4932 Support gather export with OneHot + Mul (#21235)
Summary:
This could serve as an alternative solution for exporting `torch.gather` before something similar goes into the ONNX spec. The exported model is verified to be correct against the onnxruntime backend. We weren't able to test against the Caffe2 backend because it doesn't seem to support OneHot in opset 9.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21235

Differential Revision: D15613039

Pulled By: houseroad

fbshipit-source-id: 7fc097f85235c071474730233ede7d83074c347f
2019-06-06 08:35:28 -07:00
240d62fbaa Move redundant code that checks NumPy during build to a helper module and add an option to disable building with NumPy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21417

Reviewed By: ezyang

Differential Revision: D15694357

Pulled By: fmassa

fbshipit-source-id: bc1bda23349ba4531f19619fa4adecb846225c20
2019-06-06 08:15:19 -07:00
a68d2e817b Kill apt-get even harder, and before we purge. (#21464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21464
ghimport-source-id: 81beb6ef39e3d412e755f0ae06c9186d8e11a8bc

Differential Revision: D15694828

Pulled By: ezyang

fbshipit-source-id: 0791fe017a1318425528795f576fb96e54b14dae
2019-06-06 07:49:39 -07:00
12528990f8 change output of ai_pep_format (#21440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440

This diff modifies the output format when ai_pep_format is enabled.

Reviewed By: hl475

Differential Revision: D15681042

fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
2019-06-05 21:54:24 -07:00
4e679e30a8 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 060bf204b6400515bbc8f1b9b3ef34bef9d32560
2019-06-05 20:06:53 -07:00
7e300fbb21 Added degrees, radians, ldexp (#21131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21131
ghimport-source-id: 62b9cb71a17f9c9a7999a6e33c2d8b840ce097ff

Differential Revision: D15563184

Pulled By: Chillee

fbshipit-source-id: e2c47fb9f9c0fe9f039cfd001c5e6d5b455e034c
2019-06-05 19:17:02 -07:00
bd2d318e23 Modify quant-dequant node api to take module object and method name (#21407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21407

Modified the API to take a module object and the name of the method whose graph is
instrumented to insert the quant-dequant nodes.

Differential Revision: D15651624

fbshipit-source-id: 1ff1ae446c986184c724504c8fdd0dcd43864016
2019-06-05 19:08:56 -07:00
505ae5f51d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 9ab609a16522eb233f128df024903eb880742224
2019-06-05 19:03:34 -07:00
f8202d85a0 Added frexp, isinf, isnan, isfinite (#21130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21130
ghimport-source-id: fa771086da13deed232e142db6f940439bcc67bc

Differential Revision: D15563186

Pulled By: Chillee

fbshipit-source-id: fe33dbc454af2a9626ad810a5304300eb17d7530
2019-06-05 18:46:39 -07:00
26db46b324 change the epilogue of SLS to match the simd section (#21439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21439

This bug was exposed after testing accuracy on shapes that are not multiples of 8.

Reviewed By: jspark1105

Differential Revision: D15684759

fbshipit-source-id: 2950f2bd87ee1d8e539148285a14c755f606b3a7
2019-06-05 18:41:55 -07:00
7e6d932208 Make strtod_c compatible with different gcc abi (#21293)
Summary:
We encountered a `std::bad_cast` error when running a PyTorch binary built with the cxx11 ABI on CentOS 7. Stack trace:
```
#0  0x00007fec10160207 in raise () from /lib64/libc.so.6
#1  0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2  0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
#7  0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) ()
   from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
#8  0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```

We suspect this line gets compiled to a GCC-ABI-dependent symbol:
```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21293

Differential Revision: D15609910

Pulled By: bddppq

fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
2019-06-05 18:10:09 -07:00
e07d94558d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 6608ac4be8c338ff5a8116b275bbad487d317972
2019-06-05 16:28:40 -07:00
991c557270 Fix an incorrect implementation of celu (#21213)
Summary:
Fixing an incorrect implementation of the CELU activation function. The existing implementation works by a chance combination of errors that cancel each other out. This change makes the code more readable, aligns the parameter names correctly, and is consistent with the CUDA implementation.

I came across this issue while working on version counters... I attempted to specify a gradient in derivatives.yaml for CELU due to a failed test, but the derivative couldn't be specified correctly without fixing the celu implementation.
https://github.com/pytorch/pytorch/pull/20612
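A quick sanity check of the definition the fix aligns with, CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
alpha = 0.5
# reference computed directly from the definition
ref = x.clamp(min=0) + (alpha * (torch.exp(x / alpha) - 1)).clamp(max=0)
assert torch.allclose(F.celu(x, alpha=alpha), ref)
```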
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21213

Differential Revision: D15678823

Pulled By: nairbv

fbshipit-source-id: 29fa76b173a66c2c44ed2e0b7959e77f95d19c43
2019-06-05 15:45:50 -07:00
335869e833 Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21425

Differential Revision: D15678631

fbshipit-source-id: 3da2c694de13ad03019e2b3ff451e762199265bb
2019-06-05 15:40:29 -07:00
6b9f46b2d0 Fix "warning: missing return statement at end of non-void function" (#21424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21424

Fixes #21418

Differential Revision: D15676140

fbshipit-source-id: cfadce164c6cfefb16f8bf7bc09529ba8b910769
2019-06-05 15:30:54 -07:00
0e3c4a054b Remove curandStateMTGP32 usage (#21301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21301
ghimport-source-id: d4516237a8fb46d1f74c47532e849e5926fc6a79

Differential Revision: D15632929

Pulled By: ezyang

fbshipit-source-id: b5147edb95dc3d71f87581aa2ab002e48c3fef30
2019-06-05 14:06:25 -07:00
793b302653 ensure version_counter gets incremented for non-differentiable outputs (#20612)
Summary:
issue:
https://github.com/pytorch/pytorch/issues/14571

To reproduce I:
1) added these lines to derivatives.yaml:
```
- name: add_(Tensor self, Scalar other, Scalar alpha)
  output_differentiability: [False, False, False]
- name: add_(Tensor self, Tensor other, Scalar alpha)
  output_differentiability: [False, False, False]
```

2) Ran this code:
```
import torch

scalar = torch.tensor(5)
var1 = torch.randn(4,2,requires_grad=True)
var2 = var1.detach().requires_grad_()
output1 = var1 * scalar
output2 = var2 * scalar
output1.sum().backward()
scalar.add_(5, 1)
output2.sum().backward()
print(var1.grad)
print(var2.grad)
```
Observed modified var2.grad in the output:
```
tensor([[5., 5.],
        [5., 5.],
        [5., 5.],
        [5., 5.]])
tensor([[10., 10.],
        [10., 10.],
        [10., 10.],
        [10., 10.]])
```

After making this change, re-running the above code produces the expected error:
```
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    output2.sum().backward()
  File "/home/bvaughan/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/bvaughan/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.LongTensor []] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20612

Differential Revision: D15661958

Pulled By: nairbv

fbshipit-source-id: af3373135a1a589a635b49e0ff62622a210258e6
2019-06-05 13:36:05 -07:00
8215f44405 Revert D15660575: [pytorch][PR] Fix Caffe2 CI job for new Windows AMI
Differential Revision:
D15660575

Original commit changeset: cfc0f325b0fb

fbshipit-source-id: cb7d87605c9019b9e563bf5ce4325a919263938e
2019-06-05 12:15:34 -07:00
98e3aaeb78 Adding support for exporting models with variable length input/output to ONNX (#20034)
Summary:
Proposal: https://gist.github.com/pk-g/cc45ff8c5891b5699bffd883a87f13ae?fbclid=IwAR17bRA7Fks4APoZRYiNa93UkLdoFCpRDuIYEx0lNVyPTyaDAShbEnytiQo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20034

Reviewed By: zrphercule

Differential Revision: D15606731

Pulled By: houseroad

fbshipit-source-id: 247251e07b4893cb3f7a1287948b1f57aadb7851
2019-06-05 12:02:23 -07:00
ba2bdf8d0e Added factorial (#21129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21129
ghimport-source-id: a676dd33c4d0b2b60c3e9ce725bda0abeb22375f

Differential Revision: D15563183

Pulled By: Chillee

fbshipit-source-id: 641cae34c181a16c772665f5f7ed01c96a67ea9c
2019-06-05 11:51:03 -07:00
7a1c9076ac Revert D15632268: [pytorch][PR] Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen
Differential Revision:
D15632268

Original commit changeset: 8e337e8dc17a

fbshipit-source-id: de98b1af51a53105c97fb076b09efb6fa8eb08a7
2019-06-05 11:41:50 -07:00
f172fadd80 Make warnings be UserWarnings with source file info (#21231)
Summary:
Redo of #15201, this makes `warnings.warn` calls match their Python
behavior
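For reference, the Python behavior being matched: a plain `warnings.warn` defaults to `UserWarning`, with source file info attached to the warning record.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("deprecated overload")   # plain warn defaults to UserWarning

print(caught[0].category)   # <class 'UserWarning'>
# caught[0].filename / caught[0].lineno carry the source location
```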
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21231

Pulled By: driazati

Differential Revision: D15605266

fbshipit-source-id: 5931fd720b0c40d52dd492fbd1f5a76abefaab5c
2019-06-05 11:09:11 -07:00
3068a945ce Retry awscli install. (#21383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21383
ghimport-source-id: e518a4f8bf498694b6d504b8a695c5f11e7c681f

Differential Revision: D15664738

Pulled By: ezyang

fbshipit-source-id: d645db505de906e65c057f0d6964b5ce0fb6ff52
2019-06-05 10:38:01 -07:00
bf0e3b62ae Minor preparational JIT changes (#21096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21096
ghimport-source-id: 169f8b4b70cd77b0f9b07cca81d2b4cde2c46456

Reviewed By: ezyang

Differential Revision: D15546176

Pulled By: izdeby

fbshipit-source-id: cdd0a1c87263955eef9d3174ec2f36d1d2935f48
2019-06-05 10:30:01 -07:00
c15254d4ab Expunge some more deprecated uses of AT_CHECK.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21194

Differential Revision: D15576898

fbshipit-source-id: f030195f5bffe0027d4081aece57e2852aaf9ecb
2019-06-05 10:25:25 -07:00
ec7dc52e60 Fix a bug in qconv (#21294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21294

The returned output tensor wasn't of the correct shape.

Reviewed By: zafartahirov

Differential Revision: D15605081

fbshipit-source-id: f79a9d5b93b8b97e79c09411b9dc681987a61e44
2019-06-05 10:19:31 -07:00
03617574d3 Change type of a tensor with bools (#19097)
Summary:
**This is a bc-breaking change.**
Change dtype of a tensor which was created from bool data.
Old behavior: torch.tensor([True, False]) -> uint8 tensor
Now: torch.tensor([True, False]) -> bool tensor

Tested via tests.
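On a build that includes this change, type inference produces a bool tensor:

```python
import torch

t = torch.tensor([True, False])
print(t.dtype)   # torch.bool (previously torch.uint8)
```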
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19097

Reviewed By: ezyang

Differential Revision: D15632553

Pulled By: izdeby

fbshipit-source-id: b019150844c561a6845710a3c62b12f06b68bbe3
2019-06-05 10:19:27 -07:00
22ddddfb80 Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen (#19465)
Summary:
This PR is a continuation of #15310, which itself is a continuation of #14845, #14941, & #15293. It should be synced up with the pytorch/master branch as of yesterday.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19465

Differential Revision: D15632268

Pulled By: ezyang

fbshipit-source-id: 8e337e8dc17ac31439935ccb530a7caf77f960e6
2019-06-05 10:13:58 -07:00
6874c4058d Add type annotation to stft (#21302)
Summary:
We want to be able to call `stft` from TorchScript, which requires that `stft` have a type annotation.
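For reference, a minimal `stft` call (note: the `return_complex` argument was added in a later release than this commit):

```python
import torch

x = torch.randn(400)
spec = torch.stft(x, n_fft=64, return_complex=True)
print(spec.shape)   # first dim is n_fft // 2 + 1 = 33 frequency bins
```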
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21302

Differential Revision: D15607973

Pulled By: cpuhrsch

fbshipit-source-id: c4a5c09cdaafe7e81cf487a3ad216d1b03464a21
2019-06-05 10:06:48 -07:00
7c6f2836d4 Fix Caffe2 CI job for new Windows AMI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21410

Differential Revision: D15660575

Pulled By: ezyang

fbshipit-source-id: cfc0f325b0fbc22282686a4d12c7a53236d973d4
2019-06-05 06:35:39 -07:00
6251c563eb Add CUDA support for _dirichlet_grad (#21191)
Summary:
Changelog:
- Migrate _dirichlet_grad implementation from TH to ATen
- Add CUDA support for _dirichlet_grad

Closes #11030.
Closes #15773.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21191

Differential Revision: D15660330

Pulled By: ezyang

fbshipit-source-id: c8ad5b80366e5348139ce9be10400f22fc430344
2019-06-05 06:35:35 -07:00
b460a1987e Per discussion at https://github.com/pytorch/pytorch/pull/21244, fix bugs in (#21392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21392

as discussed at https://github.com/pytorch/pytorch/pull/21244, we
found that some values in log_beta were not properly initialized. This diff will 1)
initialize all log_beta entries to -inf; 2) fix a tricky comparison condition; 3) zero out
the gradient elements corresponding to padding.

Offline experiments show that this diff fixes the previously seen NaN loss.
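A hedged sketch of the behavior this fix targets, using the public `F.ctc_loss` API (shapes and lengths here are arbitrary; the second sequence is padded past timestep 30):

```python
import torch
import torch.nn.functional as F

# (T, N, C): 50 timesteps, batch of 2, 10 classes (blank = 0 by default)
log_probs = torch.randn(50, 2, 10).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, 10, (2, 20), dtype=torch.long)
input_lengths = torch.tensor([50, 30])   # second sequence is padded after t=30
target_lengths = torch.tensor([20, 15])

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
# after the fix the loss is finite (no NaN), and the gradient at padded
# timesteps is expected to be zero
```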

Differential Revision: D15637977

fbshipit-source-id: 477008a5e11aae946bd2aa401ab7e0c513421af0
2019-06-05 00:28:45 -07:00
42b2f56124 Fixing race condition at Module::forward method (#21398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21398

Module::forward calls find_method(), potentially from multiple threads.
Internally find_method() calls find_offset(), which reads the dict_ object.
If the name is not in the dictionary, the thread calls insert(), which modifies dict_.
While the first thread is modifying dict_, another thread can enter the forward()->find_method()->find_offset() path
and read dict_ while it is being modified -> crash.
Moved the mutex protection up to cover both find_offset() and insert().
Consider using a C++17 shared_mutex instead of recursive_mutex.
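The fix can be sketched in Python (names like `MethodTable` and `find_or_insert` are illustrative, not the actual C++ API): hold one lock across both the lookup and the insert, so a reader can never observe the dict mid-modification.

```python
import threading

class MethodTable:
    """Illustrative sketch of the locking fix, not the real C++ class."""

    def __init__(self):
        self._mu = threading.RLock()   # analogous to the recursive_mutex
        self._dict = {}

    def find_or_insert(self, name):
        # the lock covers BOTH the lookup (find_offset) and the insert,
        # instead of only the insert as before the fix
        with self._mu:
            if name not in self._dict:
                self._dict[name] = len(self._dict)
            return self._dict[name]
```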

Reviewed By: bddppq

Differential Revision: D15638942

fbshipit-source-id: ca6a453448302a0b3666c87724755fa4e9ce242f
2019-06-04 23:03:25 -07:00
95eb9339c1 Adds CUDA C++11 and Profiling Notes (#21386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21386
ghimport-source-id: 9430c7640b90d9add38d9bf2f1bd0c8f62b7f239

Differential Revision: D15640102

Pulled By: ezyang

fbshipit-source-id: 98a5efdea9b1de05207ebd3624cb20acda9fe96b
2019-06-04 19:18:55 -07:00
eadac840f7 Speedup bernoulli_scalar_cuda_kernel with grid-stride loop (#21300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21300
ghimport-source-id: c314c28cb693b554d6f24de235c11ba24ed6bf61

Reviewed By: jerryzh168

Differential Revision: D15632935

Pulled By: ezyang

fbshipit-source-id: 9bb24f17d78151bf50942905c967bdcfe1ff00cb
2019-06-04 19:13:57 -07:00
c82bf8ef10 Move THCTensor_(lognormal) to ATen (#21299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21299
ghimport-source-id: 2c63f289f02087f023feda8bff6b90ed49737889

Reviewed By: jerryzh168

Differential Revision: D15632930

Pulled By: ezyang

fbshipit-source-id: 85c17cdca486b46942c5b500e4fd4d95bb5657f9
2019-06-04 19:13:53 -07:00
4671bed0f3 Move THCTensor_(geometric) to ATen (#21298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21298
ghimport-source-id: c0e2604aa25cc5da2b67293cafd88c2e77e476f9

Reviewed By: jerryzh168

Differential Revision: D15632932

Pulled By: ezyang

fbshipit-source-id: 248ca4b56967116f27174cda44893ecfe4ca9a99
2019-06-04 19:13:50 -07:00
d341bcb3dc Move THCTensor_(exponential) to ATen (#21297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21297
ghimport-source-id: 5f45154e714ab44dec961dabf1c64e54aaa063a2

Reviewed By: jerryzh168

Differential Revision: D15632931

Pulled By: ezyang

fbshipit-source-id: 0367eec0a9ef6812b1b3ab7597817ee40a011bb8
2019-06-04 19:13:46 -07:00
92b76df8f6 Finished trigonometric functions (#21128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21128
ghimport-source-id: d566de103f2aefc59e6423181de325d8f42620f4

Differential Revision: D15563190

Pulled By: Chillee

fbshipit-source-id: ad2e09cac5c7dae9978a7bd61098c2828620cdc4
2019-06-04 17:59:09 -07:00
7309cb60fd Finished the high-priority functions (#21127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21127
ghimport-source-id: 609021958e76ea01299f62b9491038005e6b4f27

Differential Revision: D15563189

Pulled By: Chillee

fbshipit-source-id: 5c6155a69fff7447689ef012ea303dc358d50486
2019-06-04 17:59:05 -07:00
622588d8fd Added remainder of high-priority trigonometric math ops (#21126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21126
ghimport-source-id: e310f3cfb28436b99ad038691887ca82068ca2c9

Differential Revision: D15563191

Pulled By: Chillee

fbshipit-source-id: 7135ddd5bc9eebc818694fa8b67eaade907fa8a1
2019-06-04 17:59:02 -07:00
e268fc97c3 Re-add Tensor.T (#21175)
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.

With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.

Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...

I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.

**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
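For reference, `Tensor.T` reverses the tensor's dimensions:

```python
import torch

x = torch.randn(4, 2)
print(x.T.shape)   # torch.Size([2, 4])
```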
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175

Differential Revision: D15566970

Pulled By: umanwizard

fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
2019-06-04 17:38:25 -07:00
ba08cf336d Reorganize cmake related functions to tools/setup_helpers/cmake.py (#21367)
Summary:
Currently tools/build_pytorch_libs.py looks quite convoluted. This commit reorganizes CMake-related functions into a separate file to make the code clearer.

 ---

This is hopefully helpful for further contribution for better integration with cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21367

Differential Revision: D15636991

Pulled By: soumith

fbshipit-source-id: 44d76e4e77aec0ce33cb32962b6a79a7f82785da
2019-06-04 17:01:38 -07:00
6ee9e87ff5 Back out "[pytorch][PR] don't materialize constants" (#21374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21374

Original commit changeset: d5609b0a5697

Not materializing constants slows compilation time significantly

Differential Revision: D15630632

fbshipit-source-id: c6b5026ee6eae2ef290628f350f49a657495bd5d
2019-06-04 16:32:09 -07:00
45d2305732 fix incorrect default on Graph::toString (#21370)
Summary:
This default was incorrect and caused printing from Python to omit the file:line:col info.

This wasn't caught because FileCheck internally uses operator<< to print the graph, which has `true` hardcoded as the value. I've added more comprehensive tests to catch this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21370

Differential Revision: D15631135

Pulled By: jamesr66a

fbshipit-source-id: c809e06fff4f0174eefeb89062024384b4944ef7
2019-06-04 16:15:38 -07:00
0dc7286e15 Better error message when trying to instantiate NamedTuple
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21309

Differential Revision: D15630564

Pulled By: jamesr66a

fbshipit-source-id: 82753feee65bbe6c8b2f827cc2664628f3b9f4a3
2019-06-04 16:11:05 -07:00
d348d6405c cdist: pairwise distances between two sets of tensors with batch mode (#20934)
Summary:
Batch implementation for cdist function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20934

Differential Revision: D15609458

Pulled By: ifedan

fbshipit-source-id: 31c12e120d168baec6a6af913f599838a44034d7
2019-06-04 15:52:52 -07:00
6a3ebdbbc5 Remove all conda 3.5 nightly configs, remove libtorch smoketests (#21380)
Summary:
| | Before | After
------------ | ------------ | -------------
Binary builds | ![binarybuilds-config-dimensions](https://user-images.githubusercontent.com/261693/58915716-77a5f900-86d6-11e9-8a39-7ef587e56281.png) | ![binarybuilds-config-dimensions](https://user-images.githubusercontent.com/261693/58915620-4a594b00-86d6-11e9-9e5f-95cf085e6fc8.png) |
Smoke tests | ![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/58915728-812f6100-86d6-11e9-80c1-182242fdfd0e.png) | ![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/58915686-68bf4680-86d6-11e9-8cd2-e65a47384b4f.png) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21380

Differential Revision: D15634729

Pulled By: kostmo

fbshipit-source-id: aef44b0e5b9997be55d93969ab85effca68c5c88
2019-06-04 15:48:47 -07:00
ca32563999 add suggestion to use lld to CONTRIBUTING.md (#21334)
Summary:
I found this significantly speeds up incremental builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21334

Differential Revision: D15632994

Pulled By: suo

fbshipit-source-id: bb4af90f4400bffa90d168d82ff30fece5e3835c
2019-06-04 15:40:49 -07:00
4940e41d16 Fix mkl-dnn tautological compare error (#21371)
Summary:
```
../third_party/ideep/mkl-dnn/src/cpu/jit_avx512_common_convolution.hpp:144:821: error: self-comparison always evaluates to true [-Werror,-Wtautological-compare]
        virtual pd_t *clone() const override { return new pd_t(*this); } virtual status_t create_primitive(primitive_t **primitive, const primitive_at_t *inputs, const primitive_t **outputs) const override { double ms = get_msec(); primitive_t::input_vector ins(inputs, inputs + this->n_inputs()); primitive_t::outpu
t_vector outs(outputs, outputs + this->n_outputs()); auto ret = safe_ptr_assign<primitive_t>(*primitive, new (jit_avx512_common_convolution_bwd_data_t)(this, ins, outs)); ms = get_msec() - ms; if (mkldnn_verbose()->level >= 2) { printf("mkldnn_verbose,create,%s,%g\n", this->info(), ms); fflush(0); } return ret; } v
irtual const char *name() const override { return (avx512_common == sse42 ? "jit:" "sse42" : (avx512_common == avx ? "jit:" "avx" : (avx512_common == avx2 ? "jit:" "avx2" : (avx512_common == avx512_common ? "jit:" "avx512_common" : (avx512_common == avx512_core ? "jit:" "avx512_core" : (avx512_common == avx512_mic
? "jit:" "avx512_mic" : (avx512_common == avx512_mic_4ops ? "jit:" "avx512_mic_4ops" : "jit:" ""))))))); };
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21371

Differential Revision: D15631392

Pulled By: bddppq

fbshipit-source-id: 3b0008acab8ae53ce61327686bd8367e7fb5d298
2019-06-04 15:27:07 -07:00
403ca41142 make analyzeConservative more conservative (#21227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21227
ghimport-source-id: cac97ba20cb020f3edc4e83e7641201f0826f40a

Reviewed By: jamesr66a

Differential Revision: D15592316

Pulled By: suo

fbshipit-source-id: b311f73a5d81d6d0b0331678b6a625e446588ebd
2019-06-04 15:09:46 -07:00
0dbae7eddb cleanup templated implementation of mayAlias (#21224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21224
ghimport-source-id: 6ec4ea015043bbddd031f92c5149e8313f21977d

Reviewed By: jamesr66a

Differential Revision: D15592318

Pulled By: suo

fbshipit-source-id: 47c52342f2a1360752306908e2f394ef52e47504
2019-06-04 15:09:43 -07:00
adf6f6c442 use memory locations instead of values for working set (#21223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21223
ghimport-source-id: 82800a465a4273e185bfffe2f67835b2f7f3a519

Reviewed By: jamesr66a

Differential Revision: D15592317

Pulled By: suo

fbshipit-source-id: 5e87c803a928b61c923200888a3ff1ac7b2523e0
2019-06-04 15:09:39 -07:00
f330168570 remove multisets from work set (#21222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21222
ghimport-source-id: 0eb6daa92bef68a35bef918c3f3a791b401812aa

Reviewed By: jamesr66a

Differential Revision: D15592319

Pulled By: suo

fbshipit-source-id: 895d26538ba1edcd73b83147a68b7e4069084230
2019-06-04 15:09:36 -07:00
df0b83654a cleanups to alias analysis (#21221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21221
ghimport-source-id: 778e7317bbe874d35a903d89af5e0bc9721c8680

Reviewed By: jamesr66a

Differential Revision: D15592313

Pulled By: suo

fbshipit-source-id: d6f6d2be8cd80b40dd26d0bb3be30f074e356105
2019-06-04 15:09:33 -07:00
77c2f5dd75 fix copyright notice in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21372

Differential Revision: D15631889

Pulled By: umanwizard

fbshipit-source-id: cf764432c27cb1b01d8137ed60ec7de361450d0e
2019-06-04 14:53:45 -07:00
57f932a638 Enable 'empty' function for mkldnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21184

Differential Revision: D15625296

Pulled By: bddppq

fbshipit-source-id: 47d26798bcf48e227ffd813f299959a7b8993641
2019-06-04 14:16:13 -07:00
b869a3b4ac add new ops to benchmark_all_test (#21365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21365

This diff adds new operators to benchmark_all_test so all the supported ops can be built as one binary

Reviewed By: hl475

Differential Revision: D15627328

fbshipit-source-id: b7ca550a279f485102a6a6bd47e4032c7beb9940
2019-06-04 13:54:26 -07:00
2ed6f017ed Added better tests for math ops and unified them (#21125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21125
ghimport-source-id: 2a576b563208ce3d83e6771643e20d24bc72af86

Differential Revision: D15563188

Pulled By: Chillee

fbshipit-source-id: 0e77471729f715063d6bee075d2fc65f8db8b6c3
2019-06-04 13:15:54 -07:00
6938de8851 made floor/ceil return ints (#21124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21124
ghimport-source-id: e3e45bd50c9af1ee03fd58f2f4d631ce23d9612e

Differential Revision: D15563187

Pulled By: Chillee

fbshipit-source-id: 6504a41da883a8287d64db20d40cf958edb7404c
2019-06-04 10:32:16 -07:00
87690d2b77 Move THCTensor_(cauchy) to ATen (#21289)
Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
cauchy, size, elements 65536 forward 4.980564117431641e-06 bandwidth (GB/s) 52.63339529803734
cauchy, size, elements 131072 forward 6.232261657714844e-06 bandwidth (GB/s) 84.12483762631982
cauchy, size, elements 262144 forward 9.548664093017577e-06 bandwidth (GB/s) 109.81389540833959
cauchy, size, elements 524288 forward 1.59454345703125e-05 bandwidth (GB/s) 131.52052963827754
cauchy, size, elements 1048576 forward 2.86865234375e-05 bandwidth (GB/s) 146.21165262978724
cauchy, size, elements 2097152 forward 5.4748058319091796e-05 bandwidth (GB/s) 153.2220184158516
cauchy, size, elements 4194304 forward 0.00010075807571411133 bandwidth (GB/s) 166.50988897012377
cauchy, size, elements 8388608 forward 0.0001935744285583496 bandwidth (GB/s) 173.34124269355965
cauchy, size, elements 16777216 forward 0.00038077831268310545 bandwidth (GB/s) 176.24129779641603
cauchy, size, elements 33554432 forward 0.0006851387023925781 bandwidth (GB/s) 195.8986224705994
```
#### After:
```
cauchy, size, elements 65536 forward 6.077289581298828e-06 bandwidth (GB/s) 43.13501874366419
cauchy, size, elements 131072 forward 6.2131881713867184e-06 bandwidth (GB/s) 84.38308731972373
cauchy, size, elements 262144 forward 6.46829605102539e-06 bandwidth (GB/s) 162.11008150033175
cauchy, size, elements 524288 forward 6.8783760070800785e-06 bandwidth (GB/s) 304.8905726935182
cauchy, size, elements 1048576 forward 9.505748748779296e-06 bandwidth (GB/s) 441.23867681003264
cauchy, size, elements 2097152 forward 1.5070438385009766e-05 bandwidth (GB/s) 556.6266744001266
cauchy, size, elements 4194304 forward 2.4406909942626954e-05 bandwidth (GB/s) 687.396152951685
cauchy, size, elements 8388608 forward 4.6243667602539064e-05 bandwidth (GB/s) 725.6005792706125
cauchy, size, elements 16777216 forward 9.100198745727539e-05 bandwidth (GB/s) 737.4439380404413
cauchy, size, elements 33554432 forward 0.00017449140548706055 bandwidth (GB/s) 769.1939188944922
```
### Double Type
#### Before:
```
cauchy, size, elements 65536 forward 4.885196685791015e-06 bandwidth (GB/s) 53.660889593753055
cauchy, size, elements 131072 forward 6.229877471923828e-06 bandwidth (GB/s) 84.15703235943361
cauchy, size, elements 262144 forward 9.605884552001953e-06 bandwidth (GB/s) 109.15975455706132
cauchy, size, elements 524288 forward 1.5976428985595704e-05 bandwidth (GB/s) 131.26537863315923
cauchy, size, elements 1048576 forward 2.9621124267578124e-05 bandwidth (GB/s) 141.59840666786866
cauchy, size, elements 2097152 forward 5.5103302001953126e-05 bandwidth (GB/s) 152.23421637604707
cauchy, size, elements 4194304 forward 0.00010124444961547851 bandwidth (GB/s) 165.70998275677383
cauchy, size, elements 8388608 forward 0.0001944279670715332 bandwidth (GB/s) 172.58027487195184
cauchy, size, elements 16777216 forward 0.00034950494766235353 bandwidth (GB/s) 192.01119883668116
cauchy, size, elements 33554432 forward 0.0007002186775207519 bandwidth (GB/s) 191.67973135938277
```
#### After:
```
cauchy, size, elements 65536 forward 5.91278076171875e-06 bandwidth (GB/s) 44.33514628129032
cauchy, size, elements 131072 forward 6.234645843505859e-06 bandwidth (GB/s) 84.09266751632889
cauchy, size, elements 262144 forward 7.433891296386719e-06 bandwidth (GB/s) 141.05344807902503
cauchy, size, elements 524288 forward 1.1401176452636719e-05 bandwidth (GB/s) 183.94171941045587
cauchy, size, elements 1048576 forward 1.960039138793945e-05 bandwidth (GB/s) 213.99082890665372
cauchy, size, elements 2097152 forward 3.434181213378906e-05 bandwidth (GB/s) 244.26806504326578
cauchy, size, elements 4194304 forward 6.517410278320313e-05 bandwidth (GB/s) 257.4215107465028
cauchy, size, elements 8388608 forward 0.0001229524612426758 bandwidth (GB/s) 272.9057365819818
cauchy, size, elements 16777216 forward 0.00023239374160766602 bandwidth (GB/s) 288.77225150621814
cauchy, size, elements 33554432 forward 0.00046050310134887696 bandwidth (GB/s) 291.4589013773367
```
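
For reference, the reported bandwidth numbers follow directly from element count, element size, and kernel time; a minimal sketch of the computation (the 4-bytes-per-element figure reproduces the float rows above):

```python
def effective_bandwidth_gbps(num_elements: int, bytes_per_element: int, seconds: float) -> float:
    # Effective bandwidth: bytes produced by the kernel divided by elapsed time.
    return num_elements * bytes_per_element / seconds / 1e9

# First float row above: 65536 elements in ~4.98 us -> ~52.63 GB/s
print(effective_bandwidth_gbps(65536, 4, 4.980564117431641e-06))
```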
Resubmit of https://github.com/pytorch/pytorch/pull/20622
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21289

Differential Revision: D15622713

Pulled By: ezyang

fbshipit-source-id: abe8bd57794bd1c3a0b92395367a9653c5d0f2db
2019-06-04 08:24:42 -07:00
f9e746e9c8 Use "important" node to toggle whether or not to build on PR (#21308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21308
ghimport-source-id: 75fba872a658d8257a3f6ff9d9e33a320c6e523e

Differential Revision: D15621909

Pulled By: ezyang

fbshipit-source-id: 6d016d9ffdeb6414d70a1b48ed4766b5dc626353
2019-06-04 08:05:56 -07:00
1291d95e82 Revert "Fix the caffe2_gpu linkage with torch on Windows" (#21335)
Summary:
The original PR (#16071) is not working anymore after `caffe2` and `torch` were unified. What's more, it makes the binary big, since the optimizing flag is disabled on a very big project (the `torch` library used to be small, but the flag now applies to the whole `caffe2` and `caffe2_gpu` libraries). We need to get it reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21335

Differential Revision: D15622163

Pulled By: soumith

fbshipit-source-id: 900bd400106d27a1512eed1e9f2288114f5f41bb
2019-06-04 07:49:49 -07:00
38d68ad803 Update randomness.rst (#21337)
Summary:
Following [this question on the forums](https://discuss.pytorch.org/t/reproducibility-and-performance/46504), I propose the following doc change. It clarifies that 'performance reduction' concerns the processing speed (and not the training accuracy).

Related website commit: https://github.com/pytorch/pytorch.github.io/pull/211
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21337

Differential Revision: D15622151

Pulled By: soumith

fbshipit-source-id: f0edeb20049f2ee715c400e7c57abb966864d621
2019-06-04 07:38:00 -07:00
ae42a11ab2 Make .circleci Conf class uses dataclasses; use types. (#21284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21284
ghimport-source-id: a628af85b7a085e15903168e957bed1e273d6636

Differential Revision: D15621908

Pulled By: ezyang

fbshipit-source-id: 64e2da8b96cdc1b53c0b314771d225eebf3d4b2d
2019-06-04 07:28:53 -07:00
25a6ff10f0 Add gtest for TensorIterator (#21253)
Summary:
This adds a regression test for the bug fix in #21236. Operations
involving CUDA tensors and CPU scalars should not copy the CPU scalar to
the device (because that is slow). They should instead "lift" the scalar
to a kernel parameter.
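
A pure-Python sketch of the distinction (hypothetical helper names, not the actual TensorIterator code): lifting the scalar into the kernel as a parameter avoids materializing it as a buffer first:

```python
def add_lifted(data, scalar):
    # The scalar is passed straight into the "kernel" as a parameter --
    # no buffer is created for it.
    return [x + scalar for x in data]

def add_materialized(data, scalar):
    # The slow pattern: first expand the scalar into a full buffer
    # (analogous to copying a CPU scalar to the device), then add.
    scalar_buf = [scalar] * len(data)
    return [x + y for x, y in zip(data, scalar_buf)]
```

Both produce identical results; only the lifted form avoids the extra buffer.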
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21253

Reviewed By: bddppq

Differential Revision: D15604080

Pulled By: colesbury

fbshipit-source-id: c14ded5d584499eaa5ea83337ffc50278205f3d6
2019-06-04 07:23:42 -07:00
fecd5fa171 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 1a30f65182aead9145cb02fb544e2b7a25043f44
2019-06-04 07:23:39 -07:00
2ee2d78a29 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0256f2f4afaeaa2c16074dbca3b9a03ca434c7de
2019-06-03 23:36:35 -07:00
af4c24153f Honor OMP/MKL environment variables in AT_PARALLEL_NATIVE case (#21189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21189
ghimport-source-id: 4dcfaf04880346ff5ca79ca4dd11c94dcb645ce5

Differential Revision: D15574578

Pulled By: ilia-cher

fbshipit-source-id: 919fccb58b997f9a7add5486a79f9cd4cabaa1ee
2019-06-03 23:22:58 -07:00
f251416d70 Update fbgemm submodule (#21328)
Summary:
Fix master breakage

cc jianyuh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21328

Differential Revision: D15618649

Pulled By: bddppq

fbshipit-source-id: bce279705520dbd9c6df5fb794cdaeaed48a1a5a
2019-06-03 22:17:04 -07:00
113a27ee45 bake constants into the traced graph, get rid of getNestedValueTrace (#21046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21046
ghimport-source-id: 5cb3efb1896fbe42336e24c14fbf0bb5e646528e

Differential Revision: D15530991

Pulled By: wanchaol

fbshipit-source-id: b096ca5a1cdce496742b7f7e1de3ef8d21e9a8b0
2019-06-03 21:48:11 -07:00
cf356a342b Fix a bug in loop unrolling (#21239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21239
ghimport-source-id: 68256b752be795b32ab3f426848ed1d64fc5ea3e

Reviewed By: suo

Differential Revision: D15590901

Pulled By: zdevito

fbshipit-source-id: 8700aab723d4486fd20d3414df8160b36a3cc5da
2019-06-03 21:35:14 -07:00
6e657c5586 Add CallMethod, inline eagerly (#21116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21116
ghimport-source-id: 3c47e335dd80f52216e50e0a215cedc1862a9e78

Reviewed By: eellison

Differential Revision: D15552816

Pulled By: zdevito

fbshipit-source-id: 708fe87439d94117dca0a26c98f0917f497f718f
2019-06-03 21:35:11 -07:00
0f58d20fe4 Add quantized::fbgemm_linear_unpack operator for serialization (#97)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/97

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20721

- FBGEMM: Add an unpack function for the PackBMatrix class: unpack the pmat buffer to origin_buf (used for serialization to recover the weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
2019-06-03 20:36:30 -07:00
4b576e5184 Do not hardcode build_dir in build_caffe2. Use the build_dir parameter.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21296

Differential Revision: D15613035

Pulled By: bddppq

fbshipit-source-id: 19313cbe0135581990d489f489d366d00962a3c3
2019-06-03 20:31:30 -07:00
702ba3d2fb build torch for libtorch mobile build (#21234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21234
ghimport-source-id: 8d401691a811991c79acf5e09e60389910910365

Differential Revision: D15616540

Pulled By: ljk53

fbshipit-source-id: 150e706630911bf14c55f47f4058eaada1edf1cc
2019-06-03 19:51:05 -07:00
82ceeaeca2 Add options to jit's operator constructor (#21315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21315
ghimport-source-id: 168ddecb333a8cb309e7b859683de9b077123205

Differential Revision: D15614506

Pulled By: bwasti

fbshipit-source-id: ae013a88e2069c38845b5b8ff805db96ab2c29e9
2019-06-03 19:30:22 -07:00
457c0f164e insert missing #pragma once in VariableTypeUtils.h (#21134)
Summary:
Insert the missing #pragma once directive to prevent redefinition errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21134

Differential Revision: D15607673

Pulled By: li-roy

fbshipit-source-id: 0000fa18e3c55e5d36a64b171d6e85eb4bc211a1
2019-06-03 17:50:56 -07:00
1c5bd1fa65 Automatic update of fbcode/onnx to 5160f3ac3380302224998f1c95e111cd961c4bc5 (#21311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21311

Previous import was 9005291283e943f1a91da5f0acf218bc4e8eb2ca

Included changes:
- **[5160f3ac](https://github.com/onnx/onnx/commit/5160f3ac)**: Fix typo (#2069) <Takeshi Watanabe>
- **[ac218ac6](https://github.com/onnx/onnx/commit/ac218ac6)**: Add a missing step when upgrading an operator (#2071) <daquexian>
- **[5972eed9](https://github.com/onnx/onnx/commit/5972eed9)**: Clarify the axis/size in pads, strides, dilations (#2048) <daquexian>

Reviewed By: bddppq

Differential Revision: D15612734

fbshipit-source-id: 235dc3d49e4a6ccd4f43e6c2f648e87611d52697
2019-06-03 17:35:53 -07:00
02fd1878e3 Cast dropout to float in RNN (#21304)
Summary:
This solves the situation where, for example, someone instantiates LSTM with `dropout=0`, a Python integer. This works fine in Python, but JIT throws a type error because it expected float but got int.
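
A minimal sketch of the failure mode and the fix (hypothetical wrapper, not the actual RNN code): coercing the argument with `float()` accepts both `dropout=0` and `dropout=0.0`:

```python
def make_rnn_config(dropout: float = 0.0) -> dict:
    # Under TorchScript-style strict typing, an int argument would be
    # rejected where a float is expected; casting up front avoids that.
    dropout = float(dropout)
    return {"dropout": dropout}

cfg = make_rnn_config(dropout=0)  # a Python int is now accepted
print(cfg["dropout"])             # 0.0
```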

Resolves https://github.com/pytorch/lockdown/issues/65
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21304

Differential Revision: D15613153

Pulled By: jamesr66a

fbshipit-source-id: eabff76e3af3de0612583b37dbc5f7eab7e248a4
2019-06-03 16:59:04 -07:00
45de3ef6a7 Export feature length information for onnxifi operator (#21303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21303

Export feature length information for the onnxifi operator (recommit of D15548138):
- disable caffe2_extract_feature_length_for_shape_inference by default
- change LOG(INFO) to VLOG(4)
- change LOG(WARNING) to LOG_EVERY_N(WARNING, 1000)

Reviewed By: yinghai, ipiszy

Differential Revision: D15608620

fbshipit-source-id: f96410366fe6bae954fea9d6b50ee72f4969d024
2019-06-03 16:53:06 -07:00
7c823312d3 hub doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21307

Differential Revision: D15610441

Pulled By: ailzhang

fbshipit-source-id: 2b2a28ed808936cf7c93db31afc6b5ea888ab1b1
2019-06-03 16:29:39 -07:00
22865d4ce1 Add ONNX export support for torch.rand. (#20559)
Summary:
This PR adds support for torch.rand export in the PyTorch ONNX exporter. There are other generator ops that need to be supported for export, and they will be added in subsequent PRs. This op is needed with priority for a model on our end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20559

Differential Revision: D15379653

Pulled By: houseroad

fbshipit-source-id: d590db04a4cbb256c966f4010a9361ab8eb3ade3
2019-06-03 16:09:01 -07:00
7d84ca6e06 clean code to unify the logic to use fp16 by the optimizer engine (#20915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20915

Clean the unary processor code. Some questions are added in the comments to seek suggestions.

Reviewed By: pjh5

Differential Revision: D15448502

fbshipit-source-id: ef0c45718c1a06187e3fe2e4e59b7f20c641d9c5
2019-06-03 15:03:35 -07:00
3004b397f0 change test_name to be globally unique value across tests (#21206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206

This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific one.

Reviewed By: zheng-xq

Differential Revision: D15543508

fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
2019-06-03 14:55:11 -07:00
ca80ec7c97 introduce a new intrace to add op [PT changes] (#21149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149

The diff modifies the interface for PyTorch operators in the benchmark suite

Reviewed By: zheng-xq

Differential Revision: D15433897

fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58
2019-06-03 14:55:08 -07:00
88d033f842 don't materialize constants (#21229)
Summary:
This doesn't affect anything because we run constant pooling, and in the case of Closures and Forks, materializing constants creates unnecessary closures over constants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21229

Differential Revision: D15587764

Pulled By: eellison

fbshipit-source-id: d5609b0a5697071fab5050eb9e03876ab9ebb27a
2019-06-03 13:36:57 -07:00
9a41f44732 Improve ONNX Loop export (#20445)
Summary:
~~This is work in progress due to its dependency on multiple pending PRs.~~

- [x] ONNX: Relax constraint on subgraph input/output type & shape check. https://github.com/onnx/onnx/pull/2009
- [x] PyTorch: Add infra to test_pytorch_onnx_caffe2.py to test ScriptModule models. https://github.com/pytorch/pytorch/pull/20256

This PR should partially resolve https://github.com/pytorch/pytorch/issues/17531. However, ideally we shouldn't need to insert cast (and reshape) nodes to help the conversion for the loop condition.

- Added a cast node for condition values before entering the loop node. The ONNX spec only accepts the Bool type, while in PyTorch, if the condition value is an output of another node, it could potentially have any integral type.
- Tidying up the exported ONNX loop subgraph input type & shape. According to the ONNX spec, input "M" is exported as a 0-d scalar tensor of type int64. Input "Cond" is exported as an incomplete tensor of type Bool without shape information. This is because, throughout the iteration, the rank of the condition value is dynamic, either 0-d or 1-d, as long as it holds a single value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20445

Differential Revision: D15534188

Pulled By: houseroad

fbshipit-source-id: d174e778529def05ee666afeee4b8fb27786e320
2019-06-03 13:00:00 -07:00
4980b8b95c Renaming member variables in engine.cpp/h (#21283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21283
ghimport-source-id: 360a138e420ace3cd4ca6ccbc761c8e68319440d

Differential Revision: D15607428

fbshipit-source-id: f8df6b42796a49c4d68fa8366b6a68d5715f6421
2019-06-03 12:54:50 -07:00
37fed9b24a Rename FC to Linear for the function name (#21268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21268

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15599232

fbshipit-source-id: 0046f933657f60807fdca7009676bfb052748d91
2019-06-03 11:55:35 -07:00
63b3c5a66a Replace AT_ASSERTM with TORCH_CHECK (#21267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21267

Replace AT_ASSERTM with TORCH_CHECK: AT_ASSERTM is deprecated.

Not sure whether ```AT_ASSERT``` is deprecated in favor of some new TORCH ASSERT function.

Reviewed By: zafartahirov

Differential Revision: D15599242

fbshipit-source-id: 23f21a9a23dc3c147dc817e6d278066d0832e08d
2019-06-03 11:47:14 -07:00
ad971a37d0 Improve performance of advanced indexing backward (#20557)
Summary:
This PR improves performance of advanced indexing backward, partially solving #15245 (performance is still worse than gather, but not by such outrageous margins). Before, using the benchmarking harness from #15245, CUDA 10/V100:
```
Indexing is faster by at most -270.61607820767887 us on N: 16 D: 256 K: 1
Indexing is slower by at most 11127.466280784833 us on N: 16 D: 4096 K: 4096
```
after:
```
Indexing is faster by at most 23.524456737696028 us on N: 512 D: 4096 K: 4096
Indexing is slower by at most 186.24056029472553 us on N: 16 D: 1024 K: 4096
```
The strategy is to reuse the embedding backward kernel, adapting it to handle unindexed dimensions at the beginning by launching additional threadblocks, and also allowing it to handle slices bigger than `65K*128`, which is hardly ever a problem for embedding. Still, integer indexing is baked into the kernel and is important for performance, so tensors bigger than 2G elements are not supported for now.
The main savings come from not having to expand index to all unindexed dimensions, and not sorting expanded index with incoming gradient values, but rather only sorting unexpanded index.
There are ways to make sorting overhead smaller (thanks mcarilli for suggestions) but I'll get to it when it becomes a real problem, or rather, when cuda graphs will force us to get rid of thrust::sort calls.
I've also added tests for indexing backward; previously, tests for index_put_ and indexing backward were non-existent.
This PR also fixes #20457 by casting indices to `self` backend.
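
For intuition, the backward pass being optimized has scatter-add semantics; a scalar Python sketch (not the CUDA kernel) of the backward of `y = x[indices]`:

```python
def index_backward(grad_out, indices, input_len):
    # Gradients flowing into repeated indices must accumulate,
    # which is why the kernel sorts/groups by index.
    grad_x = [0.0] * input_len
    for g, i in zip(grad_out, indices):
        grad_x[i] += g
    return grad_x

print(index_backward([1.0, 2.0, 3.0], [0, 0, 2], 4))  # [3.0, 0.0, 3.0, 0.0]
```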
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20557

Differential Revision: D15582434

Pulled By: ezyang

fbshipit-source-id: 91e8f2769580588ec7d18823d99a26f1c0da8e2a
2019-06-03 11:38:53 -07:00
4ac732ed7a file:line for tracing (#21247)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/21217

This adds support for recording file and line information during tracing, by extracting the top Python interpreter frame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21247

Reviewed By: suo, driazati

Differential Revision: D15594553

Pulled By: jamesr66a

fbshipit-source-id: 72e1b3a46f1dabe3e83a608ec1a7d083bd1720f9
2019-06-03 11:13:49 -07:00
27d1daab45 Export ONNX Dropout for opset 10 (#20710)
Summary:
Remove Dropout from the opset 10 blacklist.
ONNX Dropout was modified in opset 10, but only the "mask" output changed, and that output is not exported in PyTorch's opset 9. So we can still fall back to the opset 9 op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20710

Differential Revision: D15571248

Pulled By: houseroad

fbshipit-source-id: 15267eb63308a29a435261034b2f07324db1dea6
2019-06-03 10:59:56 -07:00
770089c2b8 math module support: isnan, asinh, atanh, cosh, sinh, and tanh (#19337)
Summary:
driazati and eellison, please review. This PR is for #19026; specifically: isnan, asinh, atanh, cosh, sinh, and tanh.
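
These mirror the semantics of the corresponding functions in Python's `math` module; a quick sanity check of those reference semantics:

```python
import math

assert math.isnan(float("nan")) and not math.isnan(1.0)
assert math.tanh(0.0) == 0.0 and abs(math.cosh(0.0) - 1.0) < 1e-12
# The hyperbolic functions satisfy tanh = sinh / cosh
assert abs(math.tanh(1.0) - math.sinh(1.0) / math.cosh(1.0)) < 1e-12
# asinh and atanh invert their counterparts
assert abs(math.asinh(math.sinh(0.5)) - 0.5) < 1e-12
assert abs(math.atanh(math.tanh(0.5)) - 0.5) < 1e-12
```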
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19337

Differential Revision: D15580932

Pulled By: driazati

fbshipit-source-id: 38513fa59088e038264f9f6f0d6374a13a165589
2019-06-03 10:54:42 -07:00
fb72625267 Remove onnx export expects (#21238)
Summary:
We're not getting much from checking the export strings, and they are noisy and slow development. Didn't realize they existed until now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21238

Differential Revision: D15604256

Pulled By: eellison

fbshipit-source-id: 488e9401231228cffe132dab99d519563fa63afc
2019-06-03 10:30:12 -07:00
2e59a0a646 add contiguous function type hint for tensor (#21285)
Summary:
Fixes #21261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21285

Differential Revision: D15604270

Pulled By: soumith

fbshipit-source-id: c1c02348e338477a507052de0a1065cf42a99387
2019-06-03 10:17:03 -07:00
96667dfe41 Write add_scalars data in the same file (#21100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21100

Added multifile flag to write scalar data into separate files. This can slow down dashboard loading.

Reviewed By: orionr

Differential Revision: D15548913

fbshipit-source-id: dd39a7f76f93025d28f14babbf933e39860e6910
2019-06-03 09:53:27 -07:00
5b33698776 Fix build error in c10 on Windows (#21005)
Summary:
Targets https://github.com/pytorch/pytorch/issues/20635#issuecomment-496265510
Reference:
1. https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=vs-2015#microsoft-specific-predefined-macros
2. https://docs.microsoft.com/en-us/cpp/cpp/deprecated-cpp?view=vs-2019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21005

Differential Revision: D15543134

Pulled By: ezyang

fbshipit-source-id: f32709b018a7de651cb31575fc6117bfc4dd3bd1
2019-06-03 09:53:24 -07:00
155f767382 Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen (#21287)
Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
normal, size, elements 65536 forward 4.956722259521484e-06 bandwidth (GB/s) 52.88656218258779
normal, size, elements 131072 forward 5.285739898681641e-06 bandwidth (GB/s) 99.18914098114568
normal, size, elements 262144 forward 7.548332214355469e-06 bandwidth (GB/s) 138.91492454529376
normal, size, elements 524288 forward 1.1980533599853516e-05 bandwidth (GB/s) 175.0466273076219
normal, size, elements 1048576 forward 2.091646194458008e-05 bandwidth (GB/s) 200.52645667862762
normal, size, elements 2097152 forward 3.9961338043212894e-05 bandwidth (GB/s) 209.91809610901498
normal, size, elements 4194304 forward 7.39765167236328e-05 bandwidth (GB/s) 226.79110538115253
normal, size, elements 8388608 forward 0.0001377725601196289 bandwidth (GB/s) 243.5494555001696
normal, size, elements 16777216 forward 0.0002710080146789551 bandwidth (GB/s) 247.62686107087774
normal, size, elements 33554432 forward 0.0005375170707702637 bandwidth (GB/s) 249.69947058177252
```
#### After:
```
normal, size, elements 65536 forward 6.198883056640625e-06 bandwidth (GB/s) 42.288908760615385
normal, size, elements 131072 forward 6.756782531738281e-06 bandwidth (GB/s) 77.59432800112916
normal, size, elements 262144 forward 7.560253143310547e-06 bandwidth (GB/s) 138.6958849291706
normal, size, elements 524288 forward 7.550716400146485e-06 bandwidth (GB/s) 277.7421225831386
normal, size, elements 1048576 forward 1.1034011840820313e-05 bandwidth (GB/s) 380.1250225673293
normal, size, elements 2097152 forward 1.802682876586914e-05 bandwidth (GB/s) 465.34019427102237
normal, size, elements 4194304 forward 2.8417110443115234e-05 bandwidth (GB/s) 590.3913430460946
normal, size, elements 8388608 forward 4.8711299896240235e-05 bandwidth (GB/s) 688.8428777608927
normal, size, elements 16777216 forward 9.685993194580078e-05 bandwidth (GB/s) 692.8444265018856
normal, size, elements 33554432 forward 0.00018213510513305663 bandwidth (GB/s) 736.9130069787966
```
### Double Type
#### Before:
```
normal, size, elements 65536 forward 5.8841705322265624e-06 bandwidth (GB/s) 44.55071425348461
normal, size, elements 131072 forward 8.018016815185547e-06 bandwidth (GB/s) 65.38873789925661
normal, size, elements 262144 forward 1.2989044189453124e-05 bandwidth (GB/s) 80.72772597474304
normal, size, elements 524288 forward 2.2075176239013673e-05 bandwidth (GB/s) 95.00046465285668
normal, size, elements 1048576 forward 4.1041374206542965e-05 bandwidth (GB/s) 102.19696784254678
normal, size, elements 2097152 forward 7.57598876953125e-05 bandwidth (GB/s) 110.72624650312186
normal, size, elements 4194304 forward 0.00013725996017456056 bandwidth (GB/s) 122.22949779865557
normal, size, elements 8388608 forward 0.0002614736557006836 bandwidth (GB/s) 128.32815569921402
normal, size, elements 16777216 forward 0.0005080199241638184 bandwidth (GB/s) 132.0988819689674
normal, size, elements 33554432 forward 0.0009479570388793945 bandwidth (GB/s) 141.58629821311564
```
#### After:
```
normal, size, elements 65536 forward 5.991458892822265e-06 bandwidth (GB/s) 43.75294977222444
normal, size, elements 131072 forward 7.293224334716797e-06 bandwidth (GB/s) 71.88699756626349
normal, size, elements 262144 forward 8.094310760498048e-06 bandwidth (GB/s) 129.54481623281296
normal, size, elements 524288 forward 1.2805461883544922e-05 bandwidth (GB/s) 163.7701177100726
normal, size, elements 1048576 forward 2.2592544555664064e-05 bandwidth (GB/s) 185.64991604491345
normal, size, elements 2097152 forward 3.801822662353516e-05 bandwidth (GB/s) 220.6470092112881
normal, size, elements 4194304 forward 6.761550903320313e-05 bandwidth (GB/s) 248.1267425164457
normal, size, elements 8388608 forward 0.00013209104537963867 bandwidth (GB/s) 254.02503177684966
normal, size, elements 16777216 forward 0.0002667689323425293 bandwidth (GB/s) 251.56176699703818
normal, size, elements 33554432 forward 0.0004705166816711426 bandwidth (GB/s) 285.25604559501795
```

Resubmit of #20621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21287

Differential Revision: D15603695

Pulled By: ezyang

fbshipit-source-id: f8c5032678d503d45ac99fb1475a929df7c2b361
2019-06-03 09:45:02 -07:00
21113c2d36 EliminateGuards
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21070

Differential Revision: D15603561

Pulled By: Krovatkin

fbshipit-source-id: 03056688e8b99eddcb30d80cc20ab37ad3f13af2
2019-06-03 09:39:45 -07:00
c8539be962 Make is_contiguous checks generic in number of arguments (#21106)
Summary:
Loops.h contains specializations for cases where all the inputs are
contiguous as well as cases where one input is a scalar and all other
inputs are contiguous.

Previously, there were separate checks for functions that take
zero, one, or two input arguments. This is getting unwieldy, especially
once we add support for functions that take three inputs (#21025).

This requires the use of recursive templates (which have their own
downsides), but this seems better than the alternative.
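
A Python sketch of the generalized check (the actual code is C++ recursive templates over the argument pack; `FakeTensor` here is a stand-in):

```python
class FakeTensor:
    def __init__(self, contiguous: bool):
        self._contiguous = contiguous

    def is_contiguous(self) -> bool:
        return self._contiguous

def all_contiguous(*tensors: "FakeTensor") -> bool:
    # One variadic predicate covers the zero-, one-, two-, and
    # three-input cases uniformly instead of separate hand-written checks.
    return all(t.is_contiguous() for t in tensors)

print(all_contiguous(FakeTensor(True), FakeTensor(True)))   # True
print(all_contiguous(FakeTensor(True), FakeTensor(False)))  # False
```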
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21106

Differential Revision: D15562430

Pulled By: colesbury

fbshipit-source-id: 5f19ab2212e16e29552887f4585c2b4a70309772
2019-06-03 09:19:19 -07:00
b159e0ce08 Significantly simplify the spawning of pytorch libs building process. (#21105)
Summary:
Instead of attempting to hardcode calls to "ninja" or "make", we should always let cmake do it. This better integrates build configurations (DEBUG or REL_WITH_DEB_INFO) and better handles the case in which the native build tool is not in PATH (cmake has some capacity to find them and has options for users to specify their locations).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21105

Differential Revision: D15602883

Pulled By: soumith

fbshipit-source-id: 32ac46d438af00e791defde6ae5ac21c437d0bb0
2019-06-03 08:28:19 -07:00
f62a006097 Retry Fix Python DataParallel RNN in no_grad mode (#21262)
Summary:
Retry #21197

The previous one failed because it uses some Python3 only syntax.

ezyang Do we still have multi-GPU py2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21262

Differential Revision: D15598941

Pulled By: mrshenli

fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
2019-06-03 08:04:35 -07:00
0c6efbd410 Fix gelu documents (#21265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21265

Fix gelu documents

Reviewed By: hl475

Differential Revision: D15598958

fbshipit-source-id: 483040069102daada705401c36c8990598142d3d
2019-06-02 20:17:56 -07:00
eaa3ba6587 Add autograd for layer_norm on CPU (#20883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20883

Add autograd for layer_norm on CPU. After this diff, both PyTorch and JIT models automatically benefit from the performance improvement of nn.functional.layer_norm

Reviewed By: zheng-xq

Differential Revision: D15483790

fbshipit-source-id: 94ed3b16ab6d83ca6c254dbcfb224ff7d88837f3
2019-06-02 16:55:32 -07:00
31c79b71ff Add gelu gradient for pytorch (#21237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21237

Add gelu gradient for pytorch

Reviewed By: zheng-xq

Differential Revision: D15589816

fbshipit-source-id: 76fda7c413afed5b6cc3abe3a26c258d393a53ce
2019-06-02 09:42:42 -07:00
93ae040ff0 Add gelu activation in pytorch (#20665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20665

Add gelu activation forward on CPU in pytorch

Compared to the current Python implementation of gelu in BERT models, such as

  def gelu(self, x):
      return x * 0.5 * (1.0 + torch.erf(x / self.sqrt_two))

The torch.nn.functional.gelu function can reduce the forward time from 333ms to 109ms (with MKL) / 112ms (without MKL) for input size = [64, 128, 56, 56] on a devvm.
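A minimal sketch of the comparison described above (assuming a PyTorch build where `torch.nn.functional.gelu` uses the exact erf-based formula):

```python
import math

import torch
import torch.nn.functional as F

x = torch.randn(4, 8)

# The BERT-style Python implementation quoted above.
manual = x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

# The fused native op introduced here.
fused = F.gelu(x)

print(torch.allclose(manual, fused, atol=1e-6))  # True
```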

Reviewed By: zheng-xq

Differential Revision: D15400974

fbshipit-source-id: f606b43d1dd64e3c42a12c4991411d47551a8121
2019-06-02 09:08:47 -07:00
aac424a6c4 Revert D15577342: [pytorch][PR] Fix Python DataParallel RNN in no_grad mode
Differential Revision:
D15577342

Original commit changeset: 1a024c572171

fbshipit-source-id: 9a3ddc14ebb2d75d9dc3ee1fe69df9ffba3529de
2019-06-01 22:17:19 -07:00
360e6d1b0b Fixes a bug in the test (#21146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21146

The error was reported by https://our.intern.facebook.com/intern/test/562949965807317?ref_report_id=1837062

The API changed from `a.quantize_linear(...)` to `torch.quantize_linear(a, ...)`

Reviewed By: dskhudia

Differential Revision: D15557418

fbshipit-source-id: 88463e09fdf1f574f1b8128f6a00c2810091cd03
2019-06-01 18:00:33 -07:00
62ae348d1a Exclude file:line from graphs used for fuser kernel cache (#21252)
Summary:
cc ezyang this is meant to fix the fuser failures on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21252

Differential Revision: D15594283

Pulled By: jamesr66a

fbshipit-source-id: 85f37e78b2de051c92ade3fe4c44c7530b4542e5
2019-06-01 16:18:55 -07:00
7c40576c61 Save the weight shape info the first time we have chance to extract it (#21233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21233

It is possible that OnnxifiOp is created in a thread where weights have already been cleaned from the workspace, which is a legitimate use case, as we can create the backend once and lower all the weights. So we need to extract the weight shape info the first time we create the backend and save it.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D15587237

fbshipit-source-id: 1f264dc32c0398c42b618e9c41c119eb13e1c9f1
2019-06-01 12:55:29 -07:00
0efc527dd1 Revert D15548138: Export feature length information for onnxifi operator
Differential Revision:
D15548138

Original commit changeset: 460118648bb4

fbshipit-source-id: 1a25ca2942d804f6c88e96c436f09f68c260b9be
2019-06-01 12:41:47 -07:00
51ebbe970a Fix Python DataParallel RNN in no_grad mode (#21197)
Summary:
Fixes #21108

When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](8cde4c4d22/torch/csrc/autograd/python_function.cpp (L395-L399)), which prevents calling `Tensor.set_()` on them after recent changes to Tensors and Variables. This becomes a problem when users want to call `rnn.flatten_parameters()` in the forward pass, as that function [calls `set_()`](9d09f5df6c/aten/src/ATen/native/cudnn/RNN.cpp (L669)).

The proposed solution is to avoid using an autograd Broadcast if in no_grad mode.

apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197

Differential Revision: D15577342

Pulled By: mrshenli

fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c
2019-06-01 10:37:57 -07:00
f051fbd4a8 Fix typo in test_dataloader
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21226

Differential Revision: D15592797

Pulled By: soumith

fbshipit-source-id: b9a83e574c7b10fb0d661332ab68e376409a4724
2019-06-01 10:30:14 -07:00
d168a8533f compare scalar device with common device (#21236)
Summary:
I think there was a typo in #20690 here https://github.com/pytorch/pytorch/pull/20690/files#diff-b47a50873394e38a005b4c1acd151957R130.
The original conditional was `common_backend == Backend::CUDA && op.tensor.type().backend() == Backend::CPU`; now it is `op.device.is_cuda() && op.tensor.device().is_cpu()`. Since `op.device` and `op.tensor.device()` should be the same, this conditional is never true. This leads to spurious h2d copies for operations between CUDA tensors and CPU scalars, because CPU scalars are now sent to the GPU instead of being passed to the lambdas directly.
Unfortunately, I don't know how to test this change, because functionally everything was fine after #20690; it was just a performance regression.

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21236

Differential Revision: D15592754

Pulled By: soumith

fbshipit-source-id: 105bfecc61c222cfdb7294a03c9ecae3cc7f5817
2019-06-01 10:24:31 -07:00
41b17e2458 Fix wrong type hints for Tensor.is_cuda, is_leaf (#21192)
Summary:
`Tensor.is_cuda` and `is_leaf` are not predicate functions but `bool` attributes. This patch fixes the type hints in `torch/__init__.pyi` for those attributes.

```diff
- def is_cuda(self) -> bool: ...
+ is_cuda: bool
- def is_leaf(self) -> bool: ...
+ is_leaf: bool
```
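A quick illustration of why the stubs needed changing (a hypothetical snippet, not from the patch itself):

```python
import torch

t = torch.zeros(2, 2, requires_grad=True)

# Plain bool attributes, not callables -- so the .pyi stubs
# should declare them as attributes rather than methods.
print(type(t.is_cuda))  # <class 'bool'>
print(t.is_leaf)        # True: created directly by the user
```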
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21192

Differential Revision: D15592766

Pulled By: soumith

fbshipit-source-id: 8c4ecd6939df8b8a8a19e1c9db6d40193bca7e4a
2019-06-01 10:04:52 -07:00
be7fc40621 Fix sccache not being used on Windows (#21248)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21167.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21248

Differential Revision: D15592742

Pulled By: soumith

fbshipit-source-id: 4add002698c13301f142526cd783c866d345bf5e
2019-06-01 09:47:39 -07:00
619261d7a7 Add file-line info for jit.load and string frontend (#21217)
Summary:
This makes file-line reporting also work for things loaded using `torch.jit.load()` as well as the string frontend (via `CompilationUnit`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21217

Differential Revision: D15590838

Pulled By: jamesr66a

fbshipit-source-id: 6b6a12574bf9eca0b83f24f0b50535fda5863243
2019-05-31 23:43:15 -07:00
b663eec119 Lazily build error strings in schema matching using replay. (#21241)
Summary:
Saves ~20% (5.3s -> 4.3s) loading DenseNet on my laptop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21241

Differential Revision: D15590338

fbshipit-source-id: 2c8aebc829d4ea46f358d74d396cc44f5f57fcf5
2019-05-31 23:34:20 -07:00
5bc7c1f83d fix contribution and governance links (#21243)
Summary:
Updated web links on contribution_guide and governance documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21243

Differential Revision: D15591065

Pulled By: soumith

fbshipit-source-id: fdcfc518605a08a2ac35a10c146122d7d0a3f609
2019-05-31 21:02:13 -07:00
85786bea7d Export feature length information for onnxifi operator (#21110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21110

Export feature length information for onnxifi operator

Reviewed By: ipiszy

Differential Revision: D15548138

fbshipit-source-id: 460118648bb4467c096f79dea524060c9524f23d
2019-05-31 20:25:34 -07:00
516ea33f6a add PT maxpool and avgpool ops to the benchmark suite (#21200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21200

This diff adds MaxPool1d/2d/3d and AvgPool1d/2d/3d to the benchmark suite.

Reviewed By: hl475

Differential Revision: D15541980

fbshipit-source-id: 394d136ee94a16ee24285939323ca5fe317e99d3
2019-05-31 19:35:29 -07:00
dceea73460 add PT conv and convtranspose ops to the benchmark suite (#21199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21199

This diff adds Conv1d, ConvTranspose1d, Conv2d, ConvTranspose2d, Conv3d, and ConvTranspose3d operators to the benchmark suite.

Reviewed By: hl475

Differential Revision: D15520817

fbshipit-source-id: 5512afec2be8a1036fbcd170f70265c7e455fcde
2019-05-31 19:35:25 -07:00
2d75d31398 add PT linear op to the benchmark suite (#21204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21204

as title

Reviewed By: hl475

Differential Revision: D15484743

fbshipit-source-id: 7094a983e370e1c3952021146b58b844874b7d5e
2019-05-31 19:35:22 -07:00
00b3e69211 add PT batchnorm op to the benchmark suite (#21201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21201

as title

Reviewed By: hl475

Differential Revision: D15482581

fbshipit-source-id: d93713a35be41e76d077df419cb24585f69d72eb
2019-05-31 19:35:18 -07:00
ed1078bde3 migrate matmul operator to the new interface (#21198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21198

as title

Reviewed By: hl475

Differential Revision: D15325768

fbshipit-source-id: a5d7c6837cd09445e75846660d12807dd26af6cc
2019-05-31 19:35:15 -07:00
c8dc707fee avoid multiple writes to files on export (#21186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21186
ghimport-source-id: 2f62fed50e0d74f4162b74b6a2f44b8baa376316

Differential Revision: D15581527

Pulled By: suo

fbshipit-source-id: b1150cfa47d8df6f217f048c742a5ba9fa7f7935
2019-05-31 19:14:46 -07:00
4c19421f16 Register gradient op with engine (#21205)
Summary:
cc dreiss
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21205

Differential Revision: D15578948

Pulled By: bddppq

fbshipit-source-id: ef285174e8637daef624c8088ebd903a70582345
2019-05-31 18:48:47 -07:00
daa1e2de1a Add file:line:graph to graph printout (#21180)
Summary:
Example:

```
import torch

@torch.jit.script
def foo(x):
    y = torch.neg(x)
    return x - y

print(foo.graph.debug_str())
```

```
graph(%x : Tensor):
  %2 : int = prim::Constant[value=1]()
  %y : Tensor = aten::neg(%x) # demo.py:5:9
  %3 : Tensor = aten::sub(%x, %y, %2) # demo.py:6:12
  return (%3)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21180

Differential Revision: D15583548

Pulled By: jamesr66a

fbshipit-source-id: 0c6dc2fb7555c01dde9c563b78422ef234b2681b
2019-05-31 18:14:18 -07:00
678dc44d4c use _sparse_coo_tensor_unsafe in coalesce for speedup (#21214)
Summary:
Studied why sparse tensor coalesce was slow: issue #10757.

Using nvprof and a simple benchmark, I determined the bulk of the time was spent in ``kernelTransformReduceInnermostDimIndex``, which is called when a sparse tensor is constructed with sparse_coo_tensor to sanity-check the minimum and maximum indices. However, we do not need this sanity check here, because coalescing the tensor does not change these min/max values.

On my benchmark with 1 million non-zeros, the runtime of coalesce improved roughly 100x, from 0.52 s to 0.005 s.
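For reference, a minimal example of the operation being optimized (standard public API, not the benchmark itself):

```python
import torch

# Two entries share the coordinate (0, 0); one sits at (1, 2).
indices = torch.tensor([[0, 0, 1],
                        [0, 0, 2]])
values = torch.tensor([1.0, 2.0, 3.0])
s = torch.sparse_coo_tensor(indices, values, size=(2, 3))

# coalesce() sums duplicate entries and sorts indices; the min/max
# index check skipped by this change would otherwise run again here.
c = s.coalesce()
print(c.values())  # tensor([3., 3.])
```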
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21214

Reviewed By: bddppq

Differential Revision: D15584338

Pulled By: akyrola

fbshipit-source-id: a08378baa018dbd0b45d7aba661fc9aefd3791e0
2019-05-31 17:10:05 -07:00
9e5f1db66b Reuse common options between ONNXIFI and TVM transformations (#21163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21163

These two backend transformation share some common traits. Therefore we want to reuse the data struct/code as much as possible.

Reviewed By: hlu1

Differential Revision: D15561177

fbshipit-source-id: 35f5d63b2b5b3657f4ba099634fd27c3af545f1b
2019-05-31 17:01:36 -07:00
b12a5f6155 schema_matching.cpp: mark internal functions as static. (#21140)
Summary:
Some of the functions are only used in this file - mark them `static`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21140

Differential Revision: D15578076

Pulled By: Krovatkin

fbshipit-source-id: 71ae67baabebd40c38ecb9292b5b8202ad2b9fc1
2019-05-31 16:40:16 -07:00
668dbcc41b migrate intraop benchmarks to the new interface (#21202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21202

Migrate Ilia's op benchmarks to the new interface

Reviewed By: hl475

Differential Revision: D15322577

fbshipit-source-id: 8e75d51e7ddacbd56896c55f2996a9358491d83e
2019-05-31 16:19:04 -07:00
c62d476206 migrate add operator to the new interface (#21152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21152

Migrate existing add benchmark to use the new op front-end

Reviewed By: zheng-xq

Differential Revision: D15325524

fbshipit-source-id: 34e969e1bd289913d881c476711bce9f8ac18a29
2019-05-31 16:19:00 -07:00
fd19d06db4 remaining use of t.quantize_linear (#21219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21219

att

Differential Revision: D15583802

fbshipit-source-id: 742e8b799d67485b2d48b1458839f3f3b000f200
2019-05-31 16:05:44 -07:00
4dbeb87e52 PyTorch Dockerfile should update submodules recursively.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21216

Differential Revision: D15584114

Pulled By: bddppq

fbshipit-source-id: dbe0c3a54024a90fcd2c6689f8b9689ed0cd639b
2019-05-31 14:56:57 -07:00
0aeb971622 conditionally defined var better error message (#20911)
Summary:
i will do loops in a follow up after some other changes I am working on have landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20911

Differential Revision: D15497205

Pulled By: eellison

fbshipit-source-id: 8cac197c6a6045b27b552cbb39e6fc86ca747b18
2019-05-31 14:32:03 -07:00
2f4824b2fb Add support for recursive compilation on Modules (#20708)
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.

Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708

Pulled By: driazati

Differential Revision: D15560490

fbshipit-source-id: cc7ef3a1c2772eff9beba5f3e66546d2b7d7198a
2019-05-31 14:27:16 -07:00
834d678eb8 Remove old custom op implementation (#21085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21085

Now that torch::jit::RegisterOperators() always passes through to torch::RegisterOperators() (see diffs stacked below this), we can remove the old custom op implementation.

Reviewed By: dzhulgakov

Differential Revision: D15542261

fbshipit-source-id: ef437e6c71950e58fdd237d6abd035826753c2e4
2019-05-31 13:51:14 -07:00
384d828ea5 Add aliasAnalysis to torch::RegisterOperators() (#21084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21084

- Now AliasAnalysisKind can be set using the torch::RegisterOperators() API
- This also allows us to remove the last place in torch::jit::RegisterOperators that didn't use c10 yet.

Reviewed By: dzhulgakov

Differential Revision: D15542097

fbshipit-source-id: ea127ecf051a5c1e567e035692deed44e04faa9e
2019-05-31 13:51:07 -07:00
80556761c8 c10::OperatorOptions (#21181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21181

Implement c10::OperatorOptions as a class to store metadata about operators.
This is meant to replace torch::jit::OperatorOptions.

Reviewed By: dzhulgakov

Differential Revision: D15569897

fbshipit-source-id: 95bf0bf917c1ef2bdf32702405844e1a116d9a64
2019-05-31 13:51:00 -07:00
b91e0d14a7 registration options should only be callable on rvalues (#21079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21079

They're invalidating *this, so they shouldn't be callable on non-rvalues.

Reviewed By: dzhulgakov

Differential Revision: D15541583

fbshipit-source-id: a2a9dafb29af03477486ea2ce9029399f557c728
2019-05-31 13:50:54 -07:00
181792176d Implement various AliasAnalysis operations directly on top of MemoryLocations. (#21203)
Summary:
This reduces DenseNet load time by about 25% (down to 5.3s on my laptop) and gets AliasAnalysis out of the profile top hits entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21203

Differential Revision: D15578155

fbshipit-source-id: ddbb1ad25c9540b5214702830084aa51cc6fd3cb
2019-05-31 13:38:32 -07:00
e098878d75 Cuda persistent softmax (#20827)
Summary:
Adds persistent cuda kernels that speed up SoftMax applied over the fast dimension, i.e. torch.nn.Softmax(dim=-1) and torch.nn.LogSoftmax(dim=-1). When the size is <= 1024, this code is 2-10x faster than the current code, speedup is higher for smaller sizes. This code works for half, float and double tensors with 1024 or fewer elements in the fast dimension. Numerical accuracy is on par with the current code, i.e. relative error is ~1e-8 for float tensors and ~1e-17 for double tensors. Relative error was computed against the CPU code.

The attached image shows kernel time in us for torch.nn.Softmax(dim=-1) applied to a half precision tensor of shape [16384,n], n is plotted along the horizontal axis. Similar uplifts can be seen for the backward pass and for LogSoftmax.

![image](https://user-images.githubusercontent.com/41591019/58212822-b63ebb00-7cb5-11e9-910d-1fc7d8585d58.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20827

Differential Revision: D15582509

Pulled By: ezyang

fbshipit-source-id: 65805db37487cebbc4ceefb1a1bd486d24745f80
2019-05-31 13:20:15 -07:00
052bab7069 Move legacy TH functions(sinh,cosh) to TensorIterator + Vec256 (#21115)
Summary:
This is a follow-up to James's PR: https://github.com/pytorch/pytorch/pull/19041. The idea is to replace the legacy `sinh` / `cosh` ops that are dispatched to TH with the operations defined in `Vec256` for better performance.

Benchmark (from James's script):

```python
import torch, time
ops = ['sinh', 'cosh']
x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
code on master:

```
op	time per iter (ms)	gops/s	GB/s
sinh	3.37614369392395	0.3105839369002935	2.484671495202348
cosh	3.480502033233643	0.3012714803748572	2.4101718429988574
```
after change (on Macbook pro 2018):

```
op	time per iter (ms)	gops/s	GB/s
sinh	0.8956503868103027	1.1707425301677301	9.365940241341841
cosh	0.9392147302627564	1.1164390487217428	8.931512389773943
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21115

Reviewed By: ljk53

Differential Revision: D15574580

Pulled By: xta0

fbshipit-source-id: 392546a0df11ed4f0945f2bc84bf5dea2750b60e
2019-05-31 12:06:26 -07:00
7f960a9c01 remove quantize_linear from Tensor method (#21196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21196

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15577123

fbshipit-source-id: d0abeea488418fa9ab212f84b0b97ee237124240
2019-05-31 12:01:10 -07:00
c185145d8c remove dependency to caffe2::math and eigen (#21169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21169

We should minimize dependency from perfkernels (we were including eigen header files only in cc files not compiled with avx or avx2 options but better to be very strict because it's easy to introduce illegal instruction errors in perfkernels)

Reviewed By: salexspb

Differential Revision: D15563839

fbshipit-source-id: d4b1bca22d7f2e6f20f23664d4b99498e5984586
2019-05-31 11:55:16 -07:00
8c927b208c improve test_docs_coverage error messages (#21029)
Summary:
Most important fix: Correct "tensor.rst" to "tensors.rst"

Secondary fix: some minor English spelling/grammar fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21029

Differential Revision: D15523230

Pulled By: umanwizard

fbshipit-source-id: 6052d8609c86efa41a4289cd3a099b2f1037c810
2019-05-31 11:13:39 -07:00
e13b483f58 Fix weak module cuda() _flat_weights bug (#21107)
Summary:
Dynamically creating a type at runtime was messing up the MRO and has been causing many other problems. I think it's best to delete it; this causes a regression, since
```python
self.linear = nn.Linear(10, 10)
isinstance(self.linear, nn.Linear)
```
will now be `False` again, but this will be fixed once recursive script mode is the default (#20939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21107

Pulled By: driazati

Differential Revision: D15560549

fbshipit-source-id: 7bd6b958acb4f353d427d66196bb4ee577ecb1a6
2019-05-31 10:35:30 -07:00
0223d3744a introduce a new interface to add op [C2 changes] (#21148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21148

The diff modifies the interface for Caffe2 operators in the benchmark suite

Reviewed By: zheng-xq

Differential Revision: D15433888

fbshipit-source-id: c264a95906422d7a26c10b1f9836ba8b35e36b53
2019-05-31 09:21:07 -07:00
31089b02ce introduce a new interface to add op [core changes] (#21147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147

This diff introduces a new interface to add PT/C2 operators to the benchmark suite.

The following steps are needed to add a new operator:
1. Specify the input shapes, args to an operator in configs
2. Create a PT/C2 benchmark class which includes ```init``` (create tensors),  ```forward``` (specify the operator to be tested.), and ```backward```(gradient of an op.) methods
3. call generate_pt_test/generate_c2_test to create test cases based on configs

Reviewed By: zheng-xq

Differential Revision: D15250380

fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
2019-05-31 09:21:04 -07:00
012069ca8f Revert D15454048: Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen
Differential Revision:
D15454048

Original commit changeset: 8bfc57bf015b

fbshipit-source-id: 98c562ab4cf7a00e9041b2aa50eb7fb0f0c48f69
2019-05-31 07:49:22 -07:00
dc8f306b8e Revert D15454052: Move THCTensor_(cauchy) to ATen
Differential Revision:
D15454052

Original commit changeset: 4f4d33ec11cf

fbshipit-source-id: 832a738796e6b6bdf969a44bb2cdcf171cbd5f77
2019-05-31 07:49:18 -07:00
be9ce6318e remove import torchvision when testing torch.hub (#21132)
Summary:
This should pass once https://github.com/pytorch/vision/pull/971 is merged.
To remove torchvision as a baseline, we just compare against the sum of all `param.sum()` values in the pretrained resnet18 model, which means we only need to manually update the number when those pretrained weights change, which is rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21132

Differential Revision: D15563078

Pulled By: ailzhang

fbshipit-source-id: f28c6874149a1e6bd9894402f6847fd18f38b2b7
2019-05-31 07:38:30 -07:00
e161360b62 Revert D15558784: [reland][pt1][quant] remove quantize_linear from Tensor method
Differential Revision:
D15558784

Original commit changeset: 0b194750c423

fbshipit-source-id: d180a7f76bb05ad7470f17bc3d2bd614fab16529
2019-05-31 06:20:05 -07:00
5fcd37bd8f List (#21164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21164

Write a List type to be used in operator kernels. This abstracts away from the concrete list type used (e.g. std::vector vs SmallVector)
and allows us to change these implementation details without breaking the kernel API.
Also, this class allows for handling List<bool>, which would not work with ArrayRef because vector<bool> is a bitset and can't be converted to ArrayRef<bool>.

Reviewed By: ezyang

Differential Revision: D15476434

fbshipit-source-id: 5855ae36b45b70437f996c81580f34a4c91ed18c
2019-05-31 04:15:39 -07:00
f91f24764e remove quantize_linear from Tensor method (#21156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21156

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15558784

fbshipit-source-id: 0b194750c423f51ad1ad5e9387a12b4d58d969a9
2019-05-30 22:02:12 -07:00
0a0ff83124 replace num_bits with quant_min and quant_max (#21097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21097

att

Differential Revision: D15547166

fbshipit-source-id: 60bc7f7d82c424558b67881627fb74f1eff515af
2019-05-30 20:57:57 -07:00
277bf69fa0 Add torch.load/torch.save for QTensor (#20830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20830

att

Reviewed By: dzhulgakov

Differential Revision: D15340701

fbshipit-source-id: 677038c8101f66dec4856c2eccf9f9e394012226
2019-05-30 20:52:19 -07:00
eb4d43df3b Make CUDA triu / tril support batches of size > 65535 (#21067)
Summary:
In the previous implementation of triu / tril, we passed the batch size in the 2nd dimension of a grid. This is limited to 65535, which means that performing triu / tril on a tensor with batch size > 65535 will throw an error. This PR removes the dependence on the 2nd dimension, and corresponding non-contiguity constraints.

Changelog:
- Compute offset, row and col in the kernel
- Use 1st dimension of grid alone
- Remove unnecessary contiguity checks on tensors as a result of this change.
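The batched semantics in question, sketched on CPU (the fix itself only affects the CUDA launch configuration):

```python
import torch

# triu applies per-matrix over leading batch dimensions; on CUDA,
# batch sizes above 65535 previously overflowed the grid's 2nd dim.
x = torch.ones(3, 4, 4)
y = torch.triu(x)

# Upper triangle (incl. diagonal) of a 4x4 ones matrix has 10 ones.
print(y[0].sum().item())  # 10.0
```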
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21067

Differential Revision: D15572501

Pulled By: ezyang

fbshipit-source-id: 93851cb661918ce794d43eeb12c8a38762e1358c
2019-05-30 20:16:11 -07:00
057ddab766 on import, register class before defining it (#21182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21182
ghimport-source-id: 2457a4306c0a72888bb8359a267fcd12b43f103a

Differential Revision: D15571334

Pulled By: suo

fbshipit-source-id: 26ca9dddb25df1b1eac2e17c70f682e20e08cb6d
2019-05-30 20:09:01 -07:00
d6438c956b Move THCTensor_(cauchy) to ATen (#20622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20622
ghimport-source-id: b100d6cededf6f2c2020c3d7961271f16497bbdc

Differential Revision: D15454052

Pulled By: ezyang

fbshipit-source-id: 4f4d33ec11cf36b91c67759bd27252d1e457cff1
2019-05-30 18:13:16 -07:00
26d16ae515 Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen (#20621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20621
ghimport-source-id: f461d7f1eb6b5a8306dd8175cbb0a7fcc9f64c76

Differential Revision: D15454048

Pulled By: ezyang

fbshipit-source-id: 8bfc57bf015b85f57ed99a54176926386aab4e34
2019-05-30 18:01:31 -07:00
07ac00d21a Automatic update of fbcode/onnx to 9005291283e943f1a91da5f0acf218bc4e8eb2ca (#21057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21057

Previous import was cc2333a3f929caca7223b98699237f19388dd585

Included changes:
- **[90052912](https://github.com/onnx/onnx/commit/90052912)**: Fix wrong condition and add --user in update_doc.sh (#2050) <daquexian>
- **[a4f44a20](https://github.com/onnx/onnx/commit/a4f44a20)**: Add bit-shift operators for supporting hashing (#1931) <Wei-Sheng Chin>
- **[0098752c](https://github.com/onnx/onnx/commit/0098752c)**: Add shape inference logic for Expand op (#2041) <Hariharan Seshadri>
- **[fbe8addb](https://github.com/onnx/onnx/commit/fbe8addb)**: update qops tests (#2040) <Ashwini Khade>
- **[874fb37c](https://github.com/onnx/onnx/commit/874fb37c)**: Fix torchvision installation (#2054) <bddppq>
- **[1f5f6582](https://github.com/onnx/onnx/commit/1f5f6582)**: Fix bug that kernel_shape rather than effective_kernel_shape is used in dilated conv (#2043) <daquexian>
- **[38b6c44e](https://github.com/onnx/onnx/commit/38b6c44e)**: Changes done internally at Facebook (#2035) <Lu Fang>
- **[5c51f0db](https://github.com/onnx/onnx/commit/5c51f0db)**: Explicitly specify type of integers in the input tensor. (#2034) <Dmitri Smirnov>

Reviewed By: benoitsteiner

Differential Revision: D15534241

fbshipit-source-id: 8d2b78a986e5b7fbeb248f2d7b80c1a07230654e
2019-05-30 17:33:18 -07:00
ff0d00f921 Updated scalar type to onnx mapping (#21095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21095
ghimport-source-id: 32a79eace02216de9170f163027b1aa93756b821

Differential Revision: D15546175

Pulled By: izdeby

fbshipit-source-id: 4e47c8538aaf30b4af198baac7279133e4d74b36
2019-05-30 17:11:12 -07:00
726caeace3 Use QTensor for bias (#21038)
Summary:
Use QTensor for the bias tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21038

Differential Revision: D15524980

Pulled By: dskhudia

fbshipit-source-id: c7bf2efc8fe3f4b5574c721c2f64ff073045ecc4
2019-05-30 16:16:03 -07:00
64f06d4964 Enable all and any for bool tensors (#21033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21033
ghimport-source-id: 35fdcf27b0bde8ec3e5b3051cf0d730f20f94783

Differential Revision: D15530497

Pulled By: izdeby

fbshipit-source-id: 9c15cc960055f59a05ce0276f9d51c567626d966
2019-05-30 16:16:00 -07:00
9a22cb9f49 Enabled add, sum and mul for bool tensor (#21032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21032
ghimport-source-id: 6ab21752b4af451e8b10a0e02cd5d726aa7472f0

Differential Revision: D15530496

Pulled By: izdeby

fbshipit-source-id: f4f83aa80eafbb4f307aadc1a13d8cdcf3055c24
2019-05-30 16:11:43 -07:00
fe39602451 Support for rudimentary f-strings (#21037)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/51

This adds support for converting simple f-string literals to calls to `string.format()`. It does not support conversion specifiers or format strings.

This also does not support the string parser frontend, since that implementation would be more involved and likely would require modifying our TorchScript AST
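The transformation can be pictured in plain Python (a sketch of the equivalence, not the compiler code): a simple f-string literal behaves like a `str.format()` call.

```python
x = 42
name = "answer"

# What the script compiler emits, conceptually:
via_fstring = f"{name} = {x}"
via_format = "{} = {}".format(name, x)

print(via_fstring == via_format)  # True
```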
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21037

Reviewed By: zdevito

Differential Revision: D15541183

Pulled By: jamesr66a

fbshipit-source-id: ae9df85e73f646d7219c1349f5b7683becbcef20
2019-05-30 15:50:45 -07:00
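The conversion described above can be illustrated in plain Python — a sketch of the equivalence the rewrite targets, not the TorchScript compiler code itself. Note the commit states that conversion specifiers are not supported.

```python
# A simple f-string literal is equivalent to a call to str.format with
# the interpolated expressions passed positionally; that is the rewrite
# the TorchScript frontend performs for these rudimentary f-strings.
name, n = "conv1", 3
fstr = f"layer {name} has {n} params"
converted = "layer {} has {} params".format(name, n)
assert fstr == converted
```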
76deb450c6 Record source/line info in SourceRange and report in highlight (#21157)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/20898 with flake8 fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21157

Reviewed By: zdevito

Differential Revision: D15560324

Pulled By: jamesr66a

fbshipit-source-id: fc4e429eac03d2768f758b19c9d43e0bb614c2b8
2019-05-30 15:45:30 -07:00
416357648c Optimize alias analysis (#20899)
Summary:
# Overall Improvements
1. Switched from using `unordered_set` to sparse bitset.
1. Prevent some excessive memory allocations (thanks to resistor)
1. Take advantage of the sparse bitset operations
1. Switch to `flat_hash_map` instead of `unordered_map` in some places.

# Benchmarks (somewhat approximate, best of a couple runs)
1. InceptionNet (load + one forward pass): 19.8->13.3
1. GoogleNet(load + one forward pass): 10.0 -> 7.24
1. DenseNet (only load): 7.3 -> 5.3

I use the `sparse bitset` taken from https://llvm.org/doxygen/SparseBitVector_8h_source.html. I had to make some modifications to use `__builtin_popcountl` and instructions like that instead of other transitive clang dependencies.

## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k `mayAlias` queries, and the sum of magnitudes of sets at each node is 500k (ie: every node, on average, reaches 2.5 leaves).

PS: Holy crap macbooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899

Differential Revision: D15564612

Pulled By: Chillee

fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0
2019-05-30 15:37:50 -07:00
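The core idea — each node tracks the set of memory-location leaves it may reach, and `mayAlias` is a set-intersection test — can be sketched in a few lines. This is an illustration only (Python ints standing in for the sparse bitset; the real code is C++ using an LLVM-derived `SparseBitVector`), with made-up node names:

```python
# Each alias-graph node keeps a bitset of the memory "leaves" it may
# point to; two values may alias iff their leaf bitsets intersect.
points_to = {
    "a": 0b0011,  # a may point to leaves 0 and 1
    "b": 0b0110,  # b may point to leaves 1 and 2
    "c": 0b1000,  # c may point to leaf 3 only
}

def may_alias(x, y):
    # Bitwise AND is the sparse-bitset intersection test.
    return (points_to[x] & points_to[y]) != 0

assert may_alias("a", "b")       # share leaf 1
assert not may_alias("a", "c")   # disjoint leaf sets
```

Since the graphs are very sparse (on average ~2.5 leaves per node, per the numbers above), a sparse bitset makes these intersections cheap.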
31aefd9b09 Adding models to jenkins benchmark script (#21010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21010

Adding and editing the jenkins benchmark script to accommodate both
ResNext and ShuffleNet models.

Reviewed By: bddppq

Differential Revision: D15515354

fbshipit-source-id: 2a92c272b0b74ed3ecc78af6544a06337c7753cf
2019-05-30 15:17:40 -07:00
f6e5846a67 add handle to run all jit tests (#21161)
Summary:
Now you can run `python test/run_tests --jit` to run all jit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21161

Differential Revision: D15563912

Pulled By: eellison

fbshipit-source-id: 4bb0285cda4168b72a3dc4bba471485566a59873
2019-05-30 14:12:21 -07:00
7f308b88b9 Only populate net_pos in ssaRewrite if the op doesn't already have a net_pos argument (#21051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21051

In net transforms, we perform an SSARewrite where we update the 'net_pos' for all the ops in the net. The transform function also takes an unordered set of net positions for blacklisting. It's possible that SSARewrite will change the indexes of the ops, so the blacklist is applied to the wrong ops. We fix this issue by having SSARewrite only assign a new net_pos if the op doesn't already have one.

Reviewed By: yinghai

Differential Revision: D15532795

fbshipit-source-id: e020492a7b5196a91cdc39d0eda761b1ca612cdb
2019-05-30 13:37:35 -07:00
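The fix above can be sketched in a few lines of Python (a toy op representation using dicts — the real code operates on Caffe2 protobufs): positions are assigned only where missing, so a blacklist built against existing `net_pos` values stays valid.

```python
# Assign a fresh net_pos only to ops that don't already carry one,
# continuing after the highest position already present.
def ssa_rewrite_net_pos(ops):
    next_pos = max((op["net_pos"] for op in ops if "net_pos" in op),
                   default=-1) + 1
    for op in ops:
        if "net_pos" not in op:
            op["net_pos"] = next_pos
            next_pos += 1
    return ops

ops = [{"type": "Relu", "net_pos": 5}, {"type": "FC"}]
ssa_rewrite_net_pos(ops)
assert ops[0]["net_pos"] == 5   # pre-existing position preserved
assert ops[1]["net_pos"] == 6   # new op gets the next free position
```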
80020306ef Added base parameter to math.log (#21151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21151
ghimport-source-id: 76dc0852022a87a000888a787de1391f71923074

Differential Revision: D15563185

Pulled By: Chillee

fbshipit-source-id: 6ed7cc32ed7c103f360022b97f6df47ccd0403e7
2019-05-30 13:32:52 -07:00
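The added parameter mirrors Python's own two-argument `math.log(x, base)`, so scripted code matches eager behavior:

```python
import math

# math.log takes an optional base; without it, the base is e.
assert abs(math.log(8, 2) - 3.0) < 1e-12     # log base 2
assert math.log(100, 10) == 2.0              # log base 10
assert abs(math.log(math.e) - 1.0) < 1e-12   # natural log by default
```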
4e3e4d7ff5 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 80a731a1b8b04df01cb0d68ec39d4af10e0b61b7
2019-05-30 13:07:20 -07:00
4aee92833c Update libtorch docs (#21150)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21150

Differential Revision: D15559590

Pulled By: pjh5

fbshipit-source-id: 4063bf91464425e8efe4765dc17bb7e9b7bfccc7
2019-05-30 12:49:56 -07:00
313ef4f5d5 Make data_ptr a method on Tensor (#20878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20878
ghimport-source-id: f19993d97ecb8cfcd60b371d9ed49e3ad2e051c7

Differential Revision: D15482061

Pulled By: li-roy

fbshipit-source-id: c0563ce849fc3277e86a1a58bd384e38365786b2
2019-05-30 11:47:59 -07:00
d17aa72373 Added more regression test for groupconv w/o bias. (#18519)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/18218, which was fixed by https://github.com/pytorch/pytorch/pull/18463 with mkl-dnn upgraded to v0.18.1.
Covering special case when group > 1, input-channel / group < 16 and output-channel is multiple of 16.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18519

Differential Revision: D14643071

Pulled By: soumith

fbshipit-source-id: d0ebed59326c67089e042b50583b87ed2c3ccc2f
2019-05-30 11:36:07 -07:00
6dc445e1a8 Conservative alias analysis rules for CallFunction/CallMethod (#21087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21087
ghimport-source-id: 4fa6763ffecc7d2974b902dd9bd2bd9ac467bab7

Differential Revision: D15542512

Pulled By: zdevito

fbshipit-source-id: 2dcd673cd4c200d7a854347429d4f33a11793cbc
2019-05-30 11:01:56 -07:00
b6d1a72f48 improve error message on inferred type (#21058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21058
ghimport-source-id: 7fad3a0567022dd417f4bd079a50a22e3c1dc020

Differential Revision: D15547218

Pulled By: suo

fbshipit-source-id: 5dbd567c79e6d01e9af4b8552777f7f0043df5b2
2019-05-30 10:50:34 -07:00
ec76976a7a Remove all devtoolset7 jobs (#21153)
Summary:
These do not work. We'll save time and cpu until someone has the time to fix these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21153

Differential Revision: D15558601

Pulled By: pjh5

fbshipit-source-id: f9bfe580aa7962a88506f9af0032647f553637a4
2019-05-30 10:39:26 -07:00
fffffde2f8 Delete more tabs, fix lint. (#21142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21142
ghimport-source-id: 4666c0731d9c08e9990ffafd0ae88fa1e7896348

Differential Revision: D15555285

Pulled By: ezyang

fbshipit-source-id: 9e5bfacf202ceba37bd29cfd5dcb651b7f79068d
2019-05-30 06:36:47 -07:00
e9df9e7960 Revert D15552424: [pytorch][PR] [JIT] Record source/line info in SourceRange and report in highlight
Differential Revision:
D15552424

Original commit changeset: 78d0f0de03f7

fbshipit-source-id: cc24f62189b7bbcdc1406912cfb3d4ca52b8e67e
2019-05-30 05:17:15 -07:00
c4a90ca18e Revert D15477933: [pt1][quant] remove quantize_linear and dequantize from Tensor method
Differential Revision:
D15477933

Original commit changeset: c8aa81f681e0

fbshipit-source-id: ec494fbbab72e20da262bdd8657887e1fdd173cb
2019-05-30 05:04:12 -07:00
3805490d6a Typo fix (#21122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21122

fix a typo

Reviewed By: dzhulgakov

Differential Revision: D15553921

fbshipit-source-id: 260b0be5975d49bb6d70e45d83505efcecf02875
2019-05-30 00:16:01 -07:00
52ded63128 Revert D15546045: [jit] Add support for recursive compilation on Modules
Differential Revision:
D15546045

Original commit changeset: c2c8fe179088

fbshipit-source-id: c921fb92cf9f5c6c94c77fa5070f9c5775c91b77
2019-05-29 23:42:50 -07:00
3083c71cde First class functions in IR, inlined eagerly (#21052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21052
ghimport-source-id: cc476b9cc301967dde5de6212ca144cdb252e84c

Differential Revision: D15533353

Pulled By: zdevito

fbshipit-source-id: 4d25461969cfcc9e5f641d585584cc100c7b34ae
2019-05-29 23:04:18 -07:00
6b099edb53 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21118

Differential Revision: D15553121

Pulled By: jamesr66a

fbshipit-source-id: 14ebf0e4cb33f8155ac86a9538beb8570bdfe8c8
2019-05-29 21:50:12 -07:00
7cea6d9b71 Redesign the output shape adjustment of OnnxifiOp (#21027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21027

Previously, we were only able to adjust batch size when the output shape has batch size conditioned at its first dim. Although not common, there are cases where we want to slice back an output whose batch size is conditioned on a non-first dim, or whose output shape doesn't really have batch size in it but rather is an expression of it. Examples are shapes at the output of `Transpose` or `Tile`. This diff redesigns how we handle the output size. The key is that when we run OnnxifiOp, the input shapes are given, and we can actually do a shape inference to derive the real output shapes, no matter how they got transformed. We then compare the real output shape with the max-batch-size output shape, dim by dim, and use a `Slice` op to cut the max output back to the real output shape.

Notice that a general `Slice` op is slow, and in most cases we still prefer adjusting batch size by shrinking its first dim, which is just an operation on meta info without data allocation/manipulation. Therefore, we add a flag `fast_path` to detect this situation and operate accordingly.

Reviewed By: tracelogfb

Differential Revision: D15515189

fbshipit-source-id: 9c1fff161f82d0bc20eeac07ca4a2756e964e9fd
2019-05-29 21:39:00 -07:00
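The "slice back" idea can be illustrated with plain nested lists (a toy stand-in for tensors; the real code emits a Caffe2 `Slice` op): given the shape-inferred real output shape, cut each dim of the max-batch-size output down to it.

```python
# Recursively slice a nested-list "tensor" down to the given shape.
def slice_to_shape(data, shape):
    if not shape:
        return data
    return [slice_to_shape(row, shape[1:]) for row in data[:shape[0]]]

max_out = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # output at max batch size, shape (3, 3)
real = slice_to_shape(max_out, (2, 3))        # shape-inferred real shape (2, 3)
assert real == [[1, 2, 3], [4, 5, 6]]
```

The `fast_path` case — batch size in the first dim — degenerates to `data[:real_batch]`, which is why it needs no data manipulation.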
6875018793 Record source/line info in SourceRange and report in highlight (#20898)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/29

Examples:

```
import torch

torch.jit.script
def foobar(x):
    return torch.blargh(xyz)

==

RuntimeError:
object has no attribute blargh:
at compile.py:5:12
torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```

It also gets the correct column number in the case where the original source file has common leading whitespace in front of the callable:

```
import torch

with torch.no_grad():
            torch.jit.script
            def foo(x):
                return torch.blargh(x)

==
RuntimeError:
object has no attribute blargh:
at compile_leading.py:6:24
torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20898

Differential Revision: D15552424

Pulled By: jamesr66a

fbshipit-source-id: 78d0f0de03f7ccbf3e7ea193a1b4eced57ea5d69
2019-05-29 21:32:33 -07:00
57f4f98c40 Fix borked SourceRanges
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21109

Reviewed By: zdevito

Differential Revision: D15551392

Pulled By: jamesr66a

fbshipit-source-id: 4f29214049b8feced0e740f84007b5751703ee20
2019-05-29 20:13:14 -07:00
67291ba74f remove quantize_linear and dequantize from Tensor method (#20874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20874

A criterion for what should be a Tensor method is whether numpy has it; numpy does not have this one,
so we are removing it as a Tensor method. We can still call it as a function.
Python
```
torch.quantize_linear(t, ...), torch.dequantize(t)
```
C++
```
at::quantize_linear(t, ...), at::dequantize(t)
```

Reviewed By: dzhulgakov

Differential Revision: D15477933

fbshipit-source-id: c8aa81f681e02f038d72e44f0c700632f1af8437
2019-05-29 19:17:16 -07:00
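For intuition, a hedged sketch of the affine quantization scheme that `quantize_linear`/`dequantize` implement at a high level (pure Python, hypothetical parameter values; the real ops operate on tensors and specific quantized dtypes):

```python
# Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

# Dequantization inverts it (up to rounding error): x = (q - zero_point) * scale
def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = 0.25, 10
q = quantize(2.5, scale, zp)
assert q == 20
assert dequantize(q, scale, zp) == 2.5
assert quantize(1000.0, scale, zp) == 255  # out-of-range values clamp
```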
8d3388aef2 Add support for recursive compilation on Modules (#20708)
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.

Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708

Pulled By: driazati

Differential Revision: D15546045

fbshipit-source-id: c2c8fe179088ffbdad47198e799a456560655b86
2019-05-29 18:52:36 -07:00
33d35f5f93 Fixed isinstance typos
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21102

Differential Revision: D15549564

Pulled By: Chillee

fbshipit-source-id: 6746dc9e01b5a30d55d544beb70b7005f0cfd8ae
2019-05-29 17:51:27 -07:00
990e63f587 Remove unnecessary sources from base CircleCI AMI (#21103)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21103

Differential Revision: D15550213

Pulled By: kostmo

fbshipit-source-id: b4a2c38d168f722b30c96494079ccdd468b9ece8
2019-05-29 17:46:08 -07:00
12b0dede39 Support exporting tensor factories from scripting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20255

Differential Revision: D15534186

Pulled By: houseroad

fbshipit-source-id: 182e117a35fa31445fcad8cb492160500f71599a
2019-05-29 16:53:49 -07:00
9be72ce44f Convert Tree to use intrusive_ptr instead of shared_ptr.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20815

Differential Revision: D15453817

fbshipit-source-id: 569ab807d32fb3dcebfe201a049c770b1600e5c7
2019-05-29 16:33:02 -07:00
4900edebcf QTensor permute, transpose and contiguous (#20869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20869

Adding support for the functions listed in the title, by implementing the copy kernel.

Differential Revision: D15474060

fbshipit-source-id: 9264df6e442cca1cc5d952e3e5dcc9f4a426f317
2019-05-29 16:05:53 -07:00
99b057d89c Failing assertions is unlikely (#20876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20876

Tell the compiler that assertions are likely to succeed.
This allows the compiler to generate better code and optimize for the success case.

Differential Revision: D15480066

fbshipit-source-id: 4485154d66b2ee0ef8a401718712dbd61d811aee
2019-05-29 15:59:33 -07:00
9daf48525e Quantized Max Pool op (#20474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20474

Parallel implementation of MaxPool (no ReLU).

Reviewed By: dskhudia

Differential Revision: D15327923

fbshipit-source-id: ca6475e7fe1434b55d4b7730a074bb7ff50355fd
2019-05-29 15:01:01 -07:00
154029a6ff Revert D15534670: [jit] improve error message on inferred type
Differential Revision:
D15534670

Original commit changeset: 8bbfd6e9c1af

fbshipit-source-id: fe62cf954292e8ef1d00a3cc569206f73cedcd31
2019-05-29 14:56:08 -07:00
5dacf6b048 improve error message on inferred type (#21058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21058
ghimport-source-id: e7d6e082b0faf4f3d3e683f2c98863ee269439f0

Differential Revision: D15534670

Pulled By: suo

fbshipit-source-id: 8bbfd6e9c1afbc3006d7d55ed633e18618e05021
2019-05-29 14:47:00 -07:00
6ea9044d3c add 'all' builtin (#20521)
Summary:
[jit] add 'all' builtin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20521

Differential Revision: D15527657

Pulled By: driazati

fbshipit-source-id: eaa3c1c560810581150646858339369e4305fdf2
2019-05-29 14:46:56 -07:00
8fcd80af20 Fix "cuda: unknown error" on Windows (#21062)
Summary:
Thanks Jonas1312 for validating this workaround.
Fixes #20635.
However, I don't know exactly why this one is needed.
The following are my guesses:
1. It is a CUDA bug. Static linking against `cudart` is the default now, so they didn't run enough tests for dynamic ones.
2. It is related to UCRT. But (1)according to msdn, shared DLLs should share the same CRT. (2) The CUDA related objects like `CUDevice` passing to `cudart` are stored on the stack, not the heap. (3) If this is the case, it should always fail, not sometimes. https://docs.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries?view=vs-2019
3. It is a bug of our side. However, I was unable to find it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21062

Differential Revision: D15543557

Pulled By: ezyang

fbshipit-source-id: c23af45ebf582fad93ce5f029af6e1f06cf1d49d
2019-05-29 14:34:02 -07:00
157fcfc07d Add quantize_linear_per_channel (#20765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20765

att

Reviewed By: dskhudia

Differential Revision: D15435455

fbshipit-source-id: 77770044411ce8ee02d26d63eb7e79cd10db103e
2019-05-29 14:29:16 -07:00
53ccba004f New torch assertion macros (#20887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20887

Switch AT_xxx assertion macros to the TORCH_ variants and make sure the separation between TORCH_CHECK and TORCH_INTERNAL_ASSERT makes sense.

Differential Revision: D15484658

fbshipit-source-id: 490ae64cc36946756c30971f1b685048bc5f77da
2019-05-29 14:15:04 -07:00
449a2c3555 Fixes #20124 (#20203)
Summary:
Fixes #20124

Description:
The code wraps the `optimizer.step()` method to detect whether the user is following the new pattern or the old pattern. If the old pattern is detected, a UserWarning is raised. Documentation is also updated to reflect the change:

![Screen Shot 2019-05-07 at 11 05 17](https://user-images.githubusercontent.com/2459423/57287527-04e63580-70b8-11e9-9ddd-5d159ef0ed2f.png)

cc SsnL, bado-lee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20203

Differential Revision: D15543060

Pulled By: ezyang

fbshipit-source-id: 3605e1afdb6ffc2dfd5e75e92e01b967c4d065b5
2019-05-29 14:15:01 -07:00
74375299e0 add torch.nn._intrinsic and torch.nn._intrinsic.quantized namespace (#20940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20940

- `torch.nn._intrinsic` will contain normal (unquantized) fused modules like Conv2DRelu, Conv2DBnRelu, FakeQuantize ops etc.
- `torch.nn._intrinsic.quantized` will contain fused and quantized modules like Quantized Conv2DRelu, Quantized LinearRelu etc.
Right now I only added the FakeQuantize op in the `torch.nn._intrinsic` namespace; we'll have more later

Differential Revision: D15505228

fbshipit-source-id: d380929e38af7a5bcfbea27474d5b80f95d43b03
2019-05-29 14:09:37 -07:00
736bf7b46c Fix __constants__ for some nn modules (#21071)
Summary:
A bunch of modules were missing entries in `__constants__`, which was making their `__repr__`s not work. Others had `__constants__` entries that were not necessary since they were provided by a parent class instead.

Fixes #20978
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21071

Pulled By: driazati

Differential Revision: D15539518

fbshipit-source-id: 24bdd1ef41ef636eefd5d2bad4ab2d79646ed4f0
2019-05-29 13:55:53 -07:00
1e1f2c85f0 remove constant pooling expect (#21003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21003
ghimport-source-id: c1e0d0555758cab12ce82e0283bab559c7e8e4e2

Differential Revision: D15523443

Pulled By: wanchaol

fbshipit-source-id: 40973c1c0c0ab07fe4b1334e9ae0e4b16b5add8e
2019-05-29 13:55:50 -07:00
0ffd20c268 Fix empty tensor for unique_dim (#19000)
Summary:
Fixes: #18408

cc: zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19000

Reviewed By: ezyang

Differential Revision: D15470136

Pulled By: VitalyFedyunin

fbshipit-source-id: daf71566b4dbdc91927d164f813b5ee8645af1a2
2019-05-29 13:50:32 -07:00
2cd1c78632 Revert D15523444: [jit] move casting ops from prim to aten
Differential Revision:
D15523444

Original commit changeset: 642342bf1cce

fbshipit-source-id: 29de1c7e19cbb3273230c280346e786e61d2d445
2019-05-29 13:42:05 -07:00
7cb1aa67b0 Enabled min, max, minall, maxall, cmin, cmax, cmaxValue, cminValue for bool tensors (#21031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21031
ghimport-source-id: 379b3e9d20872eb5ad14403ed6751cdb0e730bc5

Reviewed By: ezyang

Differential Revision: D15530499

Pulled By: izdeby

fbshipit-source-id: f113d6974ee18ac3dfb5c0bcff66865345d137d2
2019-05-29 13:22:54 -07:00
85777b92b2 Assert against using Operator methods not supported when exporting it to c10, part 2 (#17946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14430749

fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
2019-05-29 13:16:00 -07:00
a0111aaf0d move casting ops from prim to aten (#21002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21002
ghimport-source-id: 4c88a54a3ecb76c5ca3c2c328b749350860a166d

Differential Revision: D15523444

Pulled By: wanchaol

fbshipit-source-id: 642342bf1ccea83c88897bc023979a32ee01addf
2019-05-29 12:36:47 -07:00
dd903eb645 Add start and step parameters for range in torchscript (#20795)
Summary:
Fixes #18440

I calculate a derived index from `start,stop,step` as `start + step*index`. When `start=0` and `step=1` (the defaults/`range(n)`), this is the same behavior as before.

Unfortunately, it seems that we do not optimize out operations like `x*1` or `x+0`. That means that we're doing lots of redundant operations when we don't need to. EDIT: More specifically, it seems like we only do this optimization for (tensor, scalar): https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/peephole.cpp#L128

The most annoying part of this code is calculating the number of iterations, given `start, stop, step`. I ended up going with the formula `(abs(stop-start) + abs(step)-1)//abs(step)`. Other intuitively appealing formulas like `(stop-start + step -1)//step` don't work for negative numbers.

I tried using `SymbolicVariable` for the calculations, but it seems that `symbolicvariable` only outputs ops for `tensors`, not the integers we have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20795

Differential Revision: D15446869

Pulled By: Chillee

fbshipit-source-id: 6085545ace04e25985c6ac870226f7a651f670d5
2019-05-29 12:31:29 -07:00
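The trip-count formula from the description can be checked directly against Python's own `range` (sketch assumes `step` and `stop - start` have matching signs; sign-mismatched ranges are out of scope here), along with the derived index `start + step*i`:

```python
# Number of iterations for range(start, stop, step), per the formula
# (abs(stop-start) + abs(step)-1) // abs(step).
def trip_count(start, stop, step):
    return max(0, (abs(stop - start) + abs(step) - 1) // abs(step))

for start, stop, step in [(0, 10, 1), (0, 10, 3), (10, 0, -1),
                          (5, 50, 7), (10, 0, -3), (5, 5, 1)]:
    assert trip_count(start, stop, step) == len(range(start, stop, step))

# The derived loop index is start + step * i:
assert [0 + 3 * i for i in range(trip_count(0, 10, 3))] == list(range(0, 10, 3))
```

With the defaults `start=0, step=1`, the index expression reduces to `i`, matching the old behavior.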
fa8c132e24 Revert D15502768: [pytorch][PR] [jit] Make ScriptModule.training an attribute instead of a parameter
Differential Revision:
D15502768

Original commit changeset: 3022f2d57ec6

fbshipit-source-id: 5cd08d3c3a75e38e3aa9b75a0c0059a2c6c85a1e
2019-05-29 12:18:18 -07:00
94b9706017 fix dequantize_linear (#21035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21035

Fix the dtype error in `dequantize_linear`, it should accept the same dtype argument as `quantize_linear`

Differential Revision: D15521931

fbshipit-source-id: 0114c046a3f1046e42fca49c74c85e487fee8616
2019-05-29 12:18:15 -07:00
cbf2a4f5c4 print a warning if a type annotation prefix is invalid according to mypy (#20884)
Summary:
This PR adds a check that prints a warning if a type annotation prefix isn't what mypy expects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20884

Differential Revision: D15511043

Pulled By: Krovatkin

fbshipit-source-id: 9038e074807832931faaa5f4e69628f94f51fd72
2019-05-29 11:56:55 -07:00
a6bb15493d Removed accidental TensorFlow dependency (#21066)
Summary:
I accidentally added a TF dependency in #20413 by using the `from tensorboard.plugins.mesh.summary import _get_json_config` import.

I'm removing it at the cost of code duplication.

orionr, Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21066

Reviewed By: natalialunova

Differential Revision: D15538746

Pulled By: orionr

fbshipit-source-id: 8a822719a4a9f5d67f1badb474e3a73cefce507f
2019-05-29 11:18:10 -07:00
f2199a34eb Hook to store additional metadata about environment (#20863)
Summary:
In a larger system environment, there's usually a need to store some information about how the model was created (e.g. from which process, workflow, by which user, etc). It's almost like the JPEG metadata written by a camera.

This PR adds a low-level C++ hook to allow population of additional files in the zip container based on the environment. The reason to make it a low-level hook instead of a top-level API wrapper (e.g. `m.save_with_metadata`) is to capture all usages of the saving API transparently to the user.

Let me know if there are concerns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20863

Differential Revision: D15487941

Pulled By: dzhulgakov

fbshipit-source-id: 120c5a4c9758aa82846bb51a1207f923e3da1333
2019-05-29 10:11:58 -07:00
00c1584979 Added possibility to index scalars by bool masks (#21030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21030
ghimport-source-id: 7a66ca096c62d050a38a6fcc9f6b2d61e387eb34

Differential Revision: D15530498

Pulled By: izdeby

fbshipit-source-id: d5d38f9610caa55fb7179d41f568c5ea5fa1f2e2
2019-05-29 09:32:55 -07:00
1d4685c20f Improve test_proper_exit error printing (#20166)
Summary:
This doesn't have `strace` yet, but it still has `faulthandler` to print stack traces when hanging. Also part of an attempt to isolate changes from #19228 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166

Differential Revision: D15536504

Pulled By: ezyang

fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
2019-05-29 07:51:31 -07:00
aa42742df0 ctc_loss: fix backward when 2d target tensor is larger than max_target_length (#20971)
Summary:
Previously, the backward didn't work when 2d target tensors had extra columns at the end. Now we just ignore those.
Also fix the confusion in the doc example regarding the number of classes.

Thank you, ypw-rich for the report with reproducing example.

Fixes: #20522
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20971

Differential Revision: D15535481

Pulled By: ezyang

fbshipit-source-id: 397e44e20165fc4fa2547bee9390d4c0b688df93
2019-05-29 05:13:00 -07:00
55f5eb3c47 DilatedMaxPool2d: small cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20936

Differential Revision: D15514542

Pulled By: ezyang

fbshipit-source-id: 6341a4bb8a9ee0b632c32a013ea609d842a21962
2019-05-29 05:06:33 -07:00
f8565121d9 Port dilated_max_pool3d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20933

Differential Revision: D15525485

Pulled By: ezyang

fbshipit-source-id: 6ff44f11d984903cd20d79cfad04963e6443e6ca
2019-05-29 04:58:42 -07:00
0544a491d5 Revert D15499749: [pytorch][PR] Add Tensor.T attribute to reverse dimensions
Differential Revision:
D15499749

Original commit changeset: f3306b496667

fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878
2019-05-29 04:29:48 -07:00
3038cf8eee Remove THSTensor and SparseTensorRef (#20877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20877
ghimport-source-id: a07f53ca158f9a3dce7a25ef5a169871e98ea3ea

Differential Revision: D15480353

Pulled By: li-roy

fbshipit-source-id: 1152dbc4df827ded3be1a57f007a6b7de12f567f
2019-05-29 01:37:03 -07:00
9faa409b56 Fix __irshift__ dispatch (#21047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21047
ghimport-source-id: 8c781c9882eebb07325a1fc7aa6f340bbec18886

Differential Revision: D15529160

Pulled By: li-roy

fbshipit-source-id: d9a444e42df5c509ae10849ba6f8006fbec830c5
2019-05-29 01:03:34 -07:00
8dda19b79f Remove extraneous TensorId checks in as_strided (#21045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21045
ghimport-source-id: e95fbf50bccf6ebc613bb13fb16915254912f22d

Differential Revision: D15528971

Pulled By: li-roy

fbshipit-source-id: c721cc6280dff6e14c5533681d0b35aaa8f98f00
2019-05-29 00:53:53 -07:00
d76546a463 Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (#20932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20932
ghimport-source-id: f0a7f12ffd77aec063a088b18c6b1d108c712df8

Differential Revision: D15501251

Pulled By: zdevito

fbshipit-source-id: 91e6e5944d2663b673afde45fc6eed22f31101c4
2019-05-29 00:14:25 -07:00
5c53aa4869 Make build with makefiles less noisy (#21053)
Summary:
https://github.com/pytorch/pytorch/pull/17783 made ninja and makefile builds print out build commands unconditionally, which has made the build log very verbose, e.g. the ROCm CI build log becomes >13 MB. A large build log makes searching for real errors hard.
https://github.com/pytorch/pytorch/pull/20508 has reverted the ninja change, and this one reverts the makefile change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21053

Differential Revision: D15533412

Pulled By: bddppq

fbshipit-source-id: ad89b617d06acc670d75d4cf25111a4081e9c95e
2019-05-29 00:08:45 -07:00
9b147961c4 Fix get_gpu_memory_info in non-cuda builds (#21054)
Summary:
#21024 broke master
cc akyrola
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21054

Reviewed By: akyrola

Differential Revision: D15533406

Pulled By: bddppq

fbshipit-source-id: 0dcfa0ce865e109b46280ef1786dbc7a8af30739
2019-05-28 23:05:15 -07:00
ffdce79078 Deprecate variadic inputs of checkpoint_sequential (#21006)
Summary:
I've reported inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should provide the same input signature but they don't. I think the consistency is important and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`.

I hope `checkpoint_sequential` raises `TypeError` on variadic arguments since PyTorch 1.2.0. But for now, it's okay just to warn as `DeprecationWarning`. I've talked about this approach with soumith.

Please review this pull request. Any comment will be my pleasure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006

Differential Revision: D15530801

Pulled By: soumith

fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183
2019-05-28 21:33:45 -07:00
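The deprecation path can be sketched as follows (hypothetical simplified signature and body — the real function also does checkpointing, which is omitted here):

```python
import warnings

# Like nn.Sequential's forward, checkpoint_sequential should take a single
# input; extra positional inputs now emit a DeprecationWarning instead of
# failing outright.
def checkpoint_sequential(functions, segments, *inputs):
    if len(inputs) > 1:
        warnings.warn(
            "Passing variadic inputs to checkpoint_sequential is "
            "deprecated; pass a single input instead.",
            DeprecationWarning)
    x = inputs[0] if len(inputs) == 1 else inputs
    for fn in functions:
        x = fn(x)
    return x

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = checkpoint_sequential([lambda v: v + 1, lambda v: v * 2], 2, 3)
    assert len(caught) == 0          # single input: no warning
    checkpoint_sequential([lambda v: v], 1, 1, 2)
assert result == 8
assert len(caught) == 1              # variadic inputs: deprecated
```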
d23d04f17f Allow nondet_tol for nondeterminism in gradcheck and gradgradcheck (#20980)
Summary:
gradcheck currently includes a determinism check (although it only tries twice and sees if the results match).
This can lead to flaky tests, e.g. in #20971, but also #13818.
This adds nondet_tol to both gradcheck and gradgradcheck. It does not change / re-enable any tests yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20980

Differential Revision: D15530129

Pulled By: soumith

fbshipit-source-id: 04d7f85b5b59cd62867820c74b064ba14f4fa7f8
2019-05-28 21:26:13 -07:00
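A toy sketch of what the tolerance buys (names and shape are illustrative — the real gradcheck compares numerical Jacobians of tensor functions, not scalars): evaluate twice and allow results to differ by up to `nondet_tol`.

```python
import random

# Run the function twice and accept a bounded mismatch instead of
# requiring bitwise-identical results.
def determinism_check(fn, x, nondet_tol=0.0):
    a, b = fn(x), fn(x)
    return abs(a - b) <= nondet_tol

def noisy_square(x):
    # Stand-in for a nondeterministic kernel: tiny run-to-run jitter.
    return x * x + random.uniform(0, 1e-6)

assert determinism_check(lambda v: v * v, 3.0)               # deterministic, tol 0
assert determinism_check(noisy_square, 3.0, nondet_tol=1e-3)  # passes with tolerance
```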
d190450a35 Fix typo in CyclicLR docs (#21021)
Summary:
Fixes a typo in the CyclicLR docs by adding the lr_scheduler directory and puts in other required arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021

Differential Revision: D15530109

Pulled By: soumith

fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
2019-05-28 21:18:50 -07:00
f1fe4b1114 add simple memory analyzer and log warning if GPU underutilized (#21024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21024

Add a new pybinded call to CUDAGetMemoryInfo.

Reviewed By: wesolwsk

Differential Revision: D15520607

fbshipit-source-id: f6d04e48f7d7cb089fc52fa8835cfee3f452d2f1
2019-05-28 19:58:54 -07:00
1bed5f39f4 Fix warning in register_c10_ops by making index unsigned (#20964)
Summary:
Just an annoying warning that's been popping up a lot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20964

Differential Revision: D15531064

Pulled By: Chillee

fbshipit-source-id: 9580115676c5e246481054bbfc749a551a3cca5e
2019-05-28 18:02:09 -07:00
f6ec464890 Enable batched QR decomposition and add a some option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
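Since the `some` option is described as akin to NumPy's `mode` option, the shape difference it toggles can be illustrated with NumPy's reduced vs. complete decomposition (`some=True` roughly corresponding to `mode='reduced'`):

```python
import numpy as np

a = np.random.randn(5, 3)
q_r, r_r = np.linalg.qr(a, mode='reduced')    # analogous to some=True
q_c, r_c = np.linalg.qr(a, mode='complete')   # analogous to some=False

assert q_r.shape == (5, 3) and r_r.shape == (3, 3)   # thin factorization
assert q_c.shape == (5, 5) and r_c.shape == (5, 3)   # full orthogonal Q
assert np.allclose(q_r @ r_r, a)
```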
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
c1048182be Use constants from math.h for gelu op (#20974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20974

Use constants from math.h for gelu op
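For reference, the exact GELU formula uses 1/sqrt(2) — presumably the kind of math.h constant (e.g. `M_SQRT1_2`) this change switches to. A Python analogue of the formula:

```python
import math

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    # 1/sqrt(2) is the math.h constant M_SQRT1_2 in C.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

assert gelu(0.0) == 0.0
assert abs(gelu(1.0) - 0.8413447460685429) < 1e-12
```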

Reviewed By: hl475, houseroad

Differential Revision: D15511736

fbshipit-source-id: 7d069888fb5c7c310774d056f18711365b39b8e4
2019-05-28 17:52:34 -07:00
0290897bca tracing for intra_op_parallel (#20603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603

When we use intra_op_parallel operators, Caffe2 tracing was generating a trace only for the master task, giving the false impression that many threads were underutilized.
This diff also traces child tasks.

Reviewed By: ilia-cher

Differential Revision: D14820008

fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
2019-05-28 17:39:23 -07:00
9a989ec469 Add an option to stop the build process once cmake terminates. (#21034)
Summary:
Add an option to setup.py to stop the build process once cmake terminates. This gives users a chance to fine-tune build options. Also update README accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21034

Differential Revision: D15530096

Pulled By: soumith

fbshipit-source-id: 71ac6ff8483c3ee77c38d88f0d059db53a7d3901
2019-05-28 17:11:00 -07:00
9294de8c9f Add Tensor.T attribute to reverse dimensions (#20598)
Summary:
For compatibility with numpy
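The NumPy behavior being matched here is that `.T` reverses all dimensions, not just the last two:

```python
import numpy as np

x = np.zeros((2, 3, 4))
assert x.T.shape == (4, 3, 2)            # all dims reversed
assert np.zeros((2, 3)).T.shape == (3, 2)  # usual matrix transpose for 2-D
```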
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598

Differential Revision: D15499749

Pulled By: umanwizard

fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc
2019-05-28 16:59:06 -07:00
2791a44948 Renaming the relu kernel and adding hypothesis tests (#20647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20647

The initial assumption was that `qint8` would be unsigned. After the introduction of `quint8` and `qint8`, some tests broke.

Reviewed By: jerryzh168

Differential Revision: D15332106

fbshipit-source-id: 6ed18da428915aea918a363c5f38754a3c75d06b
2019-05-28 16:46:44 -07:00
d6d192e0af Added engine information to the profiling result. (#20493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493

This helps distinguish if the op was a quantized op or not.

Reviewed By: salexspb

Differential Revision: D15337854

fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
2019-05-28 16:41:12 -07:00
7afa75006e Enable operator profiling via command line (#20173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173

Enabled op profiling even when the net type is not dag or prof_dag. Also added
engine type info to the summary.

Reviewed By: salexspb, ilia-cher

Differential Revision: D15177813

fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
2019-05-28 16:41:08 -07:00
2ba608b4a0 Fixed gcd to use 64 bit integers (#21041)
Summary:
Not much to say. Fixes implementation introduced here: https://github.com/pytorch/pytorch/pull/19115
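To illustrate why 32-bit integers are insufficient here: once the operands exceed the signed 32-bit range, a 32-bit gcd would presumably truncate them, while Python's arbitrary-precision ints show the expected answers.

```python
import math

a, b = 2**40, 2**35              # both exceed the signed 32-bit range (2**31 - 1)
assert math.gcd(a, b) == 2**35   # correct only with >=64-bit arithmetic
assert (a & 0xFFFFFFFF) == 0     # truncated to 32 bits, a would collapse to 0
```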
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21041

Differential Revision: D15528801

Pulled By: Chillee

fbshipit-source-id: bacd709eb711ca00156bd70480d6051b437517ed
2019-05-28 16:20:55 -07:00
28079c3906 Make ScriptModule.training an attribute instead of a parameter (#19587)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19587 [jit] Make ScriptModule.training an attribute instead of a parameter**

Remove the hack we had previously where `training` was a buffer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19587

Differential Revision: D15502768

Pulled By: driazati

fbshipit-source-id: 3022f2d57ec6849868f9225d9bc2bfb7828cb318
2019-05-28 16:06:46 -07:00
18809f7b0b Better error message in __get_state__ to let a user know that ScriptModules can't be deep-copied atm (#20885)
Summary:
Before we look into supporting `deepcopy`, we could at least improve the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20885

Differential Revision: D15511023

Pulled By: Krovatkin

fbshipit-source-id: 93b8730a2cc663eee0147f14d3341d0606748eaf
2019-05-28 15:09:07 -07:00
07c4e45ca6 Some minor fixes for the changes in #20945 (#21008)
Summary:
Fixes after #20945
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21008

Differential Revision: D15526193

Pulled By: ezyang

fbshipit-source-id: 4cfabc482c149e0aeb92ae7fff04098771fe33ed
2019-05-28 14:48:50 -07:00
0885dd28c8 refactor register_prim_ops (#21001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21001
ghimport-source-id: f1b8e3999bf18fb0f3b857a13c3e3f609e1e4b4e

Differential Revision: D15523445

Pulled By: wanchaol

fbshipit-source-id: c1e29b0985bde580703a1fca9df46da773826df6
2019-05-28 14:11:04 -07:00
b85c52923b Re-land "Fix advanced indexing on "huge" Tensors" (#21019)
Summary:
This #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc bddppq

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed
32-bit integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21019

Differential Revision: D15518477

Pulled By: colesbury

fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
2019-05-28 12:45:56 -07:00
52d27890dc Improve error message for missing attribute (#20779)
Summary:
Fixes #20495 .

Now for
```python
        class A(torch.jit.ScriptModule):
            def __init__(self):
                super(A, self).__init__()

            @torch.jit.script_method
            def forward(self, x):
                return x + self.whatisgoingon

        class B(A):
            def __init__(self):
                super(B, self).__init__()
            @torch.jit.script_method
            def bar(self, x):
                return x * x

        A()
```
it does
```
RuntimeError:
attribute 'whatisgoingon' does not exist:
@torch.jit.script_method
def forward(self, x):
    return x + self.whatisgoingon
               ~~~~~~~~~~~~~~~~~~ <--- HERE

```

I added a test in `test_jit.py` as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20779

Differential Revision: D15441138

Pulled By: Chillee

fbshipit-source-id: 88f458c36b5e32a1ffc467b27bbc28a3c5c07321
2019-05-28 12:27:52 -07:00
bc10677fcb Some name and variable cleanup (#20861)
Summary:
As a part of https://github.com/pytorch/pytorch/pull/20580 I noticed that we had some unusual variable naming in `summary.py`. This cleans it up and also removes some variables that weren't being used.

I'll wait until we have an `add_custom_scalars` test to land this.

cc lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20861

Differential Revision: D15503420

Pulled By: orionr

fbshipit-source-id: 86d105a346198a1ca543d1c5d297804402ab5a0c
2019-05-28 12:22:47 -07:00
99674eb86f Re-enable test_dag_net_forking on ROCm (#21013)
Summary:
Fixes #16229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21013

Differential Revision: D15515824

Pulled By: bddppq

fbshipit-source-id: 23a6c7eaad6129328c6b9dfcc55ac2d31a6d2dc0
2019-05-28 12:12:53 -07:00
082936f033 Clarify cycliclr param docs (#20880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880

This clarifies how the momentum parameters should be used.

Reviewed By: soumith

Differential Revision: D15482450

fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
2019-05-28 12:07:47 -07:00
68c3ef72b5 Change bound shape inference for LengthsRangeFill & GatherRanges, add more tests (#20610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20610

- Change InferLengthsRangeFill
- Add InferGatherRanges
- Add tests from ClipRangesGatherSigridHash all the way to SparseLengthsWeightedSum
- Add tests from SigridTransforms all the way to SparseLengthsWeightedSum

An e2e test will be added in the following diff.

Reviewed By: ipiszy

Differential Revision: D15382730

fbshipit-source-id: a611cd129007a273dfc43955cd99af1c4ed04efd
2019-05-28 11:33:51 -07:00
bbe3411846 Refactor schema_matching.cpp (#20549)
Summary:
It was kind of hard to read through this code, so this adds a bunch of comments; no behavior should be changed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20549

Pulled By: driazati

Differential Revision: D15499974

fbshipit-source-id: 95bf660c3b2bab1c90a2250696cece68bd1925cc
2019-05-28 10:55:09 -07:00
ff6cda0da6 Generate TH functions outside of Type (#20309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20309
ghimport-source-id: d0a0195be53f991f20eb0fbb03edf3814f18b831

Differential Revision: D15509848

Pulled By: li-roy

fbshipit-source-id: 35aafdcb9bb868a41f75cf422c48d357f8655d67
2019-05-28 02:55:51 -07:00
eacb311810 Move 1d tensor checks to TH (#20859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20859
ghimport-source-id: 675cea2c31b48bb5e3b4676640021ace783ea3a8

Differential Revision: D15509850

Pulled By: li-roy

fbshipit-source-id: 468b3b1249d58dd8104643d61d263d1f9b0308bf
2019-05-28 02:55:48 -07:00
d2f14db6cb Change view dispatch to abstract (#20308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20308
ghimport-source-id: cac8d130d45cc36e51d1661c15ad98c10353ea54

Differential Revision: D15509849

Pulled By: li-roy

fbshipit-source-id: 9576028b7075f58c431dc8c12a38c4c5a34c9340
2019-05-28 02:55:41 -07:00
580eab6562 Restore TBB module (#20454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454
ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384

Differential Revision: D15326062

Pulled By: ilia-cher

fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f
2019-05-28 02:49:36 -07:00
82aecfad6a Native ATen/Parallel backend (#20087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20087
ghimport-source-id: bcfc8a86abe0893e4a380fe6f6123e2082ba4317

Differential Revision: D15248663

Pulled By: ilia-cher

fbshipit-source-id: fdb7a8860c85d8202026b629cb7fa344782bd2c4
2019-05-28 01:40:54 -07:00
f4b434a6a5 Fix incorrect torch version in CMake (#21007)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20525
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21007

Differential Revision: D15515260

Pulled By: soumith

fbshipit-source-id: 149084cce276c5e76ca0c5c0872c5e990c47bdfd
2019-05-27 23:46:49 -07:00
0556141339 fix small typo muliprocessing -> multiprocessing (#20998)
Summary:
Minor typo fix in docstring.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20998

Differential Revision: D15514698

Pulled By: soumith

fbshipit-source-id: a9ceb557251ff5868e810331195243b6a8717851
2019-05-27 21:36:13 -07:00
5ddbfc97e9 Revert D15501945: [pytorch][PR] Fix advanced indexing on "huge" Tensors
Differential Revision:
D15501945

Original commit changeset: e876e678e866

fbshipit-source-id: 2833eb118a62e301571a983529f6e4fc91442581
2019-05-27 20:26:37 -07:00
3b0d431bf5 Check for incompatible versions between CUDA and MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20945

Differential Revision: D15514576

Pulled By: ezyang

fbshipit-source-id: 3c0b8b64edce236a84a7195605d437a00a67b7f4
2019-05-27 19:22:21 -07:00
0d35f14565 Update cuSPARSE namespace collision w/ CUDA 10.1 Update 1 (#20889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20889
ghimport-source-id: 8f5f500fa542d4992cd9213923e1af8de115ee58

Differential Revision: D15495545

Pulled By: ezyang

fbshipit-source-id: 60057cf13694158299a8124b1a787cb4e3c21d21
2019-05-27 18:43:32 -07:00
9d9751f634 Convert dequantize_linear to an internal function _dequantize_linear (#20938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20938

`dequantize_linear` need not be exposed to front-end users.
It will only be used by the JIT passes for q-dq insertion and op
substitution.

Differential Revision: D15446097

fbshipit-source-id: a5fbcf2bb72115122c9653e5089d014e2a2e891d
2019-05-27 15:40:21 -07:00
8e3311c5e2 Remove functionality unsupported by the JIT from multi_head_attention_forward. (#20653)
Summary:
Remove the internal functions in multi_head_attention_forward. Those internal functions cause a 10-15% performance regression, and there is possibly a JIT issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20653

Differential Revision: D15398888

Pulled By: cpuhrsch

fbshipit-source-id: 0a3f053a4ade5009e73d3974fa6733c2bff9d929
2019-05-27 15:12:58 -07:00
6e76813a39 fix SyncBatchNorm doc (#20991)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20991

Differential Revision: D15513518

Pulled By: soumith

fbshipit-source-id: 9618c0b2442e013e4d37793cdb04cb4f4b1b141c
2019-05-27 14:46:58 -07:00
ebc8d7170e fix the bug for mkldnn clone (#20943)
Summary:
This PR is to solve the bug for clone a MKLDNN tensor, please see the issue https://github.com/pytorch/pytorch/issues/20895.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20943

Differential Revision: D15511516

Pulled By: mrshenli

fbshipit-source-id: 05b41d6c7eaf8703521f4c768b8f26ec8501dc5e
2019-05-27 12:09:52 -07:00
6480d3f140 Revert D15511921: [pytorch][PR] BatchSampler now uses list.clear() instead of creating new objects
Differential Revision:
D15511921

Original commit changeset: e943d21e75e1

fbshipit-source-id: 933b7ef74c7a530f0a2cc087c8ee6f0455cf9239
2019-05-27 10:51:24 -07:00
482ae8e6b2 BatchSampler now uses list.clear() instead of creating new objects
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20976

Differential Revision: D15511921

Pulled By: soumith

fbshipit-source-id: e943d21e75e19f9154a0570f3188cc3ce174083e
2019-05-26 23:45:26 -07:00
ecf012213b Update submodule URL based on redirection. (#20973)
Summary:
Changes:
  - protobuf was moved to protocolbuffers/protobuf a while ago.
  - cpuinfo was moved to pytorch/cpuinfo and updated in FBGEMM recently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20973

Differential Revision: D15511926

Pulled By: soumith

fbshipit-source-id: 2c50373c9b245524f839bd1059870dd2b84e3b81
2019-05-26 22:29:21 -07:00
bb89827e1d Update cuda pinned memory note to include tensor.to (#20977)
Summary:
separate bits of changes from #19228
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20977

Differential Revision: D15511919

Pulled By: soumith

fbshipit-source-id: 5015a29cdac6d6e160388c493182c330f0da63ec
2019-05-26 22:22:06 -07:00
1e8f129a05 In setup.py, also check some submodules of submodules. (#20937)
Summary:
Sometimes users forget to use the "--recursive" option when they update submodules. This added check should help expose this issue.
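A sketch of the kind of check described (the nested path in the comment is only illustrative): a submodule directory that exists but is empty usually means `git submodule update --init --recursive` was skipped.

```python
import os

def submodule_checked_out(path):
    # An initialized submodule has a non-empty working directory;
    # an empty directory means the checkout step was skipped.
    return os.path.isdir(path) and bool(os.listdir(path))

# e.g. submodule_checked_out('third_party/protobuf/third_party/benchmark')
assert not submodule_checked_out('this/path/does/not/exist')
```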
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20937

Differential Revision: D15502846

Pulled By: mrshenli

fbshipit-source-id: 34c28a2c71ee6442d16b8b741ea44a18733b1536
2019-05-26 18:43:24 -07:00
8dbdd00f87 tweak tqdm to have download speed in kB/MB/etc (#20908)
Summary:
This changes the progress bars in `_download_url_to_file` from saying things like `49773343.40it/s` to `47.5MB/s`.
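The formatting change amounts to scaling a raw byte rate into human-readable units (tqdm itself exposes this via its `unit`, `unit_scale`, and `unit_divisor` options); a standalone sketch:

```python
def fmt_rate(bytes_per_sec):
    # Scale a raw B/s figure into kB/MB/GB, e.g. 49773343.40 -> '47.5MB/s'.
    for unit in ('B', 'kB', 'MB', 'GB', 'TB'):
        if bytes_per_sec < 1024.0:
            return f'{bytes_per_sec:.1f}{unit}/s'
        bytes_per_sec /= 1024.0
    return f'{bytes_per_sec:.1f}PB/s'

assert fmt_rate(49773343.40) == '47.5MB/s'   # the example from this PR
```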
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20908

Differential Revision: D15511223

Pulled By: soumith

fbshipit-source-id: 2422eb5fb486f9ef4bd69c556c4ed1775b8b2860
2019-05-26 15:34:47 -07:00
5ab6e07180 .view(...) now suggests .reshape(...) instead .contiguous().view(...)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20968

Differential Revision: D15511236

Pulled By: soumith

fbshipit-source-id: 673fc2982ad6ea287fdd0cff2684bdc2317a6709
2019-05-26 15:34:44 -07:00
c611630b9d Fix subscripts in RNN documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20949

Differential Revision: D15510760

Pulled By: soumith

fbshipit-source-id: 51e9dbea7d8c8194e46e12311e397deff32dbe2f
2019-05-26 14:57:40 -07:00
a3a458ed30 Fix align corner docs (#20961)
Summary:
I believe the `True` and `False` in the doc are reversed :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20961

Differential Revision: D15510806

Pulled By: soumith

fbshipit-source-id: 62566bb595e187506b23dedc24892e48f35b1147
2019-05-26 14:57:37 -07:00
5e69e76aba Remove padding_mode from torch.nn.functional.conv{1,2,3}d's docstr (#20891)
Summary:
Fixes #20694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20891

Differential Revision: D15510790

Pulled By: soumith

fbshipit-source-id: aa3630693c7446bf18a390cb49c4df9bc9c59eea
2019-05-26 14:52:51 -07:00
4c5b1e3460 Update nccl submodule to v2.4.6 (#20882)
Summary:
Fixes #20630

Haven't tested it yet. Let's see if it passes all CI tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20882

Reviewed By: pietern

Differential Revision: D15483561

Pulled By: mrshenli

fbshipit-source-id: 5f0730a04d92906af077b2fe2170b674ca371e6c
2019-05-26 13:00:26 -07:00
9310e600f6 Use a simpler way to delete recursive function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20913

Differential Revision: D15508071

Pulled By: mrshenli

fbshipit-source-id: ad9a0ab4295bb0f1063d43682a10c124d8384635
2019-05-26 12:17:25 -07:00
66e6571eb8 fixed issue #20921 (#20922)
Summary:
For tensor creation ops like `torch.zeros` and `torch.ones`, the docs [0], [1] use `sizes` as the first argument to the function call, while the correct argument is `size`. This was tested with PyTorch 1.1 installed via pip on Ubuntu 19.04.

An example

```
>>> torch.zeros(2, 3)
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.zeros(sizes = (2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: zeros() missing 1 required positional arguments: "size"
>>> torch.zeros(size = (2, 3))
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.ones(sizes = (2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ones() missing 1 required positional arguments: "size"
>>> torch.ones(size = (2, 3))
tensor([[1., 1., 1.],
        [1., 1., 1.]])
```

[0]: https://pytorch.org/docs/master/torch.html#torch.zeros
[1]: https://pytorch.org/docs/master/torch.html#torch.ones
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20922

Differential Revision: D15498741

Pulled By: mrshenli

fbshipit-source-id: 963324ffa004d62ca77ce30ed6f0c3932b5b79b7
2019-05-25 22:22:18 -07:00
83fe92870d Update multiprocessing note now that shared CUDA tensors are refcounted (#19904)
Summary:
The multiprocessing notes were not updated after https://github.com/pytorch/pytorch/pull/16854. (The torch.multiprocessing page was.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19904

Differential Revision: D15509661

Pulled By: soumith

fbshipit-source-id: 7c11e14a6c804498dda3adbf19710e63e6a564a0
2019-05-25 17:40:42 -07:00
bdce5533fe Fix pytorch_macos_10_13_py3_test (#20944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20944
ghimport-source-id: da2dcbfaff4e0e75f6b4e836fc1af1d8aee11c56

Differential Revision: D15508912

Pulled By: ezyang

fbshipit-source-id: 6758a8a516d0a875a5f6bbbb12e43d899bcf2161
2019-05-25 08:17:34 -07:00
81e70ffa19 fix bug of not using get_score_cls_index in BoxWithNMSLimitOp (#20868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20868

When `input_boxes_include_bg_cls` is false (which means `input_scores_fg_cls_starting_id` is 0), it doesn't map the score's class index correctly when sorting and limiting the detections over all classes after NMS.

Reviewed By: newstzpz

Differential Revision: D15472706

fbshipit-source-id: dc1e808b63ad09fb4bd95acf866771bb3fa92d69
2019-05-24 22:31:01 -07:00
2fb665a9df Add warning about memory overhead when using multiple tiny tensors (#20801)
Summary:
added note in docs regarding #19408
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20801

Differential Revision: D15503351

Pulled By: mrshenli

fbshipit-source-id: 7ab371a7992233fb867aadd4bb6b74fccd232c33
2019-05-24 21:45:51 -07:00
c7e0722814 allow pass ordered dict for nn sequential (#20796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20796
ghimport-source-id: 9f895a2a6ebc71984196b868dc3ea6a12286bc81

Differential Revision: D15505330

Pulled By: wanchaol

fbshipit-source-id: 2922c56606b477a34f4e6433fa790d5b2de9d77a
2019-05-24 20:31:05 -07:00
b93bdf6989 Fix advanced indexing on "huge" Tensors (#20919)
Summary:
This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed
32-bit integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e

Fixes #20888
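The threshold at play can be sketched as a simple byte-count check (a hypothetical standalone version of the kind of test `can_use_32bit_indexing` performs):

```python
INT32_MAX = 2**31 - 1

def can_use_32bit_indexing(numel, itemsize):
    # 32-bit index arithmetic is only safe if every byte offset
    # into the tensor fits in a signed 32-bit integer.
    return numel * itemsize <= INT32_MAX

assert can_use_32bit_indexing(2**28, 4)       # 1 GiB of float32: fine
assert not can_use_32bit_indexing(2**30, 4)   # 4 GiB of float32: needs 64-bit
```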
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20919

Differential Revision: D15501945

Pulled By: colesbury

fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0
2019-05-24 16:25:04 -07:00
430d1a2761 Attempt to fix flaky test_structseq_repr (#20931)
Summary:
Previously, this used `crepr` after the decref of `repr`. This is not
allowed because `repr` owns the cached copy of `crepr`.

Let's see if this fixes the contbuild.

See #20926
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20931

Differential Revision: D15501929

Pulled By: colesbury

fbshipit-source-id: 24141ba62df8758d2a3998cf7c2054be09088b6a
2019-05-24 15:55:44 -07:00
b1df8bfe8a Reduce set of build/tests which run on PRs. (#20930)
Summary:
Resubmit of #20775

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20930

Differential Revision: D15503173

Pulled By: ezyang

fbshipit-source-id: a5de8eacf6b29ee26f07ac53c915fff3f4d32569
2019-05-24 15:25:37 -07:00
c46c6a4fe6 Zero slice bug (#20914)
Summary:
Bug reported internally at FB:

```python
>>> t=torch.from_numpy(np.empty((0,4)))
>>> t[:,1::2]*=1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Trying to resize storage that is not resizable at ../aten/src/TH/THStorageFunctions.cpp:76
```

This happens because the storage offset of `t[:, 1::2]` is 1, and it has 0 elements. We can fix this by avoiding resizing the storage for no-element arrays.

(We could *also* have avoided it by not modifying the storage index in this case, but I felt this way was more semantically correct -- in general, we should not be assuming it's okay to do anything to the storage when it has zero elements).
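The same slice in NumPy shows the shape/offset combination involved (zero elements, but a nonzero storage offset) and that in-place ops on it are fine:

```python
import numpy as np

t = np.empty((0, 4))
s = t[:, 1::2]      # zero elements, but a storage offset of one element
assert s.size == 0
assert s.shape == (0, 2)
s *= 1              # a no-op on an empty view; the torch path used to raise
```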
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20914

Differential Revision: D15497860

Pulled By: umanwizard

fbshipit-source-id: 6af61d73a05edfc5c07ce8be9e530f15bf72e6a9
2019-05-24 15:10:59 -07:00
3858e1684b Don't print backtrace for interpreter errors (#20925)
Summary:
Eager Python errors don't include a backtrace so script shouldn't either

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20925

Pulled By: driazati

Differential Revision: D15499952

fbshipit-source-id: 1169f13ba5578cd52948725eda73de8229146bb1
2019-05-24 14:58:48 -07:00
371bd043d6 register ResizeNearestOp to C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20928

Reviewed By: smessmer

Differential Revision: D15499661

fbshipit-source-id: 5af24d5c9d7ff739b8355e19dfe66b496bc026a5
2019-05-24 14:39:11 -07:00
b5a5e296aa Support 3D mesh/point cloud (#20413)
Summary:
I started adding support for the new **[mesh/point cloud](https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/g3doc/tensorboard.md)** data type introduced to TensorBoard recently.

I created the functions to add the data and the appropriate summaries.
This new data type, however, requires a **Merged** summary containing the data for the vertices, colors, and faces.

I got stuck at this stage. Maybe someone can help. lanpa?

I converted the example code by Google to PyTorch:
```python
import numpy as np
import trimesh

import torch
from torch.utils.tensorboard import SummaryWriter

sample_mesh = 'https://storage.googleapis.com/tensorflow-graphics/tensorboard/test_data/ShortDance07_a175_00001.ply'
log_dir = 'runs/torch'
batch_size = 1

# Camera and scene configuration.
config_dict = {
    'camera': {'cls': 'PerspectiveCamera', 'fov': 75},
    'lights': [
        {
            'cls': 'AmbientLight',
            'color': '#ffffff',
            'intensity': 0.75,
        }, {
            'cls': 'DirectionalLight',
            'color': '#ffffff',
            'intensity': 0.75,
            'position': [0, -1, 2],
        }],
    'material': {
        'cls': 'MeshStandardMaterial',
        'roughness': 1,
        'metalness': 0
    }
}

# Read all sample PLY files.
mesh = trimesh.load_remote(sample_mesh)
vertices = np.array(mesh.vertices)
# Currently only supports RGB colors.
colors = np.array(mesh.visual.vertex_colors[:, :3])
faces = np.array(mesh.faces)

# Add batch dimension, so our data will be of shape BxNxC.
vertices = np.expand_dims(vertices, 0)
colors = np.expand_dims(colors, 0)
faces = np.expand_dims(faces, 0)

# Create data placeholders of the same shape as data itself.
vertices_tensor = torch.as_tensor(vertices)
faces_tensor = torch.as_tensor(faces)
colors_tensor = torch.as_tensor(colors)

writer = SummaryWriter(log_dir)

writer.add_mesh('mesh_color_tensor', vertices=vertices_tensor, faces=faces_tensor,
                colors=colors_tensor, config_dict=config_dict)

writer.close()
```

I tried adding only the vertex summary, since the others are supposed to be optional.
I got the following error from TensorBoard and it also didn't display the points:
```
Traceback (most recent call last):
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 302, in run_wsgi
    execute(self.server.app)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 290, in execute
    application_iter = app(environ, start_response)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 309, in __call__
    return self.data_applications[clean_path](environ, start_response)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/wrappers/base_request.py", line 235, in application
    resp = f(*args[:-2] + (request,))
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 252, in _serve_mesh_metadata
    tensor_events = self._collect_tensor_events(request)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 188, in _collect_tensor_events
    tensors = self._multiplexer.Tensors(run, instance_tag)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_multiplexer.py", line 400, in Tensors
    return accumulator.Tensors(tag)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_accumulator.py", line 437, in Tensors
    return self.tensors_by_tag[tag].Items(_TENSOR_RESERVOIR_KEY)
KeyError: 'mesh_color_tensor_COLOR'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20413

Differential Revision: D15500737

Pulled By: orionr

fbshipit-source-id: 426e8b966037d08c065bce5198fd485fd80a2b67
2019-05-24 14:30:58 -07:00
6063ffd055 Specify dispatch key with kernel (#20821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821

Change registration API. Instead of

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>()
        .dispatchKey(CPUTensorId()));

it is now

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(CPUTensorId()));

This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.

The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter* while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. And while previously, the different kind of config parameters have been mixed, this diff now separates them.

Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.

    // what is this supposed to do?
    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .aliasInfo(DEFAULT)
        .dispatchKey(CPUTensorId()));

If we get more kernel config parameters in the future, we could introduce something like this

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
            .dispatchKey(CPUTensorId())
            .otherConfig());

but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.

A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel1>(CPUTensorId())
        .kernel<Kernel2>(CUDATensorId()));

Reviewed By: dzhulgakov

Differential Revision: D15455790

fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
2019-05-24 14:23:35 -07:00
a2328a27e9 Improve torch.cdist performance (#20605)
Summary:
Fix based on https://github.com/pytorch/pytorch/issues/15253
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20605

Differential Revision: D15396123

Pulled By: ifedan

fbshipit-source-id: 3ed373e68339a35360f083d4aad1b655abcaf97e
2019-05-24 14:06:55 -07:00
4501dc305d Assert against using Operator methods not supported when exporting it to c10 (#17818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14392459

fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
2019-05-24 13:45:01 -07:00
c8f404a68e Revert D15499918: Reduce set of build/tests which run on PRs.
Differential Revision:
D15499918

Original commit changeset: 992e3f91f95d

fbshipit-source-id: 86fc43d3da7ea3e3a32e95fc4f4f3de6cbd5d49b
2019-05-24 12:55:04 -07:00
d03265b44f Reduce set of build/tests which run on PRs. (#20775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20775
ghimport-source-id: 8d05ed03b8a841233d578a38b7c84bd1152c08e5

Differential Revision: D15499918

Pulled By: ezyang

fbshipit-source-id: 992e3f91f95dd9c0564e5ed6793dd1b286ddba00
2019-05-24 12:29:52 -07:00
dee11a92c1 Use Device instead of Backend in TensorIterator (#20690)
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.

Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.

Note that zero-dim Tensor (scalars) are NOT required to be on the
same device as other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.

Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690

Differential Revision: D15414826

Pulled By: colesbury

fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
2019-05-24 12:14:08 -07:00
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) then allows us to eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script _grad_sum_to_size to the new logic, but it might be better to do this incrementally, anyway.
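The peephole elimination described above can be sketched in plain Python over a toy IR — the tuple-based node representation here is purely illustrative, not the JIT's actual graph API:

```python
# Toy IR: each node is (op_name, args). A _grad_sum_to_size whose recorded
# broadcast size is None is a no-op (output size equals input size), so the
# peephole pass can replace it with its input directly.
def peephole(nodes):
    out = []
    for op, args in nodes:
        if op == "_grad_sum_to_size" and args[1] is None:
            out.append(("identity", (args[0],)))  # eliminate the no-op
        else:
            out.append((op, args))
    return out
```

With this, a fusion group that previously carried `_grad_sum_to_size(t, None)` nodes sees them disappear before fusion.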
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
47043220ee Update version strings to 1.2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20812

Differential Revision: D15451892

Pulled By: gchanan

fbshipit-source-id: 07355dbd446053a69b5cf4e3be1842aa1075c71f
2019-05-24 11:07:29 -07:00
a640c81536 Add llvm8 installation step. (#20879)
Summary:
Add ability to build docker container with llvm8.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20879

Differential Revision: D15497037

Pulled By: rdzhabarov

fbshipit-source-id: d673d1ddd4156c95516e61223b397c2f9bce1214
2019-05-24 10:51:53 -07:00
fa20327618 Update Refinement Docs (#20912)
Summary:
To say that we don't do refinement on module attributes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20912

Differential Revision: D15496453

Pulled By: eellison

fbshipit-source-id: a1ab9fb0157a30fa1bb71d0793fcc9b1670c4926
2019-05-24 10:17:55 -07:00
8c4b2a835b Remove extra workspace queries in matrix inverse computation (#20904)
Summary:
Earlier, the workspace size query and allocation was placed inside the loop.
However, since the batch contains matrices with the same number of rows and columns, performing the workspace size query and allocation for every matrix in the batch is redundant.

This PR moves the workspace size query and allocation outside the loop, saving (batch_size - 1) queries and allocations (and the corresponding deallocations).

There is a tremendous speedup in inverse computation as a result of this change.

Changelog:
- Move workspace query and allocation outside the batch loop
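A minimal sketch of the hoisting, with a stand-in workspace query — the helper names are illustrative, not MAGMA's actual API:

```python
def query_workspace_size(rows, cols):
    # Stand-in for the per-matrix LAPACK/MAGMA workspace-size query.
    return rows * cols

def invert_batch(batch, rows, cols):
    # Query once and allocate once: valid because every matrix in the
    # batch has the same shape, so the required workspace size is identical.
    workspace = bytearray(query_workspace_size(rows, cols))
    return [("inverse_of", matrix, len(workspace)) for matrix in batch]
```

The previous code effectively re-ran the query and allocation inside the list comprehension's loop body, once per matrix.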
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20904

Differential Revision: D15495505

Pulled By: ezyang

fbshipit-source-id: 226729734465fcaf896f86e1b1a548a81440e082
2019-05-24 09:59:37 -07:00
4109ec1278 In Dockerfile, do not install unnecessary packages, use conda to install ninja (saving one layer), and use "." to refer to WORKDIR to reduce redundancy. (#20881)
Summary:
- Do not install unnecessary packages in the Docker image.
- In the Docker image, use conda to install ninja (saving one layer)
- When workdir is set, use "." to refer to it to reduce redundancy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20881

Differential Revision: D15495769

Pulled By: ezyang

fbshipit-source-id: dab7df71ac107c85fb1447697e25978daffc7e0b
2019-05-24 09:32:40 -07:00
6af2482612 Leave it as an option for whether to colorize output during build (#20771)
Summary:
Currently PyTorch forces color output due to #20662. But users should be given an option to turn it off, because output redirected to a file would be garbled if color is forced.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20771

Differential Revision: D15495677

Pulled By: ezyang

fbshipit-source-id: 9d89bbed40d0b67368554305394763a54c5ff6f5
2019-05-24 09:22:52 -07:00
ec45baf4dd tensor_illustration with correct numbers and better fonts for README file (#20751)
Summary:
Fix of README tensor image for issue #20641
Numbers are fixed, symbols made more readable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20751

Differential Revision: D15495706

Pulled By: ezyang

fbshipit-source-id: b6013574d16253ec681fc57143efe3d53952fbe9
2019-05-24 09:18:18 -07:00
ef1fdc27a3 Raise TypeError when the argument to isinf and isfinite is not a tensor (#20817)
Summary:
Currently, when the argument to isinf and isfinite is not a tensor, a ValueError is raised. This, however, should be a TypeError, because the error is a type mismatch.

In the error message, "str(tensor)" is replaced by "repr(tensor)" because, when an error occurs, a printable representation of the object is likely more useful than the "informal" string version of the object.
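A sketch of the check this commit describes — the `Tensor` class below is a stand-in, and the real check lives inside torch's isinf/isfinite implementations:

```python
class Tensor:
    """Stand-in for torch.Tensor, just to illustrate the type check."""
    pass

def isinf(obj):
    # Type mismatch -> TypeError (not ValueError), with repr() in the message.
    if not isinstance(obj, Tensor):
        raise TypeError("expected a Tensor, but got {!r}".format(obj))
    return False  # placeholder result for the sketch
```

Calling `isinf("not a tensor")` now raises TypeError, matching Python's convention for wrong argument types.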
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20817

Differential Revision: D15495624

Pulled By: ezyang

fbshipit-source-id: 514198dcd723a7031818e50a87e187b22d51af73
2019-05-24 09:18:15 -07:00
87040af498 Fix documentation for attention mask shape (#20850)
Summary:
Attention mask should be of shape `(L, S)` since it is added to `attn_output_weights`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20850

Differential Revision: D15495587

Pulled By: ezyang

fbshipit-source-id: 61d6801da5291df960daab273e874df28aedbf6e
2019-05-24 09:10:11 -07:00
a5c90aaf47 Use "length of the RNN input" instead of "length of the RNN"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20873

Differential Revision: D15495570

Pulled By: ezyang

fbshipit-source-id: e3b4cd67ccf97d0053ac053c3bcb74415b928c0a
2019-05-24 09:03:50 -07:00
3e4f213e82 Instructions for how to update pytorch-ci-hud when updating binary builds (#20758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20758
ghimport-source-id: ffb4c97c42c6efbb16ea5d93ea8af1bdf71cb1e4

Differential Revision: D15435639

Pulled By: ezyang

fbshipit-source-id: a12bde8b0b11bbe0d0280b6b3994d9c65dc4f5cc
2019-05-24 07:20:06 -07:00
c3d05e86cc Resend "Split ATen/Parallel into interface and backend" (#20825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20825
ghimport-source-id: 0371fbd37cb37635647d473d5ac9f2859e787061

Differential Revision: D15458073

Pulled By: ilia-cher

fbshipit-source-id: cd27d0da1691f6be1183cd152348ac0d93a53996
2019-05-24 02:03:06 -07:00
6b74856747 Fix init_thread calls in thread pool initialization (#20848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20848
ghimport-source-id: e542858a198252838c1f3100dbfbe90fd3960f07

Differential Revision: D15466918

Pulled By: ilia-cher

fbshipit-source-id: e75d38f51edd5b508c4ca28a292e4141e90f209f
2019-05-24 01:14:31 -07:00
1bb728fe14 Change the quantizer to match the behavior of the FBGEMM implementation (#20892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20892

FBGEMM uses 64 bit values. Need to change our implementation to match

Reviewed By: jerryzh168

Differential Revision: D15487664

fbshipit-source-id: 29cba26093c6f9aeafce14982c1ae12149e63562
2019-05-24 00:46:08 -07:00
fc941d3bca Catchall kernels instead of fallback kernels (#20773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20773

This removes the feature to register fallback kernels that are called when no other kernel matches.
Instead, we introduce the concept of catchall kernels that are always called independent of inputs.
If you only have a fallback/catchall kernel and no kernels with concrete dispatch keys, then both concepts behave in the same way.
The difference is that we now disallow operators to have both, a catchall kernel and kernels with concrete dispatch keys.
This was possible before when they have been fallback kernels.

The reason for this change is that we anticipate needing a method_missing feature in backends, i.e. a backend-wide fallback to call when the backend doesn't specify a kernel for an operator.
We are not clear on precedence between this backend-wide fallback and an operator-level fallback. Disallow fallbacks for now so we are free to choose later without breaking backwards compatibility.
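A toy model of the new registration rule, assuming a drastically simplified dispatcher — class and method names are illustrative, not c10's actual API:

```python
class Operator:
    """An operator may have per-dispatch-key kernels OR one catchall kernel,
    never both (this is the restriction the diff introduces)."""

    def __init__(self):
        self.kernels = {}
        self.catchall = None

    def register(self, kernel, dispatch_key=None):
        if dispatch_key is None:
            if self.kernels:
                raise RuntimeError("cannot mix catchall and keyed kernels")
            self.catchall = kernel
        else:
            if self.catchall is not None:
                raise RuntimeError("cannot mix catchall and keyed kernels")
            self.kernels[dispatch_key] = kernel

    def dispatch(self, dispatch_key):
        if self.catchall is not None:
            return self.catchall  # called independent of inputs
        return self.kernels[dispatch_key]
```

Under the old fallback semantics, both registrations were allowed and the fallback was only consulted when no concrete key matched.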

Reviewed By: dzhulgakov

Differential Revision: D15438977

fbshipit-source-id: cb3aa764a1659d909ee21a7bd8ec3d32438aafaa
2019-05-23 23:47:51 -07:00
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
8cde4c4d22 Remove Variable::Impl and DifferentiableViewImpl (#17072)
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove the `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.

**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072

Differential Revision: D14075257

Pulled By: yf225

fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
2019-05-23 21:09:04 -07:00
f93e0619f3 Adding ShufflenetV2 to caffe2's benchmark suite. (#20180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20180

Adding ShufflenetV2 (Ma et al., 2018) to caffe2's benchmark suite.

To run, use: `buck run mode/opt caffe2/caffe2/python/examples:imagenet_trainer -- --train_data null --batch_size 128 --epoch_size 3200 --num_epochs 2 --num_gpus 2 --model shufflenet`

Reviewed By: bddppq, xw285cornell

Differential Revision: D15094282

fbshipit-source-id: 0e1ce9c5975868e917b0f179e2c5b15647a76b4e
2019-05-23 20:40:17 -07:00
3aa7ee6fe6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 17161d7e1e742b402715f8ed006e5b3abfa78561
2019-05-23 20:40:14 -07:00
cfb6c4a8ee Updating submodules
Reviewed By: yns88

fbshipit-source-id: 58b230ad12620032f391733c7f9c1e44aeaa390b
2019-05-23 19:49:06 -07:00
62af37aa88 dropout symbolic_script should respect the training flag (#20760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20760
ghimport-source-id: eb667c3549a03a2fc01ffa0a2d3bc7e3a29b78e0

Reviewed By: jamesr66a

Differential Revision: D15486511

Pulled By: suo

fbshipit-source-id: 56ae930a01b0f6f4305a2a745135d4529b4a1ca0
2019-05-23 18:17:17 -07:00
bd53c8eb93 Move torchvision install out of onnx test script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20890

Differential Revision: D15486657

Pulled By: bddppq

fbshipit-source-id: 3acd7386d1f070cad9bd43d6e74244b706c0dc16
2019-05-23 18:02:48 -07:00
d5b7138a2c Dict is a reference type (#20669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20669

Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.
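The distinction can be illustrated with plain Python objects; after this diff, copying a `Dict` behaves like the reference case below rather than the deep-copy case:

```python
import copy

d = {"key": 1}

# Reference semantics (new behavior): both names share the same storage,
# so mutations through one are visible through the other.
ref = d
ref["key"] = 2
assert d["key"] == 2

# Value semantics (old behavior): copying produced an independent deep copy,
# so mutating the copy left the original untouched.
val = copy.deepcopy(d)
val["key"] = 3
assert d["key"] == 2
```

Reference semantics match how IValues already behave, which is what made the old value-type Dict awkward to store in them.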

Reviewed By: dzhulgakov

Differential Revision: D15404911

fbshipit-source-id: dc990d3eb7cae044b74dd0253f8b704dde6a6c86
2019-05-23 15:24:31 -07:00
93d5503f34 bug fix 19374 - fix for upsample export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20116

Differential Revision: D15256899

Pulled By: houseroad

fbshipit-source-id: cf0dfd679d528fbb77f483e23071f4a96fb27091
2019-05-23 14:48:23 -07:00
48bf7b9be8 Fix oscillation in coalesceInsertedDataDependencies (#20833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20833

Att. The algorithm is still "horrendously inefficient". But since we are sunsetting Nomnigraph, I just did the minimal fix here.

Reviewed By: tracelogfb

Differential Revision: D15463880

fbshipit-source-id: 413a1280a92c1923ba49031177816a2d5f888575
2019-05-23 14:04:20 -07:00
2d96876d88 Use conda torchvision version (#20865)
Summary:
This tries to fix the following error on current master:
```
May 23 16:18:47 Traceback (most recent call last):
May 23 16:18:47   File "main.py", line 7, in <module>
May 23 16:18:47     from torchvision import datasets, transforms
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/__init__.py", line 1, in <module>
May 23 16:18:47     from torchvision import models
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/models/__init__.py", line 11, in <module>
May 23 16:18:47     from . import detection
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
May 23 16:18:47     from .faster_rcnn import *
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
May 23 16:18:47     from torchvision.ops import misc as misc_nn_ops
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/ops/__init__.py", line 1, in <module>
May 23 16:18:47     from .boxes import nms, box_iou
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/ops/boxes.py", line 2, in <module>
May 23 16:18:47     from torchvision import _C
May 23 16:18:47 ImportError: /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19NonVariableTypeMode10is_enabledEv
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20865

Differential Revision: D15481736

Pulled By: yf225

fbshipit-source-id: 67d4fd70652ccc709b44cb15392d6e44a8fe9235
2019-05-23 13:49:59 -07:00
b6d0f6c85a Move THCTensor_{random, clampedRandom, cappedRandom} to ATen (#20620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20620
ghimport-source-id: 7c09c2462021e3fa5adef61570a575964ff16125

Differential Revision: D15454050

Pulled By: ezyang

fbshipit-source-id: 5b0421c56445baf19dbdbdd9680af128a5cdf443
2019-05-23 13:44:16 -07:00
48424a6c94 Avoid dynamic dispatch inside the omp loop in AdaptiveAvgPool2d (#20366)
Summary:
This PR changes the CPU implementation of `AdaptiveAveragePool2D` to:
- move dispatch outside the OpenMP loop
- support fp16
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20366

Differential Revision: D15456069

Pulled By: ezyang

fbshipit-source-id: 00fa2916f8b136af9f5c8b5db0eca4619f9f5bac
2019-05-23 13:29:29 -07:00
cf0268e51c Modify cos to cosh in Vec256 (#20797)
Summary:
Minor typo fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20797

Differential Revision: D15469308

Pulled By: ezyang

fbshipit-source-id: 3288ad69316e296e46d861737c5b09e0ea1e694b
2019-05-23 13:24:09 -07:00
70caa2efe2 Add mkldnn sigmoid operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20820

Reviewed By: dzhulgakov

Differential Revision: D15455866

fbshipit-source-id: 712b06dfbd441051dc284a1acdf94926df09bc1d
2019-05-23 12:51:57 -07:00
8dedb04c26 Enable torch.jit.trace for mkldnn modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20800

Differential Revision: D15447892

fbshipit-source-id: 78e76523c5412c020a2bc22d6998ff7b36356720
2019-05-23 12:51:54 -07:00
63585c3b81 Add support for save and load mkldnn modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20799

Reviewed By: wanchaol

Differential Revision: D15447891

fbshipit-source-id: e34de946c79282fb934a5c52ff1def41c7993c75
2019-05-23 12:51:50 -07:00
5f83c5d834 Fix build error with MSVC (#20853)
Summary:
Close #20642

Possibly broken by #19816
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20853

Differential Revision: D15474620

Pulled By: jerryzh168

fbshipit-source-id: 99b52d92a93bac7cab52537f1ebdbd286d4b2cfe
2019-05-23 12:11:29 -07:00
31e2d20c5e Dictionarize check_inputs coming from trace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20813

Differential Revision: D15466836

Pulled By: Krovatkin

fbshipit-source-id: ffdb418592b76dc67c65c59f4dc7303f08734f97
2019-05-23 11:17:55 -07:00
2c556a9489 fix the input/output type mismatch (#20829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20829

as title

Reviewed By: jamesr66a

Differential Revision: D15461937

fbshipit-source-id: 02c7150c0e8d020030ae8898008f718c74850dca
2019-05-23 11:08:21 -07:00
9c57d8df42 Make LayerNorm.normalized_shape a tuple
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20832

Pulled By: driazati

Differential Revision: D15464693

fbshipit-source-id: 244f24d6917b17dde5e33ff852c716fb053b7ca5
2019-05-23 10:51:08 -07:00
99b3f5cd70 Fixes error with custom scalars, fixes #20579 (#20580)
Summary:
When adding custom scalars like this
```python
from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as writer:
    writer.add_custom_scalars({'Stuff': {
        'Losses': ['MultiLine', ['loss/(one|two)']],
        'Metrics': ['MultiLine', ['metric/(three|four)']],
    }})
```
This error is raised:
```
TypeError: Parameter to MergeFrom() must be instance of same class: expected tensorboard.SummaryMetadata.PluginData got list.
```

Removing the square brackets around `SummaryMetadata.PluginData(plugin_name='custom_scalars')` should be enough to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20580

Differential Revision: D15469700

Pulled By: orionr

fbshipit-source-id: 7ce58034bc2a74ab149fee6419319db68d8abafe
2019-05-23 10:17:36 -07:00
a16708a1ae Workaround python2.7 find_module limitation / explicitly close file (#20782)
Summary:
fix #20781 #20757
Hmm, I don't know an easy way to add a test to make sure it runs against a package installed as a .egg. But I tested it locally with torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20782

Differential Revision: D15443600

Pulled By: ailzhang

fbshipit-source-id: 285eb0d9a44d6edb8e93618fa293f4feb431d2ae
2019-05-23 09:44:17 -07:00
ec57d1f18a Port dilated_max_pool2d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20691

Differential Revision: D15435960

Pulled By: ezyang

fbshipit-source-id: 548b7cc42e52ad2c641ec7d9cf78028d9411d02e
2019-05-23 09:04:04 -07:00
f039401bf2 Add back at::_copy_from for use by XLA (#20783)
Summary:
XLA needs a way to override CPUTensor.copy_(XLATensor), but we only
dispatch on the "self" argument. This inverts the dispatch order when
"src" is an unhandled type.

Note that things like XLATensor.copy_(CPUTensor) never enter this
implementation.

cc dlibenzi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20783

Differential Revision: D15443187

Pulled By: colesbury

fbshipit-source-id: 4ee93ba598ef0fed2a99c0683aae30cb50a1f99c
2019-05-23 08:47:20 -07:00
80aed36fb6 fix a couple of typos in README markdown (#20819)
Summary:
was reading the README on github and came across a couple of typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20819

Differential Revision: D15469603

Pulled By: nairbv

fbshipit-source-id: 0ed7868de2d4e6d82557a8c170783966f8a1afd7
2019-05-23 08:11:25 -07:00
8fc069fa17 add batch of string ops (#20826)
Summary:
First batch of https://github.com/pytorch/pytorch/issues/20769, handles `isupper`, `islower`, `isdigit`, `isspace`, `isalnum`, `isalpha`, `upper`, `lower`
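For reference, the added builtins are meant to mirror Python's own str methods, so their expected behavior can be checked directly in Python:

```python
# The eight methods in this batch, exercised via their Python equivalents.
assert "ABC".isupper() and "abc".islower()
assert "123".isdigit() and "   ".isspace()
assert "a1".isalnum() and "abc".isalpha()
assert "abc".upper() == "ABC" and "ABC".lower() == "abc"
```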
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20826

Differential Revision: D15466986

Pulled By: eellison

fbshipit-source-id: d1df65721da803dfa30e28fdd9b874405be6bc7d
2019-05-23 08:01:16 -07:00
90182a7332 Install torchvision from master
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20836

Differential Revision: D15464705

Pulled By: bddppq

fbshipit-source-id: abe2ac2de2bf4c8d07334e6b2565c738c40428ae
2019-05-23 02:16:57 -07:00
d35a587958 Remove cpu_half, cpu_bool, cuda_bool from native_functions.yaml (#20552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20552
ghimport-source-id: 0ef4e85b40f3b927564257f44f72f671251acaf1

Differential Revision: D15362154

Pulled By: li-roy

fbshipit-source-id: b2477582389099c6696dca33f1371e8e136e32b6
2019-05-22 22:58:40 -07:00
41100d4027 Add PerChannelAffineQuantizer (#20764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20764

att

Reviewed By: dskhudia

Differential Revision: D15367364

fbshipit-source-id: 1d3ebf356ceac73b0fa4493209839d1c66d4d5b3
2019-05-22 19:16:52 -07:00
a21cf76575 Revert D15459166: [pytorch][PR] add batch of string ops
Differential Revision:
D15459166

Original commit changeset: 0ed908022475

fbshipit-source-id: d0a04228605e3437a02961a525eed8f8b3b59c17
2019-05-22 19:07:50 -07:00
5952ca8d9f Remove duplicated _optimize_trace and use core (#20394)
Summary:
The duplicated `_optimize_trace` code in _pytorch_graph.py is used to bypass optimization steps that cause missing scope information.

It seems that most of the problematic steps have been fixed recently. Standard models implemented in torchvision are visually inspected before the commit. However, the `+=` in 50d54a82d1/torchvision/models/resnet.py (L63) will let f4d9bfaa4d/torch/onnx/utils.py (L159) produce a bad result. It can be fixed by replacing it with `out += identity`. This also implies that `+=` has non-intuitive behavior.

cc orionr ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20394

Reviewed By: NarineK

Differential Revision: D15452204

Pulled By: orionr

fbshipit-source-id: eaa4c13f16551c78dc6419f1e22eb2c560af4cc5
2019-05-22 18:34:20 -07:00
871c9dcb1d move batchnorm and layernorm fusion to decompose (#20337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337
ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4

Differential Revision: D15448596

Pulled By: wanchaol

fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c
2019-05-22 18:01:27 -07:00
cde611a66c Quantized Conv2d operator (#20772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20772

Copy of D15178352

A conflicting commit that removed registering kernels using IntArrayRef landed at the same time as D15178352, hence D15178352 was reverted. Using std::vector instead.

Reviewed By: zafartahirov

Differential Revision: D15437237

fbshipit-source-id: cd2f1caebcc720352b48ce25d716cb1ca49a5197
2019-05-22 17:53:24 -07:00
aebcd80ae4 add batch of string ops (#20826)
Summary:
First batch of https://github.com/pytorch/pytorch/issues/20769, handles `isupper`, `islower`, `isdigit`, `isspace`, `isalnum`, `isalpha`, `upper`, `lower`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20826

Differential Revision: D15459166

Pulled By: eellison

fbshipit-source-id: 0ed908022475e27011803cc4af7cf393a4312783
2019-05-22 17:33:04 -07:00
7aa3887f43 make wildcards alias only each other (#20670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20670
ghimport-source-id: f5704c49fcb829e4668441f31fcf9305da22335c

Reviewed By: jamesr66a

Differential Revision: D15447567

Pulled By: suo

fbshipit-source-id: 391236806838de2524410e26946456441e562470
2019-05-22 16:50:09 -07:00
90910fc6cb Mark values entering containers as wildcards (#20556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20556
ghimport-source-id: d7c62e38a2f6928f6f8d988c26a38ea8f8cff8b6

Reviewed By: jamesr66a

Differential Revision: D15447568

Pulled By: suo

fbshipit-source-id: 77ebc11b571b8517d3bad3ee1b3ee5ac037542b2
2019-05-22 16:50:06 -07:00
28be521e39 Fix bug in exporting node with multiple outputs by scripting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20256

Differential Revision: D15422040

Pulled By: houseroad

fbshipit-source-id: 5de2a992d7d99a48905c39a1878eb0b3b68d6a3f
2019-05-22 16:29:36 -07:00
c2e3e79afc fix pow bug on overloads and clean up (#20824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20824
ghimport-source-id: ceb1b64e2866ec8577800a8c378d8222a62cf199

Reviewed By: cpuhrsch

Differential Revision: D15458009

Pulled By: wanchaol

fbshipit-source-id: 51546d142d2c84e961d8b12ae85a2988a342da3b
2019-05-22 16:21:18 -07:00
98928f4d79 Allow both Variables and Tensors in c10 kernel interface (#20816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20816

Previously, the c10 dispatcher expected ops to be called with Variables and unwrapped them to Tensors before calling into the kernel.
The kernel was expected to return Tensors that were re-wrapped into Variables before passing them on into the system.

However, that doesn't work with kernels that call other operators. One recent example was a kernel that returned the result of `torch::ones()` as output.
Now, with this diff, the c10 dispatcher still passes Tensors to the kernel and Variables back into the system, but it accepts calls with either Tensors or Variables,
and kernels are also allowed to return either.

After https://github.com/pytorch/pytorch/pull/17072 , we should be able to get rid of the whole wrapping/unwrapping logic.

Reviewed By: hl475

Differential Revision: D15453963

fbshipit-source-id: 7602b7f2bc43e8ceb8a8c0e97aafcc53d4c47b6c
2019-05-22 16:03:12 -07:00
9ea009fe8b Add as_quantized_tensor (#20740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740

Provide a way to assemble quantized Tensor from int8 Tensor, scale and zero point.

Differential Revision: D15232416

fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
2019-05-22 15:19:45 -07:00
12bc81ae2a Change comparison ops result dtype to bool [Part1] (#20767)
Summary:
This is the first part of the planned changes to switch the result dtype of comparison operations from Byte to Bool. You can see the whole list of changes (not cleaned up) [here](https://github.com/pytorch/pytorch/pull/19332). As the PR is too big for a single review, I'm breaking it into pieces.

**Changes in this PR:**
1. Enable these methods for bool tensors:
- maskedSelect
- maskedSelectBool
- bitand
- cbitand
- bitor
- cbitor
- bitxor
- cbitxor
- sign
- equal
- neg

2. Add bool clause for the TH version of sign method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20767

Differential Revision: D15436446

Pulled By: izdeby

fbshipit-source-id: 8d2494b5f4873cd79c7f1a40d2cb045cadfad51a
2019-05-22 15:12:46 -07:00
6ec3c12255 Update references to minimum CUDA and cuDNN version (#20718)
Summary:
I didn't update the Windows references because I wasn't sure if they apply to CUDA 9. peterjc123 what should the Windows section say?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20718

Differential Revision: D15459276

Pulled By: colesbury

fbshipit-source-id: 917e22f8ac75378d88c962c226b5a42b6799c79a
2019-05-22 14:54:54 -07:00
05543153dd CUDA implementation of fakequant (#20252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20252

Add CUDA implementation for fakequant op for quantization aware training.

Reviewed By: zafartahirov

Differential Revision: D15243386

fbshipit-source-id: 37610ab046786ffc69aaec5235e5df8304c353d6
2019-05-22 14:46:39 -07:00
fdb923996d Revert D15445092: Some minor fix to unblock the Bert model quantization
Differential Revision:
D15445092

Original commit changeset: 22da41a56ecb

fbshipit-source-id: eca9a85900bf48fe6a9da5cfff61606a10f0c3de
2019-05-22 14:25:14 -07:00
cfc98ae714 fix add_histogram_raw (#20688)
Summary:
This is a porting of the fix from:
https://github.com/lanpa/tensorboardX/issues/421

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20688

Reviewed By: NarineK

Differential Revision: D15415093

Pulled By: orionr

fbshipit-source-id: d32a6298218fbc6fe315aa0f18b57e0c8ef92627
2019-05-22 14:06:21 -07:00
fd2aa93b37 Exposing LengthsSum/Mean/Max in pytorch (#20802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20802

Need this for sequence model

Reviewed By: dzhulgakov

Differential Revision: D15448529

fbshipit-source-id: cd5abe3b689fc0e02feff10faf8cd61c99369f4f
2019-05-22 13:55:19 -07:00
8d7a025703 ONNX Export Scatter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18543

Differential Revision: D14658639

Pulled By: houseroad

fbshipit-source-id: 5d7821b54d2fc93f71120155adf328897d13aff6
2019-05-22 13:31:54 -07:00
fea4a56af3 Add ability to filter metric schema in LayerModelHelper (#20786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20786

Add a method to LayerModelHelper to filter metrics_schema. A general model builder may add metric schemas that are not needed in some situations. This change adds the ability to skip those unneeded schemas.

Reviewed By: alex1o1o7cloud

Differential Revision: D15418140

fbshipit-source-id: 520f5dffd9938cf206cb1352e2953a4d4d2b6ab1
2019-05-22 12:26:20 -07:00
810816a1f9 Automatic update of fbcode/onnx to cc2333a3f929caca7223b98699237f19388dd585 (#20763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20763

Previous import was ead449a30d026a7a0a59e2ba0a42ca8e52ec2359

Included changes:
- **[cc2333a3](https://github.com/onnx/onnx/commit/cc2333a3)**: Version Conversion of Min, Max, Mean from opset 7 to 8 (#2032) <Ksenija Stanojevic>
- **[5d0975f4](https://github.com/onnx/onnx/commit/5d0975f4)**: Fix auto_pad shape inference bug (#2028) <stevenlix>
- **[819afd05](https://github.com/onnx/onnx/commit/819afd05)**: Version Conversion from opset 8 to 9 (#2007) <Ksenija Stanojevic>
- **[6c913669](https://github.com/onnx/onnx/commit/6c913669)**: fix macro ONNX_DISALLOW_COPY_AND_ASSIGN bug (#2017) <one-hello>

Reviewed By: BIT-silence

Differential Revision: D15425957

fbshipit-source-id: b799357930e8c9421e9bfcbfd97907e086862a6d
2019-05-22 11:37:01 -07:00
4e0d098ace Fix optimizer type hint (#20648)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20548
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20648

Differential Revision: D15453935

Pulled By: ezyang

fbshipit-source-id: 8778e819c58fdc2620f123ec5b5fd568e23b7705
2019-05-22 11:27:40 -07:00
795a1a6ffa When detecting numpy, assign relevant variables outside the try block (#20739)
Summary:
When detecting the presence of NumPy using import, move numpy-related variable assignments outside the try block (i.e., to an else block) to improve readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20739

Differential Revision: D15453916

Pulled By: ezyang

fbshipit-source-id: d3c37f2b290846be3c6a1462251cbb3e95d493be
2019-05-22 11:27:36 -07:00
fd95947e68 Revert D15248618: Split ATen/Parallel into interface and backend
Differential Revision:
D15248618

Original commit changeset: 060879266bc8

fbshipit-source-id: fc5cbb030b87613c9e15100118c3d4a064097c20
2019-05-22 09:55:51 -07:00
70ecddfd76 Some minor fix to unblock the Bert model quantization (#20787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20787

Set requires_grad=False for bias: this will block the jit tracing.
The as_type fix: The input tensor shape and output tensor shape will be different, which will trigger the assertion failure at https://fburl.com/0m8xy7tc.

Reviewed By: jamesr66a

Differential Revision: D15445092

fbshipit-source-id: 22da41a56ecb9ac092585d0cc1ff0658fb9d631b
2019-05-21 23:13:08 -07:00
a501e7d5be Add quant-dequant nodes for bias. (#20045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20045

This pass adds quant-dequant nodes for bias. It requires the
quant-dequant pass for activations and weights to be done first, as they
are needed to compute the qparams for bias.

Differential Revision: D15179141

fbshipit-source-id: 3aab9fceefcadc3fa42a4e802d9b1e18addad78a
2019-05-21 21:59:37 -07:00
c2d0e7316f Add DictType to Metadata (#20770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20770

Add the dict type, since it's part of the PyTorch built-in type system; sparse features and text features will be converted to Dict.

Reviewed By: pritamdamania87

Differential Revision: D15436255

fbshipit-source-id: 239adbd6a8f68be29020fe656d790f6872f1f0e9
2019-05-21 21:53:06 -07:00
70eb315da4 Use AT_INTERNAL_ASSERT in test_base (#20555)
Summary:
as title. We were using AT_ASSERT, which is newly deprecated. In this case, we do in fact want an internal assertion since this is used in testing code to describe expected behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20555

Differential Revision: D15362964

Pulled By: suo

fbshipit-source-id: 984bfe71a774571611f3bbd81767d3cdb878a6fd
2019-05-21 21:25:07 -07:00
4a85e7955c Rename FC to Linear in the test routine (#20716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20716

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15410823

fbshipit-source-id: e82fc241ee288b41304675cb087c0cdcd60d7148
2019-05-21 19:58:19 -07:00
77651615c8 fbgemm precision argument (#20790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20790

att

Reviewed By: jianyuh

Differential Revision: D15445903

fbshipit-source-id: fd338aea55e40eecc780be881e67417679e2ea35
2019-05-21 19:26:15 -07:00
c4a3b4d528 Split ATen/Parallel into interface and backend (#20057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20057
ghimport-source-id: c583f61bf661c994eb4d0625748a299e892a7246

Differential Revision: D15248618

Pulled By: ilia-cher

fbshipit-source-id: 060879266bc8616916fe220adef6ae6c0b076fbd
2019-05-21 19:15:47 -07:00
adbab82846 int_repr for different quantized types (#20656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20656

att

Reviewed By: zafartahirov

Differential Revision: D15398134

fbshipit-source-id: b02899d4ff33598416f65cf76b2ecc62adee243b
2019-05-21 17:57:42 -07:00
c1d6bcf301 Use SmallVector to allocate Compound operands inline. (#20762)
Summary:
Reduces load time for serialized ResNet-18 by 5.5%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20762

Differential Revision: D15437364

fbshipit-source-id: 2ba34dd229a1054553d0ee09f044ce1915377d78
2019-05-21 16:37:52 -07:00
c9da01194a Optimize pytorch layer_norm forward (#20345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20345

Separated from D15194600
Optimize pytorch layer_norm op, part 1:
  optimize layer_norm_forward_cpu
  use Eigen Maps to improve reduction performance

Reviewed By: zheng-xq

Differential Revision: D15290608

fbshipit-source-id: cf2c208dfd6fbcbc4c69db3ed60278d9bee156b5
2019-05-21 15:59:49 -07:00
9cec8ae146 use tensoriterator instead of th for fill_ implementation. (#20719)
Summary:
Moves fill_ to aten as suggested in:
https://github.com/pytorch/pytorch/pull/20336#issuecomment-493260729

borrows from cpuhrsch's PR: https://github.com/pytorch/pytorch/pull/18876/files#diff-0d1178f1a4ce15aeb760d251974e6924

Co-authored-by: cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20719

Differential Revision: D15439420

Pulled By: nairbv

fbshipit-source-id: cbcc313cda61a528cecc4a28d601871565e6110c
2019-05-21 15:45:45 -07:00
7a0c6d528a Fix copy_transpose_valid check (#20759)
Summary:
Fixes #20755

(Was broken in #20685)

cc vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20759

Differential Revision: D15433712

Pulled By: colesbury

fbshipit-source-id: 29f612f7d4d7b73158d6f5dc1e46fd2f8fb09a2f
2019-05-21 15:37:37 -07:00
5acc664f9d make magic methods work with casts too (#20654)
Summary:
The previous implementation of magic methods extended from BuiltinOperators, but it should be able to work with other sugared values, such as casts.

I was also considering making CastValues and BuiltinOperators extend from a MagicMethod superclass, and having them try to call into the superclass before their own call. However, not all builtin operators have corresponding magic methods, so I did it this way instead (although there are workarounds for that).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20654

Differential Revision: D15434469

Pulled By: eellison

fbshipit-source-id: 813fa00bf8b5b9ada46505075ebf984d8eee6aef
2019-05-21 14:23:06 -07:00
e6f22e1b89 Change Bias to QTensor with qint32(int32_t) (#20713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20713

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15410734

fbshipit-source-id: c00f409278736cf9e3205f7d36dda1b96120f47d
2019-05-21 14:17:37 -07:00
b9a150ede0 Change Weight to QTensor with qint8(int8_t) (#20712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20712

As Title says.

Differential Revision: D15410696

fbshipit-source-id: 48147a79d8cc47a724eb473796a37a1c64f8e883
2019-05-21 14:17:34 -07:00
ac2314fdeb Fix a bug in quantize_linear (#20711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20711

For `uint8_t`, `std::numeric_limits<uint8_t>::digits` returns 8;
for `int8_t`, `std::numeric_limits<int8_t>::digits` returns 7.

FBGEMM wants `qparams.precision` to always be 8 for both `int8_t` and `uint8_t`.

Reviewed By: jerryzh168

Differential Revision: D15410695

fbshipit-source-id: 17dc3842d7c426947454c201bcb167b87b7301ce
2019-05-21 14:17:31 -07:00
32803b52f6 Update Conda description in PyTorch README (#20726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20726

Edward says it doesn't actually provide compilers,
but it does provide dependencies, so let's mention that instead.

Reviewed By: ezyang

Differential Revision: D15423316

fbshipit-source-id: 9b384f88e5bf7a3d2c132508620c276b49e1569f
2019-05-21 14:12:30 -07:00
5d8879cf6d Auto-convert GPU arrays that support the __cuda_array_interface__ protocol (#20584)
Summary:
This PR implements auto-conversion of GPU arrays that support the `__cuda_array_interface__` protocol (fixes #15601).

If an object exposes the `__cuda_array_interface__` attribute, `torch.as_tensor()` and `torch.tensor()` will use the exposed device memory.

#### Zero-copy
When using `torch.as_tensor(...,device=D)` where `D` is the same device as the one used in `__cuda_array_interface__`.

#### Implicit copy
When using `torch.as_tensor(...,device=D)` where `D` is the CPU or another non-CUDA device.

#### Explicit copy
When using `torch.tensor()`.

#### Exception
When using `torch.as_tensor(...,device=D)` where `D` is a CUDA device not used in `__cuda_array_interface__`.

#### Lifetime
`torch.as_tensor(obj)` grabs a reference to `obj` so that the lifetime of `obj` exceeds the lifetime of the tensor.
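For reference, a minimal sketch of what a producer of the protocol looks like. The field names follow the `__cuda_array_interface__` spec, but the device pointer below is a dummy value for illustration; a real producer (CuPy, Numba, ...) exposes a live CUDA allocation that `torch.as_tensor()` can wrap zero-copy:

```python
class FakeCudaArray:
    """Minimal producer of the __cuda_array_interface__ protocol.

    The device pointer here is a dummy value for illustration only;
    a real producer exposes an actual device allocation.
    """
    def __init__(self, shape, typestr, dev_ptr, read_only=False):
        self.__cuda_array_interface__ = {
            "shape": shape,            # tuple of dimension sizes
            "typestr": typestr,        # e.g. "<f4" = little-endian float32
            "data": (dev_ptr, read_only),
            "version": 2,
        }

arr = FakeCudaArray(shape=(2, 3), typestr="<f4", dev_ptr=0x7F00DEAD0000)
iface = arr.__cuda_array_interface__
```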
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20584

Differential Revision: D15435610

Pulled By: ezyang

fbshipit-source-id: c423776ba2f2c073b902e0a0ce272d54e9005286
2019-05-21 14:06:46 -07:00
847d9c57d1 Improve the recommended citation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20768

Differential Revision: D15436734

Pulled By: ezyang

fbshipit-source-id: d073f3b76a60bd8edf1e7799a1bb153d04a09bb1
2019-05-21 13:51:56 -07:00
bb20956e3c Add support for CMake switches for VS 2019 (#20752)
Summary:
Appending `arch` to the generator name is not supported for VS starting from VS 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20752

Differential Revision: D15436740

Pulled By: ezyang

fbshipit-source-id: 20057aae8f708d82619927bf2cb87dd1bc2df312
2019-05-21 13:46:39 -07:00
47dc65fe76 add str comparisons (#20761)
Summary:
add string comparisons
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20761

Differential Revision: D15434616

Pulled By: eellison

fbshipit-source-id: c00c7bac6308dbcc6a9e46b92421f49fb2d5a81c
2019-05-21 12:47:50 -07:00
cca923c481 Add dequantize_linear for JIT pass (#20107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20107

att

Reviewed By: nishantpdce

Differential Revision: D15202187

fbshipit-source-id: 7d6274a67fcca695c0425587f35046fecbc2ccdc
2019-05-21 12:26:48 -07:00
cc02a1af61 Throw error if multiple kernels registered (#20737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20737

If someone tries to register multiple kernels in the same .op() call, we're now throwing an error.

Differential Revision: D15425660

fbshipit-source-id: 6d2f1444da3e16a6a98863d847965c2aa211e046
2019-05-21 12:17:01 -07:00
f3d827f311 Hipify fb/quantize
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20725

Reviewed By: bddppq

Differential Revision: D15407710

fbshipit-source-id: e5fdeee7e2dffd43cfdd6fab6193eb8a80902c02
2019-05-21 10:51:36 -07:00
b5edeca39d Split cpu/gpu in caffe2/distributed + some clean up (#20674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20674

A few targets in caffe2/caffe2/distributed need to be split too, otherwise they won't compile. Also some cleanups, and rename select_gpu_type to gpu_library_selector.

Differential Revision: D15406019

fbshipit-source-id: 6455ab885b248502b48d4c7565597e00fecfd547
2019-05-21 10:51:33 -07:00
d7cd2d7a8c compile with -fcolor-diagnostics (#20662)
Summary:
Let there be color!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20662

Differential Revision: D15434110

Pulled By: suo

fbshipit-source-id: a317ae72ad72e0b8249f55c9c8d31f420c78c040
2019-05-21 10:32:55 -07:00
c790f10e2d Fix missing cudatoolkit dependency in binary linux tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20732

Differential Revision: D15434025

Pulled By: pjh5

fbshipit-source-id: 74a5798d14b6e61cdcdc784c159294b87264d3de
2019-05-21 10:27:15 -07:00
e3970d66d4 Fixing upload_binary_htmls again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20736

Differential Revision: D15433417

Pulled By: pjh5

fbshipit-source-id: 58964a341226b536be899855058422cb82aa054b
2019-05-21 10:16:08 -07:00
fac307a5cf Revert D15178352: [pt1][quant] Quantized Conv2d operator
Differential Revision:
D15178352

Original commit changeset: 2e5453283137

fbshipit-source-id: 73cf64c483eedbd41a047e7593c0c92bbd33008c
2019-05-21 09:59:57 -07:00
eca7fa35a4 Fix -Wattributes warning on older versions of gcc (#20587)
Summary:
Building with CUDA and gcc 4.8.5-28, we see many warnings like:

[893/1645] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THCUNN/caffe2_gpu_generated_ELU.cu.o
/home/bvaughan/repos/pytorch/c10/util/ArrayRef.h:277:48: warning: ‘deprecated’ attribute directive ignored [-Wattributes]
 using IntList C10_DEPRECATED_USING = ArrayRef<int64_t>;

This change prevents those warnings on the older compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20587

Differential Revision: D15432749

Pulled By: nairbv

fbshipit-source-id: fd707afcbd6564f96617378d7cd6d62d941a052b
2019-05-21 09:47:40 -07:00
712c60f960 Fixing missing miniconda path in macos smokes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20727

Differential Revision: D15433407

Pulled By: pjh5

fbshipit-source-id: 2f5d4e1e49068e9597f7052deb70a287b91e482b
2019-05-21 09:47:37 -07:00
29b1b59449 Quantized Conv2d operator (#20064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20064

Initial implementation of quantized convolution operator using fbgemm.

Reviewed By: zafartahirov

Differential Revision: D15178352

fbshipit-source-id: 2e5453283137dc165e9a20164ffc138fa8caf88a
2019-05-21 09:13:42 -07:00
d73caca2a1 Add mandatory ScalarType nodes as input to the quant-dequant nodes. (#20468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20468

The ScalarType node is mandatory for activations and parameters now.
This change inserts a ScalarType node for all the quant-dequant nodes. For the activations, the current default value is at::ScalarType::Undefined; this removes it and explicitly passes the at::ScalarType::QUint8 dtype.

Differential Revision: D15331600

fbshipit-source-id: 5b51e0b42e694bf409026af4783a12da6d7e234b
2019-05-20 20:01:17 -07:00
371cf109a3 Increase static tolerance for negative feature ids
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20671

Reviewed By: Wakeupbuddy

Differential Revision: D15401078

fbshipit-source-id: a946b1df6fae2851d60fadf32e57feb44ba95f38
2019-05-20 19:09:22 -07:00
0beecbdaad fix soft_nms_cpu call in BoxWithNMSLimit (#20738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20738

D15348610 introduced a bug with misaligned arguments.

Reviewed By: isameer

Differential Revision: D15425627

fbshipit-source-id: b6345863847426ae04eb31245d13f7fcb69d0355
2019-05-20 18:49:41 -07:00
fbdafdffa1 Move bucketize_op to open source
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19952

Reviewed By: houseroad

Differential Revision: D15145552

fbshipit-source-id: e0074c878a5c164324a9cc477783285dedffd188
2019-05-20 18:03:27 -07:00
320c38555e Refactor CUDA copy and general copy dispatch (#20685)
Summary:
Copy.cu goes from 308 to 190 lines of code. In general it uses the same
copy strategy: cudaMemcpyAsync, a pointwise kernel, or a copy using
temporary buffers. The pointwise kernel has slightly improved
performance when broadcasting due to faster index calculation.

This deletes "`s_copy_`", "`_s_copy_from`", and "`_copy_same_type_`". The only
entry-point now is "`copy_`".

A mini-benchmark is here:
https://gist.github.com/colesbury/706de1d4e8260afe046020988410b992

Before:
https://gist.github.com/colesbury/ab454b6fe3791bff420d7bcf8c041f18
After:
https://gist.github.com/colesbury/9024d242b56ab09a9ec985fa6d1620bc

Results were measured on 2.2 GHz Broadwell; no-turbo; one thread;
compiled with GCC 7.3.0. (Results are slower than typical usage due to
turbo being off.)

The only significant difference is in the CUDA [1024] -> [1024, 1024]
broadcasting copy, which is ~25% faster. I don't expect a noticeable
difference in real programs.

CPU copy overhead is a tiny bit (~200 ns) faster, but I don't expect
anyone to notice that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20685

Differential Revision: D15414819

Pulled By: colesbury

fbshipit-source-id: d3c6e04a5020470e3bef15b1fc09503cae5df440
2019-05-20 17:09:44 -07:00
cf7ef5e631 Add onnxifi support for Int8FCDNNLowPPackedWeightBlob (#20564)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20564

Reviewed By: bddppq

Differential Revision: D15106712

fbshipit-source-id: 428db9c23cfd36ddedc8d79121fbbb3bb484c993
2019-05-20 16:57:11 -07:00
0bfc0eeef7 restore hidden visibility by default for Linux builds (#20461)
Summary:
Symbols are given hidden visibility by default on Linux to emulate the behavior on Windows.  This helps developers catch visibility issues in their streamlined Linux dev environment before being surprised, late in the process, by Windows errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20461

Reviewed By: kostmo

Differential Revision: D15410410

Pulled By: dzhulgakov

fbshipit-source-id: 1d684b5a9a80b692966a775c3f1c56b7c72ffc95
2019-05-20 16:49:37 -07:00
be1f83c350 Fix dll linkage for tensor type ids (#20547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20547

-

Differential Revision: D15359988

fbshipit-source-id: 680115a6b73f64c9b02f86eccb8feb799adc6c90
2019-05-20 16:25:09 -07:00
410c7210db Add save() to torch._C.Function (#20386)
Summary:
Fixes #20017

This wraps the `torch._C.Function` currently returned from `torch.jit.script` and `torch.jit.trace` in a `ScriptFunction` and `TracedFunction` respectively, both of which are just wrappers to hold the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20386

Pulled By: driazati

Differential Revision: D15403161

fbshipit-source-id: 94fb9f32929e62a00be6cf7512ea144ec9b91e0b
2019-05-20 16:19:51 -07:00
987f1ccf49 Add "ndim" property to tensor (#20565)
Summary:
For compatibility with numpy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20565

Differential Revision: D15374390

Pulled By: umanwizard

fbshipit-source-id: 4ab209a5fb27d8ba27ee7eb6b67b858ce2480594
2019-05-20 16:10:50 -07:00
6ae99aa5bc onnx/caffe2 tests: Do not execute models with CPU-only operators on GPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20720

Reviewed By: bddppq

Differential Revision: D15422322

Pulled By: houseroad

fbshipit-source-id: c79795434157ff5f0a7b2774fd40edc71cf35ba7
2019-05-20 16:04:45 -07:00
be33434d85 Unify the addQuantDequantNode api for inputs and outputs from quant nodes (#20677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20677

With the new changes in the IR, it is possible to insert nodes after param
nodes in the graph. Thus we do not need two methods for inserting q-dq
nodes at the inputs or outputs of quantizable nodes.

Differential Revision: D15406354

fbshipit-source-id: 1963762f434fd82877fa76a272e8520c342b6069
2019-05-20 15:27:45 -07:00
cf548ba683 De-deprecate old list and dict APIs (#20709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20709

- Remove ArrayRef based API. This is neither the old nor the planned new API.
- De-deprecate kernels based on std::vector and std::unordered_map. We don't have the Dict/List based API figured out entirely yet, so we shouldn't push people towards using them.
  std::vector and std::unordered_map will get deprecated again once we figured out List/Dict.

Reviewed By: dzhulgakov

Differential Revision: D15417025

fbshipit-source-id: bfbb33c762e43487bb499bc8cc36d515e678f8fc
2019-05-20 13:53:00 -07:00
c062175803 Remove unused var (ws_) and use vars in undefined case for compile (#20667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20667

Compilation errors:
```
xplat/caffe2/caffe2/utils/signal_handler.h:31:10: error: private field 'SIGINT_action_' is not used [-Werror,-Wunused-private-field]
  Action SIGINT_action_;
         ^
xplat/caffe2/caffe2/utils/signal_handler.h:32:10: error: private field 'SIGHUP_action_' is not used [-Werror,-Wunused-private-field]
  Action SIGHUP_action_;
         ^
xplat/caffe2/caffe2/utils/signal_handler.h:33:17: error: private field 'my_sigint_count_' is not used [-Werror,-Wunused-private-field]
  unsigned long my_sigint_count_;
                ^
xplat/caffe2/caffe2/utils/signal_handler.h:34:17: error: private field 'my_sighup_count_' is not used [-Werror,-Wunused-private-field]
  unsigned long my_sighup_count_;
                ^
4 errors generated.

xplat/caffe2/caffe2/share/fb/stylizer/median_blur_ops.cc:593:14: error: private field 'ws_' is not used [-Werror,-Wunused-private-field]
  Workspace* ws_;
             ^
1 error generated.
```

Reviewed By: bwasti

Differential Revision: D15402928

fbshipit-source-id: 5b98499850aa659fd37ab8e7f2e75166787b8129
2019-05-20 13:52:57 -07:00
af6eea9391 Add the support of feature store example in pytorch model in fblearner (#20040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20040

Add support for the feature store example in the fblearner pytorch predictor, end to end.

Reviewed By: dzhulgakov

Differential Revision: D15177897

fbshipit-source-id: 0f6df8b064eb9844fc9ddae61e978d6574c22916
2019-05-20 12:58:27 -07:00
9fbce974c9 torch::jit::RegisterOperators forwards to c10::RegisterOperators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20383

Reviewed By: zdevito

Differential Revision: D15300937

fbshipit-source-id: 740fe323fc0945759651116ae61aff4d36319d73
2019-05-20 12:41:04 -07:00
74bdcd44c4 Remove tab. (#20715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20715
ghimport-source-id: 600e244581b37152d86614cca6c9fb5fee6cdcde

Differential Revision: D15417984

Pulled By: ezyang

fbshipit-source-id: 939425de1a95ecc3798384e121b12faaba3a27b8
2019-05-20 11:57:18 -07:00
10445c0404 Finish removal of AT_CHECK, officially deprecate the macro. (#20600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20600

All future uses of AT_CHECK will fail our CI.

Reviewed By: jerryzh168

Differential Revision: D15375397

fbshipit-source-id: 5582664d6c7c4f1a56ae45647eb1bca49fed2866
2019-05-20 11:57:15 -07:00
c1fa449763 Break reference cycle in load_state_dict (#20397)
Summary:
load_state_dict includes a recursive inner function `load` that captures
Tensors through the closed-over variable `state_dict`. Because it's
recursive, it also captures itself, leading to a reference cycle.

This breaks the reference cycle so that any Tensors in state_dict can be
collected immediately instead of waiting until the next GC cycle.

Alternatively, we could have passed `state_dict` and `metadata` as
arguments to load to prevent capture of Tensors. (That would still
result in cyclic garbage, but not any cyclic garbage of Tensors).

See:
https://github.com/pytorch/pytorch/issues/20199#issuecomment-491089004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20397

Differential Revision: D15414834

Pulled By: colesbury

fbshipit-source-id: 4c2275a08b2d8043deb3779db28be03bda15872d
2019-05-20 11:46:00 -07:00
796e359601 Refactor builtin ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20666

Pulled By: driazati

Differential Revision: D15404419

fbshipit-source-id: c52bfa408c3caabb0a14779308686cf27fed349b
2019-05-20 11:21:18 -07:00
c0a2a3b22b Add a new method SummaryWriter.flush() (#20607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20607

Add a new method SummaryWriter.flush()  that iterates through all of the FileWriters and flushes them
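A simplified model of what such a flush() looks like; the `all_writers` mapping and the `FileWriter` class here are illustrative stand-ins, not the actual tensorboard internals:

```python
class FileWriter:
    """Stand-in for a buffered event writer (illustrative only)."""
    def __init__(self):
        self.buffer = []
        self.flushed = []

    def add_event(self, event):
        self.buffer.append(event)

    def flush(self):
        self.flushed.extend(self.buffer)
        self.buffer.clear()

class SummaryWriter:
    def __init__(self):
        # One FileWriter per log directory (assumed layout).
        self.all_writers = {}

    def flush(self):
        # Iterate through all FileWriters and flush each one,
        # so pending events reach disk.
        for writer in self.all_writers.values():
            writer.flush()

w = SummaryWriter()
fw = FileWriter()
w.all_writers["runs/exp1"] = fw
fw.add_event("loss=0.5")
w.flush()
```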

Reviewed By: orionr

Differential Revision: D15380124

fbshipit-source-id: 1975f3f61c5ae3754552bfdb23f2cd78f687d19f
2019-05-20 11:05:12 -07:00
f215db9b92 InsertGuards pass
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20438

Differential Revision: D15342655

Pulled By: Krovatkin

fbshipit-source-id: a193e582d621b99f848573fb4478e7b62265dc9f
2019-05-20 10:49:19 -07:00
036a159fb9 Audit AT_ASSERT sites in TensorImpl.h; doc improvements (#20649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20649

I went through every occurrence of AT_ASSERT in this file and
thought about whether or not it should be TORCH_INTERNAL_ASSERT
or TORCH_CHECK.  I think I did a good job at it.  Some thoughts:

- In order to decide if a check is "internal" or not, we must
  think about where the separation between userspace and our internals
  are.  I think any code that utilizes the PyTorch or Caffe2 C++ frontends
  count as userspace. An important corollary is that the majority of operator
  code "counts" as userspace, even though it lives in our repository.  This
  is inline with TCB (trusted computing base) thinking: you want the TCB to
  be as small as possible, and because we have a *lot* of operator
  implementations, they should not count as TCB.

- The primary test I applied when considering an AT_ASSERT was whether or
  not I could trigger this error by just making method calls on caffe2::Tensor
  or at::Tensor.  If I could, that made it a TORCH_CHECK.  This covers most
  of the misapplications of TORCH_INTERNAL_ASSERT.  One place I didn't
  do this was the "is variable" checks; I think you have to work a bit
  harder to trigger this case, and userspace code is not mixing up
  Variables and Tensors.

- I updated the docs for device_opt_, explaining when it could be nullopt.
  (The nullopt checks here are TORCH_CHECK, because you can trigger them
  by taking an undefined tensor and poking the methods.)

Differential Revision: D15395576

fbshipit-source-id: 1c51b396012e7d949fbb4258092cf80e5e6f851b
2019-05-20 09:54:37 -07:00
71260b98e2 Fixed histc return type for CUDA (#20369)
Summary:
Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369

Reviewed By: zou3519

Differential Revision: D15300959

Pulled By: izdeby

fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf
2019-05-20 08:08:28 -07:00
d0c742134d #20028 (#20696)
Summary:
Hi, ezyang
Sorry to trouble you.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20696

Differential Revision: D15413694

Pulled By: ezyang

fbshipit-source-id: 1c19d18e00c3a66a52bb9230aa25d7530f6e659c
2019-05-20 07:51:55 -07:00
cda9e995e2 Benchmark repeat op. (#20016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20016

PT's repeat op benchmark

Reviewed By: zheng-xq

Differential Revision: D15166941

fbshipit-source-id: b1ed7af790460456210b60bfb4e44a08657e9612
2019-05-20 07:34:54 -07:00
8acaa286b7 Make CUDACachingAllocator::recordStream() a no-op on null ptrs (#20658)
Summary:
Fixes #20651

Communication collectives in `torch.distributed` call `CUDACachingAllocator::recordStream()` on input and output tensors to prevent their memory blocks being freed too early. `CUDACachingAllocator` uses tensor's data pointer to track memory blocks, which does not accept null pointers. However, empty tensor's `storage().data()` might be null. In this case, as there is no associated memory block for the empty tensor, it should be fine to make `recordStream()` a no-op.

Tests only cover `broadcast` empty tensors for GLOO backend, because GLOO does not support empty inputs (facebookincubator/gloo/issues/179). It can be addressed in either `ProcessGroupGloo` or GLOO itself. Will add more tests when that gap is filled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20658

Differential Revision: D15399371

Pulled By: mrshenli

fbshipit-source-id: d29ebd1c72fddae49531f32695f81b89e42e5a4d
2019-05-20 07:13:51 -07:00
071971476d Fix Binomimal overflow when logits is large (#20679)
Summary:
This PR fixes #17843. In addition (tested locally), this still maintains the continuity of log_prob, which is addressed in https://github.com/pytorch/pytorch/pull/15962
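The overflow here is the classic one in computing log(1 + exp(x)) for large x; a sketch of the standard stable rewrite (the general technique, not the exact patch):

```python
import math

def log1p_exp(x):
    # Stable log(1 + exp(x)).  The naive math.log1p(math.exp(x)) overflows
    # for large x, even though the result is then simply ~x.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))
```

For example, `log1p_exp(1000.0)` returns 1000.0, while the naive `math.exp(1000.0)` raises OverflowError.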

cc neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20679

Differential Revision: D15413311

Pulled By: ezyang

fbshipit-source-id: 4fc0ca755ae6a85aa7deb2206dab675f82f9aa25
2019-05-20 06:52:29 -07:00
9b1dbffba5 Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
5835165ce3 Add get/set_num_interop_threads into torch.h include (#20659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20659
ghimport-source-id: 4858d03a9f89c613f64901c3430a7b212f76eb95

Reviewed By: dzhulgakov

Differential Revision: D15399780

Pulled By: ilia-cher

fbshipit-source-id: c1c3cb628c5ee664468f9d181bcd76a5105a89fd
2019-05-20 00:34:59 -07:00
4598729399 better handling of getenv 2019-05-19 23:19:52 -07:00
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
7b9ee598d6 separate option for FE_OVERFLOW (#20476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20476

There are overflow exceptions raised for legitimate computations, e.g.
for large negative x, sigmoid(x) = 1 / (1 + exp(-x)) = 1 / (1 + inf) = 0.
This diff separates out the option for FE_OVERFLOW to make the caffe2_operator_throw_if_fp_exceptions=1 option less noisy.
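To illustrate: the intermediate exp(-x) can overflow even though the final sigmoid value is perfectly representable. Python's math.exp raises OverflowError instead of returning inf, so this sketch shows the naive form failing and the standard overflow-free rewrite (an illustration, not the diff's C++ code):

```python
import math

def sigmoid_naive(x):
    # For large negative x, exp(-x) overflows. In IEEE arithmetic it
    # becomes inf and 1 / (1 + inf) still yields the right answer, 0,
    # but the FE_OVERFLOW flag is raised along the way.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_stable(x):
    # Branch so the argument to exp() is always <= 0: no overflow possible.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```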

Reviewed By: hx89

Differential Revision: D15332947

fbshipit-source-id: 9148233f5b84551a0900f0557ba22f2b1508ae0c
2019-05-19 16:05:27 -07:00
dd050b7b91 Replace AT_ASSERT with TORCH_INTERNAL_ASSERT/TORCH_CHECK (#20668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20668

AT_ASSERT is deprecated. Replace it wil the new exception handling macro.

Differential Revision: D15404401

fbshipit-source-id: 7ae9f5471f1f8ad95e7bb835380e098c00313d9c
2019-05-18 01:48:11 -07:00
96a1f7695f Support plot norm of specific embeddings of a LUT in diagnose_options (#19809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19809

as title

Reviewed By: chocjy

Differential Revision: D15100505

fbshipit-source-id: cba290fd4317b260e2bf1689b9ca215d3d19a9e2
2019-05-18 01:08:45 -07:00
2308257483 delete brodcasting ops from shape analysis resize aliasing (#20661)
Summary:
According to https://pytorch.org/docs/stable/notes/broadcasting.html, in-place operations do not allow the in-place tensor to change shape as a result of the broadcast. Therefore our shape analysis could keep the shape information on inputs.
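The rule can be modeled in a few lines of plain Python (a simplified sketch of the shape check, not the JIT's actual shape analysis):

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    # Right-aligned broadcasting, per the linked note: trailing dims must
    # be equal, or one of them must be 1.
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))

def inplace_broadcast_ok(self_shape, other_shape):
    # An in-place op may broadcast `other`, but the result must still have
    # self's shape, so shape information on the in-place input is preserved.
    return broadcast_shape(self_shape, other_shape) == self_shape
```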
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20661

Differential Revision: D15406477

Pulled By: wanchaol

fbshipit-source-id: 8ab60e783292f2fe26e5fdecfb64bec43bca6826
2019-05-18 00:07:32 -07:00
e74869473d De-deprecate parts of the legacy API (#20561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20561

We previously planned to deprecate the direct passing of a kernel function or lambda to the op() call, e.g.

    static auto registry = RegisterOperators().op("my::op", &func);

and push users towards the options based API:

    static auto registry = RegisterOperators().op("my::op", RegisterOperators::options().kernel<decltype(func), &func>());

because that has a slightly lower performance overhead when calling the kernel.

However, that overhead is negligible for all but exotic use cases, so there's no reason to push users towards a more verbose API.
This diff removes the deprecation warning from that API.

However, if you use the API together with deprecated types like std::unordered_map, you will now get a deprecation warning there.

Reviewed By: zdevito

Differential Revision: D15364271

fbshipit-source-id: 56dae0c5870bbab16ad19ba5178f4bea9eafed9f
2019-05-17 20:54:45 -07:00
cb6be42403 Options based registration API (#20514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514

Change API from

    static auto registry = c10::RegisterOperators()
      .op("my::op",
        c10::kernel(...),
        c10::dispatchKey(...)
      );

to

    static auto registry = c10::RegisterOperators()
      .op("my::op", c10::RegisterOperators::options()
        .kernel(...)
        .dispatchKey(...)
      );

because this allows better discoverability. People looking for the available options will find them more easily, and IDE autocompletion will work better.

Reviewed By: zdevito

Differential Revision: D15346348

fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
2019-05-17 20:54:42 -07:00
85fad0597c Add qint8 type (int8_t) (#19984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19984

Add qint8 for QTensor, with underlying type of int8_t

Reviewed By: jianyuh

Differential Revision: D15150715

fbshipit-source-id: 57580f599d46f9323af5ce462dbbc464b25e40d7
2019-05-17 20:35:05 -07:00
986c9eb537 Add a pybind for Module::get_functions. (#20594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20594
ghimport-source-id: 96f8046fd3aed49d8b29312ce2f8d7e0ea5e5787

Differential Revision: D15375132

Pulled By: ZolotukhinM

fbshipit-source-id: 7ad87e29d965e459c49be1be8a08f86ed2a0b4db
2019-05-17 20:00:28 -07:00
2af5911a95 Modify the Insert quant-dequant test cases to look for q-dq pattern (#20672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20672

The current test case looks for the q->int_repr->dq pattern and also for constant nodes.
The prim::Constant nodes are not guaranteed to be present at the same point in the graph,
so we modify the test case to look only for the q->int_repr->dq nodes.

Differential Revision: D15405606

fbshipit-source-id: 2086ffb5bbd328d2a9a55f4c2a2de342575194d3
2019-05-17 18:46:27 -07:00
e42665cf39 Some small performance fixes for c10 dispatcher (#20472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20472
ghimport-source-id: d118bf8d48eea3faf241a7288fcad1bb6a5f051f

Differential Revision: D15332284

Pulled By: li-roy

fbshipit-source-id: a8d9e50a440a7ad3ee730f70c0fcae06ae848cbd
2019-05-17 17:35:56 -07:00
d9dcfacd9e Improve CPUAllocator OOM message (#20618)
Summary:
Spotted while debugging some problem

Before
```
>>> torch.empty(10**15)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0
```

After
```
>>> torch.empty(10**15)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 4000000000000000 bytes. Error code 12 (Cannot allocate memory)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20618

Reviewed By: ezyang

Differential Revision: D15390400

Pulled By: dzhulgakov

fbshipit-source-id: 31f448303e4bd5f8c2bad8ca0f05bcece22a4b5e
2019-05-17 16:14:49 -07:00
e79610c0df Fix missing env for update_binary_size job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20645

Differential Revision: D15400130

Pulled By: pjh5

fbshipit-source-id: d2a07fc5608ab3a96026b60e16bd12add8e0c9d4
2019-05-17 15:46:50 -07:00
c267d0c869 Misc error message improvements (#19369)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19369 [jit] Misc error message improvements**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19369

Pulled By: driazati

Differential Revision: D14982975

fbshipit-source-id: 18dae3d56c3a01280f66223a69e2e52d15d3b651
2019-05-17 15:30:58 -07:00
3be2f7c8e6 SubgraphMatcher: add attributes support. (#20602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20602
ghimport-source-id: fa3225bb5d70729d6a1bcf88295d031707d986a1

Differential Revision: D15377635

Pulled By: ZolotukhinM

fbshipit-source-id: ebd385e7b9436429d0ad76ed3d932925a29f6456
2019-05-17 15:10:02 -07:00
0b9b929d14 Use python type string for user facing error msgs (#20657)
Summary:
Otherwise users see something like `(Tensor, Tensor)?` and don't know what the `?` means.

First commit is formatting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20657

Differential Revision: D15400225

Pulled By: eellison

fbshipit-source-id: cf826790bf2ddafd34f6d5c144526cad9904770b
2019-05-17 15:04:53 -07:00
c819d76789 Add list(string) (#20617)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#20617 [jit] Add list(string)**
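The semantics mirror Python's builtin `list` over a string (plain-Python illustration; the commit adds the matching behavior to TorchScript):

```python
# list over a string yields the individual characters
chars = list("abc")
```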

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20617

Pulled By: driazati

Differential Revision: D15397739

fbshipit-source-id: 55a1e54f0ed2d9fd3ce44a8bb64bb4ba63181c9a
2019-05-17 14:57:20 -07:00
a543586bff Add _enable_recursive_script to try to script all Python functions (#19578)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19578 [jit] Try to script all Python functions**

This adds the `torch.jit._enable_recursive_script` context manager, which will try to compile any Python functions it sees. It's hidden behind an internal context manager for now since it's incomplete (doesn't work for script_methods/Python submodules). If it can't compile the Python function it outputs an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19578

Pulled By: driazati

Differential Revision: D15386727

fbshipit-source-id: 4e308f67677b8e9fccfc525a91bb2f4585062048
2019-05-17 14:50:45 -07:00
cd28ff5395 Add support for __getstate__/__setstate__ on module (#20242)
Summary:
Adds support for `__getstate__` and `__setstate__` on modules that are called as part of export (`torch.save()`) and import (`torch.jit.load`).
* `__getstate__` and `__setstate__` must be TorchScript functions with the signatures `() -> T` and `(T) -> None` respectively
* The results of `__getstate__` are stored using the pickler in `states.pkl`, with one for each module in definition order (`__getstate__` returns `None` by default if an implementation is not provided)
    * This prevents sharing between `__getstate__` and attributes, but this should be fine since their use is mostly unrelated (attributes are for storing values to be used in script methods, `__getstate__` for running arbitrary computations during import)

Follow up
* Somehow replacing `__getstate__`/`__setstate__` with a `ScriptMethodStub` makes `MyScriptModule().__getstate__()` call `ScriptModule.__getstate__()` when used in Python. This should be fixed so semantics in Python are preserved, but it doesn't affect the typical usage.
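A plain-Python pickle analogue of the protocol described above (the class and field names here are hypothetical, not PyTorch code): `__getstate__` has signature `() -> T`, `__setstate__` has `(T) -> None`, and transient state is rebuilt on load rather than saved.

```python
import pickle

class Counter:
    def __init__(self):
        self.count = 0
        self._cache = object()          # transient state, not worth saving

    def __getstate__(self):             # () -> T
        return self.count

    def __setstate__(self, state):      # (T) -> None
        self.count = state
        self._cache = object()          # recompute transient state on load

c = Counter()
c.count = 5
restored = pickle.loads(pickle.dumps(c))
```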
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20242

Pulled By: driazati

Differential Revision: D15287161

fbshipit-source-id: b3f5f33ab74a21a89e6d15460af63aff75cab2d8
2019-05-17 14:43:14 -07:00
5f14ef8cc1 Split out gpu/cpu targets based on gpu_library_targets (#20633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20633

Merge the c2_gpu and is_amd_build logic in targets files.

Reviewed By: dzhulgakov

Differential Revision: D15176621

fbshipit-source-id: 9185b394ffcb305fd8d94dc7c7c92780bf10a511
2019-05-17 13:07:10 -07:00
79c5dc313c Remove unnecessary format literals from error message.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20646

Differential Revision: D15394795

fbshipit-source-id: 8033cf03341244b2b6a119e3c59f48ee6fe959cc
2019-05-17 10:45:40 -07:00
26dfeffacd Remove TH/THC link for single matrix inverse (#20534)
Summary:
- Earlier, we had to use the legacy implementation of `getri` for single matrix inverse from TH and THC
- Now, this has been moved to ATen

Changelog:
- Move single matrix inverse implementation to ATen
- Remove unused code in TH and THC resulting from the change
- Minor modifications made to single matrix CPU function implementations in ATen to avoid redundancy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20534

Differential Revision: D15393383

Pulled By: ezyang

fbshipit-source-id: 81972111cd9757d15f1d634f294c93fd0f35636c
2019-05-17 10:03:36 -07:00
839a69f587 Revert D15393514: [pytorch][PR] Refine CosineAnnealingWarmRestarts doc for issue #20028
Differential Revision:
D15393514

Original commit changeset: 03f270a577fc

fbshipit-source-id: 3633f4e9916bdadf018288a64df89078b14af563
2019-05-17 09:55:56 -07:00
8c9f4c560a Add matmul optimization for the case A.ndim <= 2 && B.ndim >= 3 (#20448)
Summary:
This addresses #18862.
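The shape semantics of the case being optimized, shown with NumPy (which broadcasts matmul the same way): a 2-D operand multiplied against a batched 3-D operand is broadcast across the batch dimension.

```python
import numpy as np

A = np.ones((4, 5))          # A.ndim <= 2
B = np.ones((7, 5, 6))       # B.ndim >= 3: a batch of 7 matrices
C = np.matmul(A, B)          # A is broadcast across the batch dimension
```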
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20448

Differential Revision: D15393465

Pulled By: ezyang

fbshipit-source-id: 87e5b0ed8253ea00365f420d98ac96dd4e934028
2019-05-17 09:44:26 -07:00
a212a5b97a ir.cpp, module.cpp: clang-format. (#20592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20592
ghimport-source-id: 98dc62a9595c6b94706960274ce9beebacc9ca00

Differential Revision: D15375131

Pulled By: ZolotukhinM

fbshipit-source-id: 7edbb14a337d1646b48756eef4163846648cbd93
2019-05-17 09:21:32 -07:00
b90790ab1b Don't split 256-bit AVX2 load/store intrinsics (#20609)
Summary:
Recent versions of GCC split unaligned load and store intrinsics into
two 128-bit instructions. On old processors (Sandy Bridge) this was a
bit faster for unaligned data, but bit slower for aligned data. On new
processors (Intel Haswell+, recent AMD) splitting loads is slower on
both aligned and unaligned data.

Clang, MSVC, and ICC do not split unaligned load and store intrinsics.

There's a good explanation here:
https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd#tab-top

Splitting load and store intrinsics makes no sense in our AVX2
configuration because the CPUs that support AVX2 instructions are the
same CPUs where splitting is disadvantageous for all data alignments.

Note that this doesn't change the AVX configuration (used by CPUs that
support AVX but not AVX2). It's possible this would be beneficial for
that configuration too (our data is usually 32-byte aligned), but I'd
prefer the conservative change for now.

torch.add generated assembly (hot loop) (GCC 7.3.0)
before:
https://gist.github.com/colesbury/066376537bccd514daf8fe4ab54d8295

after:
https://gist.github.com/colesbury/8b4b948145001d44b225c51d2428bb91

Timing of `torch.add(x, y, out=z)` for size 10240 (1 thread, Broadwell,
no turbo):
before: 7.35 us after: 6.39 us

(Take the torch.add timings with a grain of salt. The difference in timings
is much larger than I would expect.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20609

Differential Revision: D15385800

Pulled By: colesbury

fbshipit-source-id: 66415b148a3b19360b9de9881af594ab46547b6f
2019-05-17 09:16:17 -07:00
000d73ccde fix WAR race (#20182)
Summary:
This write-after-read (WAR) race was flagged by racecheck.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20182

Differential Revision: D15393536

Pulled By: ezyang

fbshipit-source-id: ad4849c9fb2c8feb966be1c4ca0dadd7360f58fe
2019-05-17 09:06:52 -07:00
3c69c9a7fe Refine CosineAnnealingWarmRestarts doc for issue #20028 (#20267)
Summary:
Fixes #20028
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20267

Differential Revision: D15393514

Pulled By: ezyang

fbshipit-source-id: 03f270a577fc3e0414d3f07d97512a409b08f7cd
2019-05-17 09:02:28 -07:00
cfb87c1022 Update documentation for CTCLoss (#20422)
Summary:
Change `Inputs` to `Shape` to unify the format of CTCLoss `class`, and add the type of `Output` in `Shape`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20422

Differential Revision: D15393484

Pulled By: ezyang

fbshipit-source-id: 5b49647f9740de77db49a566fa2de74fcecd9110
2019-05-17 09:02:25 -07:00
35e0015c70 Export sign onnx operator (#20470)
Summary:
A trivial commit that adds support for exporting the sign operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20470

Differential Revision: D15393446

Pulled By: ezyang

fbshipit-source-id: 12fb1c147d016205abf814907d667f7d8b074ae1
2019-05-17 08:57:22 -07:00
4e551a7edb Make C10_NODISCARD macro more portable for nvcc+clang. (#20324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20324
ghimport-source-id: e51181c82f87c946b5ffcb87b0ad71a056cb4659

Differential Revision: D15359317

Pulled By: ezyang

fbshipit-source-id: d88798f13a61c74456641ddec8250c08ce8af240
2019-05-17 08:57:19 -07:00
690efa5220 Remove checks for CUDA 8 in LU-based tests (#20482)
Summary:
CUDA 8 is no longer supported and removed from CI, so these checks are irrelevant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20482

Differential Revision: D15393438

Pulled By: ezyang

fbshipit-source-id: ac0979bf660b3314eec502c745e34ce4940bda0e
2019-05-17 08:51:56 -07:00
110ed511a4 Make check-doxygen.sh output more interpretable. (#20362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20362
ghimport-source-id: ac791884dc6d3954f69d8fc997b2b561f435e0e7

Differential Revision: D15375139

Pulled By: ezyang

fbshipit-source-id: c8aa0f991430269090e068f828810bae7aa39a07
2019-05-17 08:47:11 -07:00
1136ad59f9 Enable simd and loop vectorizer with MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20530

Differential Revision: D15392676

Pulled By: ezyang

fbshipit-source-id: c8fda0c7835127f81adf55016223bb4dc14ff40a
2019-05-17 08:38:56 -07:00
fa4ca4e70e Emphasize all DDP forward() outputs must participate in computing loss (#20586)
Summary:
CC borguz chenyangyu1988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20586

Reviewed By: ezyang

Differential Revision: D15373674

Pulled By: mrshenli

fbshipit-source-id: b986918b3592616a9bcc88fba1b8fd53016f68d7
2019-05-17 07:35:49 -07:00
c941abbc0a Fix upsample kernel launch / reorder arguments (#20505)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/19630.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20505

Differential Revision: D15392706

Pulled By: ezyang

fbshipit-source-id: 5a8a7aacdbcf740508baf2b6e0c081c4e5a0390f
2019-05-17 07:30:50 -07:00
3bc0bd9534 Fix caffe2 build failure on Windows (#20574)
Summary:
Fixes #20568.
Looks like CMake is passing `/MD` when we call `add_library`. We need to fix these flags for C source files too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20574

Differential Revision: D15392682

Pulled By: ezyang

fbshipit-source-id: c92034d8725fcec48fd7db6cf5322868e956dc6b
2019-05-17 07:21:42 -07:00
4c806a9e8a Allow tuples for scale_factor argument in nn.Upsample (#20581)
Summary:
Fixes #20523 .

nn.Upsample was unable to accept tuple inputs for the scale_factor argument due to direct casting to float, which was done in #17732.
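A hedged sketch of the shape of the fix (the helper name is ours, not PyTorch's): accept either a scalar or a per-dimension tuple instead of unconditionally calling `float()`, which rejects tuples.

```python
def normalize_scale_factor(scale_factor, dim):
    # Accept a single number or a per-spatial-dimension tuple.
    if isinstance(scale_factor, (tuple, list)):
        if len(scale_factor) != dim:
            raise ValueError(
                f"expected {dim} scale factors, got {len(scale_factor)}")
        return tuple(float(s) for s in scale_factor)
    return (float(scale_factor),) * dim
```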
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20581

Differential Revision: D15392622

Pulled By: ezyang

fbshipit-source-id: b56ba8197a5bbf8891bc7e1bebf5cad63dcab04d
2019-05-17 07:14:18 -07:00
409200df59 Move inter-op settings into ATen/Parallel (#20050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20050
ghimport-source-id: cc102bab8abf3e56c099245976786317ed63ea14

Differential Revision: D15248576

Pulled By: ilia-cher

fbshipit-source-id: 55ddcb7af387ddfc68a42ac7167de07ea648e249
2019-05-17 03:12:02 -07:00
36d3398aa5 Clang-format ImageInputOp (#20441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20441

This op is fairly complex and the fact that it isn't formatted
correctly makes things that much harder to reason about. Clean it up.

Reviewed By: dreiss

Differential Revision: D15220006

fbshipit-source-id: 30632d8bdbf15f96e73d8b6c96c5f29c052e6e7c
2019-05-16 23:00:09 -07:00
ea9c6e7581 eliminate FE_INVALID in unit test (#20502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20502

Following D15307410, this removes more floating point exceptions in unit tests.

Reviewed By: hx89

Differential Revision: D15340930

fbshipit-source-id: 269fc75e0800bc9d39126767a0f3ca15cd8b0cad
2019-05-16 21:55:28 -07:00
e4c7f59fbc Shallow-copy indices and values in sparse tensor ctor (#20614)
Summary:
(Reopens https://github.com/pytorch/pytorch/pull/20330 and fixes test error.)

After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor.

Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example:
```python
# Calling resize_ on non-requires-grad value tensor
i2 = torch.zeros([1, 1])
v2 = torch.ones([1, 2, 3])
t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3]))
v2.resize_(4, 5)
t2.coalesce().values().size()
# On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor.
# After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20614

Differential Revision: D15385811

Pulled By: yf225

fbshipit-source-id: e963fcf5e4097f8c881b56145f408565d97cf5c1
2019-05-16 18:35:05 -07:00
3c86d597c4 update legacy plus one for mpscnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20554

Reviewed By: jerryzh168

Differential Revision: D15362378

fbshipit-source-id: 070cd8314257386036dca89167c738c6602b3f33
2019-05-16 18:17:18 -07:00
8bdbd59d0c handle box plus one for gpu generate_proposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20553

Reviewed By: newstzpz

Differential Revision: D15362108

fbshipit-source-id: 53b1ef132288855f8977748442bfe5e5806c6c6e
2019-05-16 18:17:15 -07:00
373e6a78bf make box plus one a legacy argument in detection ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20550

Reviewed By: newstzpz

Differential Revision: D15348610

fbshipit-source-id: 12b1e119e9bc9191ba9f2aa6d695ef215780c349
2019-05-16 18:17:12 -07:00
220e6894c5 Rename qint8 data type (#19932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19932

In preparation to add int8_t data type for QTensor

Reviewed By: zafartahirov

Differential Revision: D15137838

fbshipit-source-id: 59462c36d6fc5982986d4196bf3f32f49bb294d7
2019-05-16 18:09:28 -07:00
980982ac09 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: d0158f7e77a915ffbc28c10de8864d2ae9b24e6f
2019-05-16 16:06:55 -07:00
2ddf126b96 Revert D15373683: [pytorch][PR] [BC-breaking] Shallow-copy indices and values in sparse tensor ctor
Differential Revision:
D15373683

Original commit changeset: 32e7275d7121

fbshipit-source-id: ed1786ee9ffa11f7c14c9cd10be6db48285dc57a
2019-05-16 15:22:48 -07:00
4f02321a9a Shallow-copy indices and values in sparse tensor ctor (#20330)
Summary:
After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor.

Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example:
```python
# Calling resize_ on non-requires-grad value tensor
i2 = torch.zeros([1, 1])
v2 = torch.ones([1, 2, 3])
t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3]))
v2.resize_(4, 5)
t2.coalesce().values().size()
# On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor.
# After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20330

Differential Revision: D15373683

Pulled By: yf225

fbshipit-source-id: 32e7275d7121e17937c7cc258e8a60bb0848ff25
2019-05-16 15:04:23 -07:00
21ef4cc615 Improve bmm performance on CPU by applying TensorAccessor (#20266)
Summary:
Currently `bmm()` has very heavy performance overhead on CPU due to the construction/destruction of `TensorImpl`. Applying `TensorAccessor` when indexing tensor data greatly improves performance.

I tested this on `fairseq` Transformer model. Results on Xeon 6148 (20*2 cores 2.5GHz) indicate this PR improves Transformer training performance by approximately **10%** (seconds per iteration reduced from **3.60** to **3.21**). Considering the fact that `bmm()` takes only **14%** of the total time, 10% overall improvement indicates `bmm()` itself improves by roughly **3x**.

Before:
```
| epoch 001:   0%| | 43/25337 [02:34<25:17:11,  3.60s/it, loss=16.179, nll_loss=16.137, ppl=72045.59, wps=1320, ups=0, wpb=4758.767, bsz=136.558, num_updates=43, lr=6.45e-06, gnorm=6.88
```

After:
```
| epoch 001:   0%| | 23/25337 [01:13<22:32:48,  3.21s/it, loss=17.072, nll_loss=17.068, ppl=137419.42, wps=1478, ups=0, wpb=4746.870, bsz=128.348, num_updates=23, lr=3.45e-06, gnorm=10.
```
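A pure-Python sketch of the TensorAccessor idea (names are illustrative, not PyTorch's): resolve base offsets and strides once per batch, then do plain index arithmetic in the hot loop rather than constructing a tensor object per element.

```python
def bmm_flat(a, b, B, M, K, N):
    # a: flat row-major buffer of B*M*K, b: flat row-major buffer of B*K*N.
    # Compute per-batch base offsets once; the inner loops are pure index math.
    out = [0.0] * (B * M * N)
    for t in range(B):
        ao, bo, co = t * M * K, t * K * N, t * M * N
        for i in range(M):
            for j in range(N):
                acc = 0.0
                for k in range(K):
                    acc += a[ao + i * K + k] * b[bo + k * N + j]
                out[co + i * N + j] = acc
    return out

out = bmm_flat([1.0] * 12, [2.0] * 12, B=2, M=2, K=3, N=2)
```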
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20266

Differential Revision: D15262201

Pulled By: cpuhrsch

fbshipit-source-id: c2e4e406c06714b04cc7534f3da71e986eddca35
2019-05-16 14:01:48 -07:00
fa189641b5 Add export for __and__ & __or__ (#17894)
Summary:
In the ONNX spec, the only supported input/output type for `And` and `Or` is `Bool`.
Thus, during export, casts to/from `Bool` are inserted for the inputs/outputs.
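The exported pattern, sketched with NumPy standing in for the ONNX runtime: inputs are cast to bool, the logical op runs on bools, and the result is cast back to the original dtype.

```python
import numpy as np

x = np.array([0, 1, 2], dtype=np.int64)
y = np.array([1, 0, 2], dtype=np.int64)
# Cast -> And (Bool only in ONNX) -> Cast back
out = np.logical_and(x.astype(bool), y.astype(bool)).astype(np.int64)
```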
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17894

Reviewed By: zrphercule

Differential Revision: D15103148

Pulled By: houseroad

fbshipit-source-id: 3e1068ea236c743260d42882fb11f0e3a21707e6
2019-05-16 13:52:06 -07:00
61012080c8 split and register CollectAndDistributeFpnRpnProposals with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20509

Reviewed By: newstzpz

Differential Revision: D15302181

fbshipit-source-id: 7d3b29b667cd900f2976101f35200e1ee20b0f64
2019-05-16 13:40:46 -07:00
d784636b39 Scope: Move implementations from .h to .cpp file. (#20593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20593
ghimport-source-id: e1e9a7f98158c23adf02d5ed2763ab1e33bb2997

Differential Revision: D15375134

Pulled By: ZolotukhinM

fbshipit-source-id: 8d8e0c1e0ef7697ded59a4b19e2d9de7c5230294
2019-05-16 13:18:04 -07:00
75d04900fe Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 2ee799db589f63e7b9336a02d047afcc768e8b58
2019-05-16 09:48:39 -07:00
5821a76b8e Forcing gcc ABI and safer bash scripts, v2 (#20540)
Summary:
First time this was merged it broke master and was reverted. This time I do not add ```set -u``` to the .circleci/scripts/setup* scripts. There's still a chance that ```set -u``` breaks the binary builds on master, but at least those can be fixed in parallel and don't completely eliminate signal from all merges.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20540

Differential Revision: D15373444

Pulled By: pjh5

fbshipit-source-id: 0203c20865827366ecd8fa07b2db74d255549ed1
2019-05-16 09:40:01 -07:00
66c6133264 fix empty dropout (#20541)
Summary:
Fix for #20499
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20541

Differential Revision: D15372461

Pulled By: ezyang

fbshipit-source-id: cdc237a98244515a573216a6dac4826261c973f9
2019-05-16 09:33:51 -07:00
a837c00acd Removing unnecessary comments (+fix flake8)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20589

Differential Revision: D15373655

Pulled By: VitalyFedyunin

fbshipit-source-id: 25277648d3e8f8a09cec7569ceda56e74c2ef0b1
2019-05-16 09:19:34 -07:00
5f8e849d84 eliminate FE_INVALID in optimizer related operators and tests (#20501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20501

Fixing floating point exceptions in unit tests for optimizer-related operators.

Reviewed By: hx89

Differential Revision: D15307410

fbshipit-source-id: e5400c26e08f26191ee542fe6b02e0a69bc4e1ae
2019-05-16 08:23:46 -07:00
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was separated by 2 PRs.

This one:

Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate on strides and don't store any tensor state.

(Original RFC #19092)

-----

Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns new tensor which maintain same semantical layout (NCHW), but have different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.

2. `.is_contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or the analogous 3d/5d) format.

Note: By the end of the phase one `x.is_contiguous(memory_format=torch.channels_last)` will calculate state of the Tensor on every call. This functionality going to be updated later.
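What `channels_last` means at the stride level can be illustrated with NumPy (PyTorch uses the same stride convention, in elements rather than NumPy's bytes): the logical shape stays NCHW, but the channel stride becomes the smallest.

```python
import numpy as np

n, c, h, w = 2, 3, 4, 5
nhwc = np.zeros((n, h, w, c), dtype=np.float32)  # channels-last in memory
x = nhwc.transpose(0, 3, 1, 2)                   # logical NCHW view
# Channel stride is the smallest: channel values are adjacent in memory.
```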
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
09f22d10a6 Infer schema for experimental ops (#20513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20513

They were using an old API; switch them to the new one instead.

Reviewed By: li-roy

Differential Revision: D15346349

fbshipit-source-id: 538eb460897ec6addebeebf88b316eb0d6b1dd6f
2019-05-16 01:29:35 -07:00
9bd3305592 Allow nested lists/dicts in legacy operator API (#20379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20379

The legacy custom op API allowed nesting of std::unordered_map and std::vector. While we haven't yet figured out how to do that with the new API,
we at least have to keep backwards compatibility. This diff adds the feature so we can switch to the new API without breaking third-party code.

Reviewed By: li-roy

Differential Revision: D15287693

fbshipit-source-id: bb5b8429fddf6298719cbf567b584ed371f8fc81
2019-05-16 01:29:32 -07:00
456b889353 Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() (#20496)
Summary:
Previously, the caller of `shallow_copy_and_detach()` was responsible for deciding whether the shallow copy should share the source TensorImpl's version counter or have its own new version counter. However, since this decision is crucial for ensuring the correctness of the shallow copy's version counter, we want to require users of `shallow_copy_and_detach()` to pass a version counter to the function call, so that they make the decision at the time of API usage, not as an afterthought.

For similar reasons, we want to require users of `shallow_copy_and_detach()` to pass `allow_tensor_metadata_change` to the function call, so that they decide "whether the TensorImpl shallow copy should allow tensor metadata change" at the time of API usage, not as an afterthought.
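The API-design point can be sketched in Python (a hypothetical stand-in, not the C++ signature): making the two decisions keyword-only and required forces callers to state them at the call site.

```python
import copy

class FakeTensor:                       # hypothetical stand-in for TensorImpl
    def __init__(self):
        self.version_counter = 0
        self.allow_tensor_metadata_change = True

def shallow_copy_and_detach(t, *, version_counter, allow_tensor_metadata_change):
    shallow = copy.copy(t)              # shallow copy: shares underlying state
    shallow.version_counter = version_counter
    shallow.allow_tensor_metadata_change = allow_tensor_metadata_change
    return shallow

t = FakeTensor()
try:
    shallow_copy_and_detach(t)          # both decisions must be stated
    required = False
except TypeError:
    required = True
```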
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20496

Differential Revision: D15363620

Pulled By: yf225

fbshipit-source-id: a65e74738b10452668d6dc644b43aad5b3d8c9e6
2019-05-15 21:02:48 -07:00
3caf4e6985 Remove weak_script in MultiheadAttention function. (#20563)
Summary:
Remove weak_script. After recently splitting the forward() function in the MultiheadAttention module, we noticed a memory leak on GPU. Fix the problem by removing the "weak_script" decorators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20563

Differential Revision: D15368262

Pulled By: zhangguanheng66

fbshipit-source-id: 475db93c9ee0dbaea8fb914c004e7d1e0d419bc2
2019-05-15 20:10:39 -07:00
7db1fb84fa Use slimmer exception raising code when on mobile. (#20543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20543

All of that code for concatenating strings together adds up. Just discard it all for mobile builds.

Reviewed By: ljk53

Differential Revision: D15353447

fbshipit-source-id: a82dd0b884335d662605aabf7dd3d09dfcc1478b
2019-05-15 19:45:18 -07:00
1891614aa5 Add GivenTensorInt16Fill (#20515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20515

Needed by the upcoming quantized version of GenerateProposals

Reviewed By: dzhulgakov

Differential Revision: D14430952

fbshipit-source-id: ea852f04cc4b070f8fbe7a1e6535bba4d5b230fd
2019-05-15 19:45:15 -07:00
5917ec2c52 Print registry warning only when DEBUG is set (#20398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20398

Reduce logging volume from the Registry

Reviewed By: nairbv

Differential Revision: D15312262

fbshipit-source-id: e3546c288d6e1a396b2a4b08204a418aca889437
2019-05-15 19:29:05 -07:00
c129ab06e9 Change onnxifi workflow to support multi-group quantized & Add multi quantization info to caffe2.proto (#20439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439

This is the QTensorProto workflow for multi-group quantization on the C2 side.
No DNNLOWP-tensor-related changes are included in this PR, so once the Glow side is finished, we should be able to test this PR using ResNet-50.

Reviewed By: yinghai

Differential Revision: D15096919

fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
2019-05-15 19:24:08 -07:00
51e40ab832 Add scalar type info to tensor print (#20483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20483
ghimport-source-id: 31bfc51af1060e83492315b96884fc725a1eb84f

Differential Revision: D15334010

Pulled By: li-roy

fbshipit-source-id: 199b575855146a7336d57c165191a16e7e1b5785
2019-05-15 19:03:21 -07:00
abb3698976 Add QInt32 ScalarType and qint32 data type (#19816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816

We need this for the quantization of bias.
Adds a third ScalarType argument to `quantize_linear`.

Differential Revision: D15094174

fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
2019-05-15 18:50:18 -07:00
1a0f753e6e Fixing typos in schema description for BatchMatMul (#20512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20512

Fixing typos in the description of schema for one of the inputs for BatchMatMul operator.

Reviewed By: jianyuh, BIT-silence

Differential Revision: D15343879

fbshipit-source-id: 06354e8e6b0d79fea937ed2703bb457b2d04f859
2019-05-15 18:06:30 -07:00
b3e510518b Tensor codemod for instance_norm (#20517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20517

fixing a bug in instance_norm

Reviewed By: ezyang

Differential Revision: D15349006

fbshipit-source-id: 2496f7f372118d2713c12a6e9b3357bf6c640b71
2019-05-15 17:51:37 -07:00
ca24e18c7e Add an AssertError check back to MultiheadAttention module (#20492)
Summary:
Fix a typo in the doc.
Add an AssertError check back to MultiheadAttention module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20492

Differential Revision: D15349008

Pulled By: cpuhrsch

fbshipit-source-id: 2d898345f03787c713e537673613a748ad826b34
2019-05-15 17:28:25 -07:00
161566187c enable CopyVector for type of int on CUDA (#20520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20520

as title

Reviewed By: xianjiec

Differential Revision: D15351010

fbshipit-source-id: 99466de9da0abdffe26d6919768dcb4e52cb2ff1
2019-05-15 16:53:51 -07:00
4c23c34e79 Computing var/stddev and mean at the same time (#18731)
Summary:
The current variance kernels compute the mean at the same time. Since we often want both statistics together, it seems reasonable to have a kwarg/function that lets us get both values without launching an extra kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731

Differential Revision: D14726082

Pulled By: ifedan

fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
2019-05-15 16:42:38 -07:00
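The single-pass idea behind this change can be sketched with Welford's algorithm — a plain-Python illustration of computing mean and variance together, not the CUDA kernel itself (the function name here is made up for the example):

```python
def mean_and_var(xs, unbiased=True):
    """Single-pass mean and variance via Welford's online algorithm."""
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n          # running mean update
        m2 += delta * (x - mean)   # running sum of squared deviations
    denom = (n - 1) if unbiased else n
    return mean, m2 / denom

m, v = mean_and_var([1.0, 2.0, 3.0, 4.0])
assert m == 2.5
assert abs(v - 5.0 / 3.0) < 1e-12  # unbiased variance of [1, 2, 3, 4]
```

One traversal of the data yields both statistics, which is exactly why fusing them into one kernel saves a launch.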
08bdd694f9 Extract feature length information from SigridTransforms op (#20384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20171

Extract feature length information from SigridTransforms op

Reviewed By: ipiszy

Differential Revision: D15219408

fbshipit-source-id: 307d2b65b208d3af6977d90246d0372795c45815
2019-05-15 16:21:57 -07:00
428104c60a Automatic update of fbcode/onnx to ead449a30d026a7a0a59e2ba0a42ca8e52ec2359 (#20542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20542

Previous import was e08efaa35ed54362dfa283240506c003175889b7

Included changes:
- **[ead449a3](https://github.com/onnx/onnx/commit/ead449a3)**: fix array range bug (#2015) <one-hello>
- **[0442d426](https://github.com/onnx/onnx/commit/0442d426)**: Relax constraint on subgraph input/output type and shape (#2009) <Bowen Bao>

Reviewed By: zrphercule

Differential Revision: D15350320

fbshipit-source-id: 2cc5db926785cda0b79efb6747da3900361dba76
2019-05-15 15:12:59 -07:00
8226330af3 Extend testAvailableArgTypes (#20374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20374

This test case now also tests that the argument type works correctly in kernels that
- don't return outputs
- return multiple outputs

Reviewed By: li-roy

Differential Revision: D15298233

fbshipit-source-id: 82ab9d81b55b4f9fb34d66a155cc426af8592e25
2019-05-15 14:57:40 -07:00
f89ab7b623 Allow Dict type in c10 operators (#20373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20373

- Add support for Dict<Key, Value> arguments and returns to c10 operators
- Add support for std::unordered_map<Key, Value> to the legacy API (but not to c10 kernels)

Reviewed By: li-roy

Differential Revision: D15298235

fbshipit-source-id: 6d9793db1f12bea377f508a9b33a495ebe0bec18
2019-05-15 14:57:37 -07:00
a821e11127 Speed up RecordFunction with sampled callbacks (#20307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20307
ghimport-source-id: 94adc3c3102bb6b9cb60cf6c6112e350aa954aaf

Differential Revision: D15276308

Pulled By: ilia-cher

fbshipit-source-id: c536063669d8414b4ce0b09fd5dc0d76f1e94bb5
2019-05-15 14:48:49 -07:00
b55d2dcc84 Publish c10::RegisterOperators as torch::RegisterOperators (#20334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20334

-

Reviewed By: li-roy

Differential Revision: D15284557

fbshipit-source-id: fdd1d9f2910dbd05a869eef13ccdc68c80e6bd81
2019-05-15 13:45:07 -07:00
852f8526c5 Replace AT_CHECK with TORCH_CHECK [shard 5/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20431

Reviewed By: jerryzh168

Differential Revision: D15318266

fbshipit-source-id: 500719451202458fae312aa196c0c60098d6a541
2019-05-15 12:54:08 -07:00
5243fe0350 Allow static inheritence for ScriptModules (#20503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20503
ghimport-source-id: 35684175f0806485b074ea136548823ad1bc1c30

Differential Revision: D15341555

Pulled By: zdevito

fbshipit-source-id: ad19da3306914196dcbbcee829dcb0a9f22e3722
2019-05-15 12:41:55 -07:00
da3e74b21c define use_cuda in dropout backward to allow peephole optimization to work (#20289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20289

Differential Revision: D15350262

Pulled By: wanchaol

fbshipit-source-id: b457304688524822c1e6f23049e05472130c1ff4
2019-05-15 11:36:06 -07:00
bd047d812e Recursively checkout submodules for Pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20537

Differential Revision: D15354830

Pulled By: pjh5

fbshipit-source-id: 40902c450756dc127d34c9ec64e78d33edb6c5c9
2019-05-15 10:48:27 -07:00
72bb84c518 Provide a few default args for numpy translation (#20451)
Summary:
Add automatic translations for a few argument names that commonly differ between PyTorch and NumPy.

For now, they are as follows:

* `keepdim` -> `keepdims`
* `dim` -> `axis`
* `input` -> (any of `a`, `x`, `x1`)
* `other` -> `x2`

Basic examples:
```python
>>> t=torch.randn(10,10)
>>> torch.sum(x=t, axis=1)
tensor([ 0.5199, -0.3768,  4.3619, -0.9105,  1.1804,  1.0837, -0.9036,  0.2365,
         1.1171, -0.0999])
```
```python
>>> torch.add(x1=5, x2=6)
tensor(11)
```

The additional overhead is zero when using traditional PyTorch argument names, and a few (usually 1) extra PyDict lookups when using NumPy argument names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20451

Differential Revision: D15337521

Pulled By: umanwizard

fbshipit-source-id: 7a7d389786f4ccf5c86a14ecb2002c61730c51b5
2019-05-15 10:13:17 -07:00
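The translation table above amounts to a dictionary lookup on unrecognized keyword names. A rough Python sketch of the idea (not PyTorch's actual argument-parsing implementation, which happens at the C level; function names here are illustrative):

```python
# NumPy-style name -> PyTorch name, per the list above.
NUMPY_TO_TORCH = {
    "keepdims": "keepdim",
    "axis": "dim",
    "a": "input", "x": "input", "x1": "input",
    "x2": "other",
}

def translate_kwargs(kwargs, accepted):
    """Rename NumPy-style kwargs to their PyTorch equivalents.

    Names already accepted by the op pass through untouched, so the
    traditional spelling costs nothing extra.
    """
    out = {}
    for name, value in kwargs.items():
        if name not in accepted:
            name = NUMPY_TO_TORCH.get(name, name)
        out[name] = value
    return out

translated = translate_kwargs({"x": [1, 2], "axis": 1}, {"input", "dim"})
assert translated == {"input": [1, 2], "dim": 1}
```

This matches the stated overhead profile: zero extra work for PyTorch names, roughly one extra dict lookup per NumPy-style name.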
83649ef081 Replace AT_CHECK with TORCH_CHECK [shard 1/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20426

Reviewed By: jerryzh168

Differential Revision: D15318160

fbshipit-source-id: 4d1fb341ab47147d760d527b901de6ce54182753
2019-05-15 08:44:54 -07:00
2827f3ded6 Portable way of the warning clause
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20484

Differential Revision: D15353119

Pulled By: ezyang

fbshipit-source-id: a708548554728ec34c51a8032ceb2b12f16a8d5c
2019-05-15 08:14:01 -07:00
15c0091d8a Fix GetLastError in THAllocator for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20526

Differential Revision: D15352969

Pulled By: ezyang

fbshipit-source-id: 50a9ee10c1c80cfe737c96dd8af63a2b034686ae
2019-05-15 08:13:57 -07:00
73a97387c1 Replace AT_CHECK with TORCH_CHECK [shard 9/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20435

Reviewed By: jerryzh168

Differential Revision: D15318877

fbshipit-source-id: 4d83571187ea14a604fef83ac355d328b46d93e1
2019-05-15 08:05:59 -07:00
365fc26571 Replace AT_CHECK with TORCH_CHECK [shard 8/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20434

Reviewed By: jerryzh168

Differential Revision: D15318396

fbshipit-source-id: dcd0f51be2d64b9440bb95ce8f40acb12545c2f4
2019-05-15 08:05:56 -07:00
d1623f4cc9 Replace AT_CHECK with TORCH_CHECK [shard 3/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20428

Reviewed By: jerryzh168

Differential Revision: D15318209

fbshipit-source-id: e492aaa79146cfce9489bdb354cc539d7c4220a7
2019-05-15 07:40:50 -07:00
9d09f5df6c Replace AT_CHECK with TORCH_CHECK [shard 7/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20432

Reviewed By: jerryzh168

Differential Revision: D15318289

fbshipit-source-id: 6c443ac848fe28a1e3e8d7f33a12cd50f80b3e40
2019-05-15 07:40:47 -07:00
101067703e Fix strtod for MSVC (#20490)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20408. Tested locally by Jonas1312.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20490

Differential Revision: D15353137

Pulled By: ezyang

fbshipit-source-id: 0c0aefe54b11d50f703171700838af51f7666418
2019-05-15 07:40:44 -07:00
97e1f07ffc Replace AT_CHECK with TORCH_CHECK [shard 10/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20436

Reviewed By: jerryzh168

Differential Revision: D15318926

fbshipit-source-id: 71a43070cc50cc174f703ebc595f1d87c6fc1e91
2019-05-15 07:35:37 -07:00
8e26759f14 Back out "[pytorch][PR] Manually set _GLIBCXX_USE_CXX11_ABI in devtoolset7 binary builds"
Summary: Original commit changeset: 571bba8a93ea

Reviewed By: pjh5

Differential Revision: D15349783

fbshipit-source-id: 75c3e2b9b97e0ac0e8bcdef93e53b0d475c6fa38
2019-05-15 00:02:55 -07:00
fd18b89c98 shape inference for learning rate op (#20020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20020

Add shape inference for the LearningRate op. The output (lr) should have the same shape as the input (iteration), but not the same type (float vs. int).

Reviewed By: un-disclosed

Differential Revision: D15112300

fbshipit-source-id: 09969aefa15172a6f3c70cd9b2548e3020da5d7a
2019-05-14 23:34:32 -07:00
33f421027c Allow recency weight pooling for fp16 (#20506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20506

as titled

Reviewed By: alex1o1o7cloud

Differential Revision: D15342758

fbshipit-source-id: 89e7cb6d7b9511ef6c70611359736328571d7fc0
2019-05-14 20:13:38 -07:00
ea13b53856 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 63e9b4a8cf5b15a6ba20d1946aac36c1604d8079
2019-05-14 19:02:43 -07:00
254de9e8ec Removing cyclic dependency (#20511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20511

Removed the cyclic dependency between caffe2/core/net.h and workspace.h

Differential Revision: D15303412

fbshipit-source-id: 6e772e372cd0cf2af05d7815f1df8ae20bc2a65e
2019-05-14 18:55:19 -07:00
ace506fb38 Dict (#20372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20372

Implement a Dict type that allows us to abstract away from the concrete implementation used.
The API is similar to std::unordered_map, but behind the scenes we can switch to any map implementation we like: ska::flat_hash_map, Google dense map, or any future map implementation with better performance.
Switching such an implementation choice does not have to break backwards compatibility of kernel code using the Dict type.

Reviewed By: zdevito

Differential Revision: D15298234

fbshipit-source-id: b5ad368a9e9516030805cd8f5f1b02e3986933c0
2019-05-14 18:37:02 -07:00
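The decoupling described above can be sketched in Python: kernel code programs against a fixed interface while the backing container remains swappable (class and method names below are hypothetical, for illustration of the design only):

```python
class Dict:
    """Fixed-API dict; the backing store can change without breaking callers."""

    # Could be swapped for a flat-hash-map binding or any faster container;
    # callers never see the concrete type.
    _backing_factory = dict

    def __init__(self):
        self._impl = self._backing_factory()

    def insert(self, key, value):
        self._impl[key] = value

    def at(self, key):
        return self._impl[key]

    def __len__(self):
        return len(self._impl)

d = Dict()
d.insert("weight", 0.5)
assert d.at("weight") == 0.5
assert len(d) == 1
```

Because kernels only ever call the wrapper's methods, replacing `_backing_factory` does not break backwards compatibility of kernel code — the property the commit message highlights.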
56fb5e03b5 refactor registerStoragePyTypeObject (#20467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20467

for upcoming changes in Storage for QInt8

Reviewed By: ezyang

Differential Revision: D15330865

fbshipit-source-id: 2840e59c0bf088983f792fd724de41b3bb3dec55
2019-05-14 18:22:33 -07:00
ea38fbfc5c Manually set _GLIBCXX_USE_CXX11_ABI in devtoolset7 binary builds (#20243)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/17492
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20243

Differential Revision: D15348101

Pulled By: pjh5

fbshipit-source-id: 571bba8a93eaa9806db3f3d38697c26b5285da7a
2019-05-14 18:02:42 -07:00
358fb51e77 Replace AT_CHECK with TORCH_CHECK [shard 6/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20430

Reviewed By: jerryzh168

Differential Revision: D15318250

fbshipit-source-id: eaee93447d757124a0c9fb5dcde503ae6a065912
2019-05-14 16:00:59 -07:00
5b45355431 Replace AT_CHECK with TORCH_CHECK [shard 2/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20427

Reviewed By: jerryzh168

Differential Revision: D15318190

fbshipit-source-id: 15518a683d7b662ef00f255134aaf9dbd183f099
2019-05-14 16:00:56 -07:00
71af7c46bb Replace AT_CHECK with TORCH_CHECK [shard 4/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20429

Reviewed By: jerryzh168

Differential Revision: D15318222

fbshipit-source-id: daf693c34b4ee92e302eee679ed76a862715d1bb
2019-05-14 15:50:16 -07:00
9610f150d7 stop build spew on development (#20508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20508
ghimport-source-id: 26a16e2918fb93058c7740afb85070e0d29b4d1b

Differential Revision: D15343207

Pulled By: zdevito

fbshipit-source-id: b6d8858024cc440d59cf88d69e0fbc0e67dc85ce
2019-05-14 15:30:52 -07:00
24cd0e08cf identify important circleci builds (#20498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20498
ghimport-source-id: b62b5bcf73ce87b1054cad053fd1cc118a586cf6

Differential Revision: D15342506

Pulled By: suo

fbshipit-source-id: 9889103d23affe0d7eea0abfd801bae46d5238a2
2019-05-14 15:16:06 -07:00
9e7f22b223 Remove dependencies from Caffe2Go on PyTorch JIT (#20463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20463

Source file changes mostly involve ifdef'ing-out references to JIT code
from files that are part of Caffe2Go.  Update Internal build scripts to
remove those files from our globs.

After this, changes to most of the JIT files should not trigger mobile CI.

Reviewed By: dzhulgakov

Differential Revision: D15329407

fbshipit-source-id: 48f614c6b028eef0a03ce5161d083a3e078b0412
2019-05-14 14:36:08 -07:00
3479777519 UpSample GPU Porting (#19630)
Summary:
resolves #16158
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19630

Differential Revision: D15335765

Pulled By: ezyang

fbshipit-source-id: 03dd590c715a65c20ac99674a5d77179cd4a50fc
2019-05-14 11:58:21 -07:00
7ffc37e022 Add ShapeInference for AtomicIter Op (#20021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20021

Add shape inference for AtomicIter operator. The operator takes two blobs iteration and iter_mutex as input and outputs iteration, which should have the same type and shape as the input.

Reviewed By: un-disclosed

Differential Revision: D15111643

fbshipit-source-id: 0d06413305cc4c6257c0cfabf62fb874970803bc
2019-05-14 11:43:21 -07:00
6e82b1c77d Split nn.MultiHeadAttention into Module + functional (#20415)
Summary:
Moving functions from torch/nn/modules/activation.py to torch/nn/functional.py. For functions not implemented (_get_input_buffer and _set_input_buffer), a TODO is added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20415

Differential Revision: D15318078

Pulled By: jamarshon

fbshipit-source-id: 5ca698e2913821442cf8609cc61ac8190496a3c6
2019-05-14 08:41:28 -07:00
b46a630836 Update Sleef to include fix for FMA4 detection (#20450)
Summary:
FMA4 support is in bit 16 of register ECX, not EDX of the "extended
processor info" (0x80000001).

Once we verify that this change fixes https://github.com/pytorch/pytorch/issues/12112, I'll make a PR for upstream Sleef.

The mapping of registers to reg is:

```
  reg[0] = eax
  reg[1] = ebx
  reg[2] = ecx <---
  reg[3] = edx
```

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely supported. Intel CPUs do not set this bit. This causes "Illegal instruction" errors on AMD CPUs that do not support FMA4.

See https://github.com/pytorch/pytorch/issues/12112
See https://github.com/shibatch/sleef/issues/261

http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20450

Differential Revision: D15324405

Pulled By: colesbury

fbshipit-source-id: 96fb344c646998ff5da19e4cdbf493f5a4e9892a
2019-05-14 08:33:18 -07:00
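The fix boils down to reading bit 16 of ECX (`reg[2]`) rather than EDX (`reg[3]`) from CPUID leaf 0x80000001. In Python terms, with made-up register values for illustration:

```python
def has_fma4(reg):
    """reg = [eax, ebx, ecx, edx] returned by CPUID leaf 0x80000001."""
    ECX, FMA4_BIT = 2, 16
    return bool((reg[ECX] >> FMA4_BIT) & 1)

def buggy_has_fma4(reg):
    """The old Sleef check: bit 16 of EDX, which is PAT on AMD CPUs."""
    EDX, BIT = 3, 16
    return bool((reg[EDX] >> BIT) & 1)

# An AMD CPU without FMA4: PAT set in EDX, FMA4 clear in ECX.
regs = [0, 0, 0x0, 1 << 16]
assert buggy_has_fma4(regs)   # false positive -> "Illegal instruction"
assert not has_fma4(regs)     # correct detection
```

Since PAT is set on virtually all AMD CPUs, the buggy check enabled FMA4 codepaths on hardware that lacks the instructions, exactly as reported in issue #12112.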
101176870e eliminate FE_INVALID exceptions related to fp16 conversion (#20390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20390

duc0 Ngo implemented observation of floating-point exceptions, but there were a couple of places where we had "benign" floating-point exceptions leading to false positives. This diff eliminates one source of such false positives, namely the use of _mm256_cvtph_ps and _mm256_cvtps_ph on a partially uninitialized array in the remainder loop.

Reviewed By: hx89

Differential Revision: D15307358

fbshipit-source-id: 38f57dfdd90c70bc693292d2f9c33c7ba558e2c9
2019-05-13 23:42:01 -07:00
8e9692df27 codemod change missing [from D13586737]
Summary: as title

Reviewed By: jerryzh168

Differential Revision: D15327669

fbshipit-source-id: e262dacb097e91475b1925ec40b375ec6722ad5a
2019-05-13 20:44:04 -07:00
e8fb5f35f0 Bump torch proto version (#20444)
Summary:
Tagging along to changes in #20191 which added more support for types in the pickler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20444

Pulled By: driazati

Differential Revision: D15321463

fbshipit-source-id: 985061bf5070a7d7bad58ea8db11d531f3d13e74
2019-05-13 18:32:16 -07:00
a9aaf698a4 add c2 benchmark runs in cpp (#20108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108

Add cpp runs for c2, hooked up via pybinds. Print output to terminal. This is not hooked up with the pep output yet because I'd like to verify the numbers first.

Note that this isn't quite the same mechanism as the pytorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.

Reviewed By: zheng-xq

Differential Revision: D15155976

fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
2019-05-13 17:01:08 -07:00
d2da3ee601 temporarily disable layernorm AD (#20442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20442
ghimport-source-id: c246ade4ee9ee31b2e3413efff3ea6a246e1837e

Differential Revision: D15321524

Pulled By: wanchaol

fbshipit-source-id: 22c77d08c91af2d83dfd2c4a84cafc56e9240033
2019-05-13 16:35:50 -07:00
f0829f37c8 Rename AT_ASSERT to TORCH_INTERNAL_ASSERT; other macro updates (#20321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20321

First part of https://github.com/pytorch/pytorch/issues/20287

- Rename `AT_ASSERT` to `TORCH_INTERNAL_ASSERT`
- Make `TORCH_INTERNAL_ASSERT` work with variadic inputs
- Deprecated `AT_ASSERT` and `AT_ASSERTM`
- Rename `AT_CHECK` to `TORCH_CHECK`
- Make `TORCH_CHECK` give a better error message when no arguments are
  provided
- Deprecate `AT_ERROR` in favor of `TORCH_CHECK(false, ...)`
- Deprecate `AT_INDEX_ERROR` in favor of `TORCH_CHECK_INDEX(false, ...)`
- Rename `AT_WARN` to `TORCH_WARN`

No use sites are changed; I'll work on that in follow up patches
(or disable the deprecation, if necessary.)

Differential Revision: D15278439

fbshipit-source-id: 7e0ed489d4e89e5f56b8ad7eafa72cb9a06065ee
2019-05-13 16:16:42 -07:00
1364104054 Fix version counter sharing in set_data() (#20391)
Summary:
In https://github.com/pytorch/pytorch/pull/18223/files#diff-77a6f3462f2233b921d3042412fed6d3R178, we used `auto saved_version_ = data_.unsafeGetTensorImpl()->version_counter().current_version()` and then `new_data_impl_copy->set_version_counter(saved_version_)`, which actually doesn't preserve the original semantics that `var.set_data(tensor)` should keep `var`'s version counter object intact. This PR fixes the bug and adds test to make sure it doesn't happen again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20391

Differential Revision: D15323430

Pulled By: yf225

fbshipit-source-id: e3ba49b51ec8ccecd51c80cb182387f74cfd2b2b
2019-05-13 16:03:42 -07:00
3a0b27b73d Move at::NonVariableTypeMode to TensorImpl, and check it in is_variable() (#20392)
Summary:
As part of the Variable/Tensor merge, we allow passing Tensor with AutogradMeta into ATen ops, but we want to make sure they are not treated as Variables (i.e. their `is_variable()` is false). This PR makes the necessary change to make this work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20392

Differential Revision: D15321899

Pulled By: yf225

fbshipit-source-id: c2ab09db73c63bd71ba2d8391095f4d6b4240a9a
2019-05-13 15:49:23 -07:00
2dc9152dbe Automatic update of fbcode/onnx to e08efaa35ed54362dfa283240506c003175889b7 (#20443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20443

Previous import was 5bde6371620b76302864bce90f521d72eda95d0e

Included changes:
- **[e08efaa3](https://github.com/onnx/onnx/commit/e08efaa3)**: Fix shape inference logic for TopK operator (#2005) <Hariharan Seshadri>
- **[d80ea947](https://github.com/onnx/onnx/commit/d80ea947)**: Nullary variadic (#1889) <G. Ramalingam>
- **[50dc186b](https://github.com/onnx/onnx/commit/50dc186b)**: Removed setting MD/MDd flags manually through cmake. The MTd/MT part is still necessary. Looks like CI fails without it. (#1995) <Alexander Yermolovich>
- **[e7f81c5e](https://github.com/onnx/onnx/commit/e7f81c5e)**: Move NonMaxSupression to object_detection folder (#2001) <Hector Li>
- **[86ab4517](https://github.com/onnx/onnx/commit/86ab4517)**: Prevent using invalid iterator, fix arithmetics. (#2004) <Dmitri Smirnov>

Reviewed By: zrphercule

Differential Revision: D15302141

fbshipit-source-id: 146c346c188934e5125371b261ecfde93b4aa166
2019-05-13 14:47:11 -07:00
824d4f9957 Needed fixes for binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20385

Differential Revision: D15321396

Pulled By: pjh5

fbshipit-source-id: de7ca1ac928bdea3bcf6c78e84c7e9b786bcff52
2019-05-13 11:58:50 -07:00
6c3b8a24ff Make sure reducer=None is not used when fp16 embedding is enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20349

Reviewed By: hyuen

Differential Revision: D15291545

fbshipit-source-id: fa5fd0b97aeca6e5f45866908f3f205b701c931b
2019-05-13 11:53:14 -07:00
63c05bffcb Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20440

Pulled By: driazati

Differential Revision: D15320614

fbshipit-source-id: dc650c478e39d0c3e6b660c2d9ef93b3479df1ac
2019-05-13 11:37:27 -07:00
7799ea5eb3 Port adaptive_avg_pool3d to ATen (#19898)
Summary:
Resolves #18065.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19898

Differential Revision: D15240607

Pulled By: ezyang

fbshipit-source-id: 00cf23ed20c1757d5eef71fd8c6a2f53d372e341
2019-05-13 11:29:22 -07:00
5268b7dfaf Remove support for CUDA 8 (#20298)
Summary:
PyTorch 1.1.0 dropped support for CUDA 8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20298

Differential Revision: D15294639

Pulled By: ezyang

fbshipit-source-id: b9411bfe456f93f1529b745dc83b7d6310df684d
2019-05-13 11:24:22 -07:00
62957ab0a1 Tiny spelling mistake fix. (#20425)
Summary:
"then the output would also has k tensors" -> "then the output would also have k tensors"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20425

Differential Revision: D15320152

Pulled By: zou3519

fbshipit-source-id: b04e2ccd29c6a3e33ad1040d0ea975a01a7bd9b5
2019-05-13 11:18:53 -07:00
67414714e5 Move THCTensor_(uniform) to ATen (#20292)
Summary:
As a first step for this plan: https://github.com/pytorch/pytorch/issues/19508#issuecomment-485178192, this PR moves `THCTensor_(uniform)` to ATen. Major changes are:
- `uniform_` cuda kernel now utilizes a philox generator.
- the kernel also utilizes TensorIterator
- the kernel uses a grid-stride loop to achieve peak effective bandwidth

- Since the engine has changed from `curandStateMTGP32` to `curandStatePhilox4_32_10`, the random numbers generated will now be different.
- Here is the diff showing codegen changes: https://gist.github.com/syed-ahmed/4af9ae0d42b6c7dbaa13b9dd0d1dd1e8 (BC breaking change if any)

- Philox4_32_10 is known to pass the standard TestU01 Big Crush test (https://www.thesalmons.org/john/random123/papers/random123sc11.pdf) and hence the quality of random numbers generated isn't an issue when compared to the previously used `curandStateMTGP32`.
- I have added a test case in `aten/src/ATen/test/cuda_distributions_test.cu` which verifies that philox offset is incremented properly

The benchmark was done on a DGX station with 4 V100s.
I modified the script from jcjohnson's [multinomial benchmark](https://github.com/jcjohnson/pytorch-multinomial-benchmark) to produce this notebook, which shows that there is a general speedup with this PR and that no regression has been introduced: https://gist.github.com/syed-ahmed/9d26d4e96308aed274d0f2c7be5218ef

To reproduce the notebook:
- Run https://gist.github.com/syed-ahmed/4208c22c541f1d30ad6a9b1efc1d728f in a container with the current pytorch top of tree with the command: `python uniform_benchmark.py --stats_json before.json`
- Apply this diff to the current pytorch top of tree and run the same script in a container with the command: `python uniform_benchmark.py --stats_json after.json`
- Run the notebook attached above with the `after.json` and `before.json` in the same directory

The effective bandwidth was calculated using the script (thanks to ngimel): https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
Following are the numbers before and after.
```
uniform, size, elements 65536 forward 5.168914794921875e-06 bandwidth (GB/s) 50.71548098597786
uniform, size, elements 131072 forward 5.056858062744141e-06 bandwidth (GB/s) 103.67860705101367
uniform, size, elements 262144 forward 7.164478302001953e-06 bandwidth (GB/s) 146.357621001797
uniform, size, elements 524288 forward 1.1217594146728515e-05 bandwidth (GB/s) 186.9520302275877
uniform, size, elements 1048576 forward 1.923084259033203e-05 bandwidth (GB/s) 218.10297600317384
uniform, size, elements 2097152 forward 3.640890121459961e-05 bandwidth (GB/s) 230.39992200138826
uniform, size, elements 4194304 forward 6.778717041015625e-05 bandwidth (GB/s) 247.49839679819922
uniform, size, elements 8388608 forward 0.00012810707092285157 bandwidth (GB/s) 261.92490202361347
uniform, size, elements 16777216 forward 0.00025241613388061524 bandwidth (GB/s) 265.86598474620627
uniform, size, elements 33554432 forward 0.000497891902923584 bandwidth (GB/s) 269.5720239913193
```
```
uniform, size, elements 65536 forward 5.550384521484375e-06 bandwidth (GB/s) 47.22988091821306
uniform, size, elements 131072 forward 5.581378936767578e-06 bandwidth (GB/s) 93.93520954942333
uniform, size, elements 262144 forward 6.165504455566406e-06 bandwidth (GB/s) 170.071404141686
uniform, size, elements 524288 forward 6.3276290893554685e-06 bandwidth (GB/s) 331.4277702414469
uniform, size, elements 1048576 forward 8.509159088134765e-06 bandwidth (GB/s) 492.91639239047356
uniform, size, elements 2097152 forward 1.2989044189453124e-05 bandwidth (GB/s) 645.8218077979443
uniform, size, elements 4194304 forward 2.347707748413086e-05 bandwidth (GB/s) 714.6211452997259
uniform, size, elements 8388608 forward 4.4286251068115234e-05 bandwidth (GB/s) 757.6715389250498
uniform, size, elements 16777216 forward 8.672237396240235e-05 bandwidth (GB/s) 773.8356427961071
uniform, size, elements 33554432 forward 0.00016920566558837892 bandwidth (GB/s) 793.2224227438523
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20292

Differential Revision: D15277761

Pulled By: ezyang

fbshipit-source-id: 8bfe31a01eeed77f0ed6e7ec4d2dda4c6472ecaa
2019-05-13 09:38:28 -07:00
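The effective-bandwidth figures above follow from bytes moved divided by kernel time. A sketch of the arithmetic, assuming 4-byte floats and a write-only kernel (the function name is made up for the example; the linked gist is the actual measurement script):

```python
def effective_bandwidth_gbs(num_elements, seconds, bytes_per_element=4):
    """GB/s for a kernel that writes num_elements values exactly once."""
    return num_elements * bytes_per_element / seconds / 1e9

# Last row of the "after" table: 33554432 elements in ~169 microseconds.
bw = effective_bandwidth_gbs(33554432, 0.00016920566558837892)
assert abs(bw - 793.2224) < 0.01
```

Plugging in each (elements, forward-time) pair reproduces the bandwidth column of both tables, which makes it easy to sanity-check the reported speedup.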
5f7ef09f57 math module support: gcd, copysign, erf, erfc, expm1, fabs, gamma, lgamma (#19707)
Summary:
eellison driazati Refer to issue #19026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19707

Differential Revision: D15302632

Pulled By: eellison

fbshipit-source-id: 68ff13b478b93cc33703ef3276b5fa727c8ff31a
2019-05-13 08:55:23 -07:00
41673d477c Disable incremental_state function in MultiheadAttention module. (#20177)
Summary:
Fully supporting the incremental_state function requires several additional utils available in fairseq, and we lack a proper unit test for it. Therefore, the incremental_state function will be disabled for now. If it is needed in the future, a feature request can be created. Fixes #20132

Add some unit tests to cover the arguments of MultiheadAttention module, including bias, add_bias_kv, add_zero_attn, key_padding_mask, need_weights, attn_mask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20177

Differential Revision: D15304575

Pulled By: cpuhrsch

fbshipit-source-id: ebd8cc0f11a4da0c0998bf0c7e4e341585e5685a
2019-05-13 08:21:15 -07:00
f8aa6a8f44 Make a deep copy of extra_compile_flag dictionary (#20221)
Summary:
See issue #20169
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20221

Differential Revision: D15317126

Pulled By: ezyang

fbshipit-source-id: 0a12932db4f6ba15ea1d558fa329ce23fe2baef6
2019-05-13 08:11:39 -07:00
30bdb8c0d7 Hotfix for caffe2 windows build (#20417)
Summary:
We don't need to overlay the VC env when not using Ninja; CMake will deal with it automatically. Overlaying is a no-op when the env is the same as the specified generator, but it generates the error "Cannot find CMAKE_CXX_COMPILER" when they are different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20417

Differential Revision: D15317081

Pulled By: ezyang

fbshipit-source-id: 5d9100321ecd593e810c31158f22c67d3e34973b
2019-05-13 08:03:45 -07:00
f496ea36b2 DataLoader: add error detection for worker_init_fn (#20150)
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150

Differential Revision: D15314891

Pulled By: ezyang

fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
2019-05-12 18:28:56 -07:00
163f0e182c Fix bug in non_blocking copy (#20305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20305
ghimport-source-id: eb3dacb10fd93bbb5a6bbe078ed1ec842163d0e6

Differential Revision: D15276094

Pulled By: li-roy

fbshipit-source-id: 4728f419aa050e6c94a4f62231fa1a86caa556a7
2019-05-11 15:20:19 -07:00
6a8f55796a Add quant-dequant nodes for weights
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20041

Differential Revision: D15178086

fbshipit-source-id: 8cb060d72b68e44bf042338924f203ae62d74f6a
2019-05-11 14:03:10 -07:00
9499c7b7ee Profiling GraphExecutor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19994

Differential Revision: D15307752

Pulled By: Krovatkin

fbshipit-source-id: 7b35191042199ef16823487e15fe639968cbdc89
2019-05-10 23:05:47 -07:00
f4d9bfaa4d Support Exports to Multiple ONNX Opset (#19294)
Summary:
Support exporting multiple ONNX opsets (more specifically opset 10 for now), following the proposal in https://gist.github.com/spandantiwari/99700e60919c43bd167838038d20f353.
And add support for custom ops (merge with https://github.com/pytorch/pytorch/pull/18297).

This PR will be followed by another PR containing the changes related to testing the ops for different opsets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19294

Reviewed By: zrphercule

Differential Revision: D15043951

Pulled By: houseroad

fbshipit-source-id: d336fc35b8827145639137bc348ae07e3c14bb1c
2019-05-10 18:37:12 -07:00
1129b3344a move DistillBatchLRLoss Layer from open source to fb
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20291

Reviewed By: chocjy

Differential Revision: D15272181

fbshipit-source-id: 2e0964fa1b1031607134548bb87c4e103c5b1383
2019-05-10 17:46:04 -07:00
3f3ee5600a make trace's errors more helpful about what it can and can't do when tracing a module's methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20368

Differential Revision: D15305340

Pulled By: Krovatkin

fbshipit-source-id: bafb13002df5c9741160e96205e0846243dde3ec
2019-05-10 17:40:21 -07:00
7d3d5b73f4 Add multiline type annotation support for Python frontend (#14922)
Summary:
This allows multiline type comments in accordance with [PEP 484](https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code)

```python
@torch.jit.script
def foo(x,  # type: Tensor
        y,  # type: Tuple[Tensor, Tensor]
        ):
    # type: (...) -> Tuple[Tensor, Tensor]
    return x, x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14922

Pulled By: driazati

Differential Revision: D15268432

fbshipit-source-id: e9add8d8025e42390a14a643835d15cc67a2f33e
2019-05-10 17:27:41 -07:00
3a39ce0f41 Fix reflection on weak modules, copy attributes (#20190)
Summary:
* Constructs a new type at runtime so that `isinstance` checks work for
weak modules assigned to `ScriptModule`s
* Fix some extraneous names in `__constants__`
* Add `in_features` and `out_features` to `nn.Linear` `__constants__`

Fixes #19363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20190

Pulled By: driazati

Differential Revision: D15302350

fbshipit-source-id: 1d4d21ed44ab9578a4bc2a72396a82e9bbcd387c
2019-05-10 17:14:49 -07:00
00d0ddb140 Add all list specializations to pickler (#20191)
Summary:
TensorList, DoubleList, and BoolList were missing from the pickler, so
this adds them.

As a follow-up, a lot of the code for these could be templated and cut down.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20191

Pulled By: driazati

Differential Revision: D15299106

fbshipit-source-id: f10c0c9af9d60a6b7fb8d93cea9f550b1a7e2415
2019-05-10 17:14:42 -07:00
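The specialization idea above can be sketched in plain Python (the tag table and helper names here are hypothetical; the real pickler emits a distinct record per specialized list type from C++):

```python
import pickle

# Hypothetical tag table: one tag per specialized list type
# (TensorList, DoubleList, BoolList, ...).
LIST_TAGS = {float: "DoubleList", bool: "BoolList", int: "IntList"}

def pickle_specialized(values):
    """Pick the specialization from the element type, then store
    (tag, payload) so the reader knows which typed list to rebuild."""
    elem_type = type(values[0]) if values else object
    tag = LIST_TAGS.get(elem_type, "GenericList")
    return pickle.dumps((tag, values))

tag, values = pickle.loads(pickle_specialized([1.5, 2.5]))
print(tag, values)  # DoubleList [1.5, 2.5]
```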
6197eed409 Eliminate a const_cast.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20371

Differential Revision: D15298327

fbshipit-source-id: 566ec842e971ba6f80e333bb874cc5fd2c36b02e
2019-05-10 15:43:12 -07:00
a4ae689636 quantize_rnn_modules in ensemble_export (#20365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20365

Enable quantization of `torch.nn.LSTM` module in the decoder of PyTorch natively exported beam search models.

Reviewed By: jmp84

Differential Revision: D15260631

fbshipit-source-id: bbdd3a30c2c2110986eb7aa7ff11ce1c9090ddf4
2019-05-10 15:33:41 -07:00
a0c2829194 Preserve log_dir arg and member for SummaryWriter (#20382)
Summary:
Given that tensorboardX and our PyTorch 1.1 release had `log_dir` as the argument for SummaryWriter initialization and as a member variable (which some users access), we need to preserve this name. However, we might deprecate it in the future, so I've added a `get_logdir` method that can be used going forward.

cc natalialunova, lanpa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20382

Reviewed By: NarineK

Differential Revision: D15300941

Pulled By: orionr

fbshipit-source-id: a29a70fcbc614a32ebfa6c655962fdff081af1af
2019-05-10 14:59:47 -07:00
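The compatibility decision can be sketched as follows (illustrative class, not the actual torch.utils.tensorboard implementation):

```python
class SummaryWriterSketch:
    """Keep `log_dir` as both the constructor argument and the public
    member (matching tensorboardX and PyTorch 1.1), and expose a
    get_logdir() accessor for future-proofing."""

    def __init__(self, log_dir=None):
        self.log_dir = log_dir or "runs/default"

    def get_logdir(self):
        return self.log_dir

w = SummaryWriterSketch(log_dir="runs/exp1")
print(w.log_dir, w.get_logdir())  # runs/exp1 runs/exp1
```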
3aa414c8f2 Add documentation to Dispatch.h (#20339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20339
ghimport-source-id: 99b6cf9f03be8d55674a43c8a7e19a42b75143f1

Differential Revision: D15295652

Pulled By: ezyang

fbshipit-source-id: f3b127ed3f53f13931596c06e181056940443290
2019-05-10 14:31:03 -07:00
10a9ef833c Avoid unnecessary refcount bump in unary operators. (#20331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20331
ghimport-source-id: 26588acca171555182714c6b6f89610564e228a1

Differential Revision: D15295193

Pulled By: ezyang

fbshipit-source-id: 44b7a8b4c9d41003a6559be2af0bd4f0ada54b31
2019-05-10 14:23:22 -07:00
75c6d37bac profile on uses
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20049

Differential Revision: D15242492

Pulled By: Krovatkin

fbshipit-source-id: f6eab3ec9d7a2f59f19905d757cf7c6f9ad2fdf6
2019-05-10 14:18:03 -07:00
c6255a57e4 Remove CPU_tensor_parallel_kernel_apply2 (#20207)
Summary:
This code is unused and has been superseded by TensorIterators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20207

Differential Revision: D15240832

Pulled By: cpuhrsch

fbshipit-source-id: 4f600bb8645f9b28a137e2cefb099978f5152d05
2019-05-10 14:04:32 -07:00
6dc70aa513 add test coverage for make_np (#20317)
Summary:
addresses https://github.com/pytorch/pytorch/pull/16196#discussion_r276381946

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20317

Differential Revision: D15289400

Pulled By: orionr

fbshipit-source-id: 914416a8c1369d95656f556c6e05348957789466
2019-05-10 13:59:48 -07:00
ce033485eb Convenience APIs for script objects (#20226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20226
ghimport-source-id: 22937d72e35ec4eba38019284a368453089fe3eb

Differential Revision: D15243625

Pulled By: suo

fbshipit-source-id: 5e9fb773da244f9ef201dba524155c3b19b2b4e0
2019-05-10 13:03:58 -07:00
50149fb66b Adds quantized addition and renames sum to add (#20233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20233

Adding a quantized addition (without relu)

Reviewed By: jianyuh

Differential Revision: D15245791

fbshipit-source-id: 34ede5d805d9ab0d31e8ae87cefb110504bd3c87
2019-05-10 12:53:33 -07:00
35fed93b1e Adding Poisson NLL loss to libtorch (#19316)
Summary:
This PR add Poisson NLL loss to aten and substitute the python implementation with a call to the c++.

Fixes #19186.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19316

Differential Revision: D15012957

Pulled By: ezyang

fbshipit-source-id: 0a3f56e8307969c2f9cc321b5357a496c3d1784e
2019-05-10 11:57:49 -07:00
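The math being moved to C++ can be sketched in plain Python (assuming log_input=True, mean reduction, and no Stirling approximation term; this shows the formula only, not the aten code added by this PR):

```python
import math

def poisson_nll(log_rates, targets):
    """Elementwise Poisson NLL with log-space input:
    loss_i = exp(input_i) - target_i * input_i, averaged."""
    losses = [math.exp(x) - t * x for x, t in zip(log_rates, targets)]
    return sum(losses) / len(losses)  # 'mean' reduction

loss = poisson_nll([0.0, 0.0], [1.0, 2.0])
print(loss)  # exp(0) - t*0 = 1 per element -> mean is 1.0
```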
ed25b8a667 Add a FAQ entry to explain Cannot insert a Tensor that requires grad as a constant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20181

Differential Revision: D15296816

Pulled By: Krovatkin

fbshipit-source-id: 382e3a0aa982774771e98050cbc65d144cbd959e
2019-05-10 10:55:33 -07:00
ea5c9c9267 Update installing.rst (#20354)
Summary:
Delete useless `cd`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20354

Differential Revision: D15296154

Pulled By: soumith

fbshipit-source-id: 2042b56c91b33e302b0ed9c77f29b9b64079fa98
2019-05-10 10:04:06 -07:00
4ba28deb6e Unify libtorch and libcaffe2 (#17783)
Summary:
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch".  This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so".  This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller.

The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`.

The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783

Differential Revision: D15284178

Pulled By: kostmo

fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05
2019-05-10 09:50:53 -07:00
872bab22c6 Some essential changes needed before updating the Windows AMI (#20353)
Summary:
1. Add cuda 10.1 build
2. Turn on openmp loop support for VS 2019
3. Remove legacy code about selective builds

Tested through CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20353

Differential Revision: D15294806

Pulled By: ezyang

fbshipit-source-id: 0acf5c3fbbc398fd9ebdf9f97653499d39638432
2019-05-10 09:08:51 -07:00
d68802ba47 Sparse half embeddings on cuda (#19695)
Summary:
```
import torch
a = torch.nn.Embedding(3, 4, sparse=True).half().cuda()
a(torch.LongTensor([1, 0]).cuda()).sum().backward()

```
gave: `RuntimeError: torch.cuda.sparse.HalfTensor is not enabled`

This PR enables sparse.HalfTensor on cuda. Still won't work for CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19695

Differential Revision: D15281162

Pulled By: nairbv

fbshipit-source-id: 0d83d946a059393bd53d8b8102e2daa9b4c02588
2019-05-10 08:00:55 -07:00
148e90ba2a Give clear error message when attempting to merge struct which can't be merged.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19804

Differential Revision: D15098833

fbshipit-source-id: 2950e247c74e125e033cd9cfbf5631eee5298ea0
2019-05-10 07:01:01 -07:00
c2c0a32155 Remove setting logger level in caffe2.python.checkpoint (#19803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19803

There is no reason to set a specific logging level for this module. Removing it to just use the default logging level.

Differential Revision: D15098834

fbshipit-source-id: 1654c04500c19690ddde03343f2e84b04bb0f1ef
2019-05-10 07:00:58 -07:00
2a875fc126 Fix THD->c10 dependency to gflags.h (#20319)
Summary:
Fixed #20250

Not sure if there's any specific design reason to use `add_dependencies()` and manually add a few include dirs, instead of linking the target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20319

Differential Revision: D15294584

Pulled By: ezyang

fbshipit-source-id: 97f813a6b1829dad49958e0f880b33eb95747607
2019-05-10 06:50:58 -07:00
8726b27333 Fix overlay_vc_env when called by legacy python (#20304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20155.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20304

Differential Revision: D15292369

Pulled By: zdevito

fbshipit-source-id: 7da2e0cb85c98d0fcd4461d39e2a8c57391db60e
2019-05-10 06:44:58 -07:00
c397134d6b Revert D15156384: Dict
Differential Revision:
D15156384

Original commit changeset: b9313ec4dd9a

fbshipit-source-id: 3b44f49ec4eaba692cfb2cfe46e5f98102e337d9
2019-05-10 06:11:25 -07:00
85d56852d3 Revert D15227620: Allow Dict type in c10 operators
Differential Revision:
D15227620

Original commit changeset: c1ea6c12165e

fbshipit-source-id: b2cecfffdd38b7c97e20d0ee81915ea10daf8460
2019-05-10 06:11:22 -07:00
c744468e36 Revert D15227621: Extend testAvailableArgTypes
Differential Revision:
D15227621

Original commit changeset: 83db7536e906

fbshipit-source-id: 8ecc443bd18787de52572fde06df408e1c52c50d
2019-05-10 06:11:18 -07:00
99874f87cb Use registry for BoundShapeInferencer (#20341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20341

Att

Reviewed By: ipiszy

Differential Revision: D15288131

fbshipit-source-id: 956ced99cc5c5b8199f81f7baa844fe8a0505456
2019-05-10 01:16:02 -07:00
a0e5240afc Fix DistributedDataParallelTest.test_accumulate_gradients (#20351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20351

This was broken because of a merge race between #20282 and the stack in #20236.

Cleaned up the test and comments a bit as well.

Differential Revision: D15292786

fbshipit-source-id: a4379ea700cad959d3a6921fc5ddf9384fb8f228
2019-05-09 23:27:18 -07:00
02df1ccd9c Remove const_cast's from subgraph matcher. (#20303)
Summary:
The trick here is that creating a mapping from const values to
const values means that downstream clients that want to mutate
the output of the mapping are stuck.  However, a mapping from
const values to non-const values is just fine and doesn't put
constraints on downstream clients.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20303

Differential Revision: D15284076

fbshipit-source-id: 16206fd910dd5f83218525ca301b1889df0586cb
2019-05-09 18:07:14 -07:00
e47b210075 Adding setup job as prereq to html update jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20325

Differential Revision: D15287710

Pulled By: pjh5

fbshipit-source-id: 2bbed3a46c4affb5ae4e6dd4feb1dda59aeb5d04
2019-05-09 16:25:22 -07:00
3afd99680c Remove SourceLocation (respin) (#20333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20333
ghimport-source-id: e64075bb82067224463e9955d10bd13967d1975d

Differential Revision: D15284081

Pulled By: zdevito

fbshipit-source-id: ac26ae48392b9daff08f460529c06af8f4e4722a
2019-05-09 16:17:33 -07:00
558c6c4d8a Make DistributedDataParallel usable with CPU models (#20236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20236

Use the new version of broadcast_coalesced that deals with both CPU
and CUDA models. Add tests that evaluate correctness of
DistributedDataParallel for CPU models.

Closes #17757.

Reviewed By: mrshenli

Differential Revision: D15245428

fbshipit-source-id: d2fa09f68593b3cd1b72efeb13f5af23ebd5c80a
2019-05-09 14:11:17 -07:00
f32c9bd5e9 Refactor core DistributedDataParallel tests (#20235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20235

The tests expected to only run for CUDA models. In a future commit we
need to update this to work for CPU models as well. Therefore, we can
no longer rely on only integers being passed for device identifiers.
With this change we pass both the materialized list of devices to use
(as `torch.Device` objects), as well as an optional list of integers.
The latter is specified to exercise the code in the
DistributedDataParallel constructor that turns a list of integers into
CUDA devices, IFF it is used to wrap a single-device CUDA module.

This commit also groups together the 'str' and non-'str' tests. These
used to test passing the list of devices as integers or as
`torch.Device` instances. These are now executed from the same test.

Reviewed By: mrshenli

Differential Revision: D15245429

fbshipit-source-id: 5797ba9db33d2c26db8e7493c91bb52f694285ac
2019-05-09 14:11:14 -07:00
caa0d0c50a Add c10d::broadcast_coalesced and tests (#20234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20234

The differences with the existing function _dist_broadcast_coalesced
is that this one works for both CPU and CUDA tensors and that it has a
maximum number of in flight operations.

This should be the final change needed to have only a single version
of DistributedDataParallel that both supports CPU and CUDA models, or
even a mix of both.

See #17757 for more information.

Reviewed By: mrshenli

Differential Revision: D15228099

fbshipit-source-id: a2113ba6b09b68cb5328f49f4c1960031eb43c93
2019-05-09 14:11:08 -07:00
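The "maximum number of in-flight operations" behavior can be sketched in plain Python (`launch`/`wait` are placeholders for the real async broadcast handles, not the c10d API):

```python
from collections import deque

def broadcast_all(buckets, launch, wait, max_in_flight=2):
    """Launch one async broadcast per bucket, but block on the oldest
    pending one whenever the queue reaches the cap."""
    pending = deque()
    for bucket in buckets:
        if len(pending) == max_in_flight:
            wait(pending.popleft())  # block until the oldest finishes
        pending.append(launch(bucket))
    while pending:                   # drain the remaining work
        wait(pending.popleft())

log = []
broadcast_all([1, 2, 3, 4],
              launch=lambda b: log.append(("launch", b)) or b,
              wait=lambda h: log.append(("wait", h)))
print(log)  # never more than 2 launches outstanding at a time
```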
c31fccd678 Fix crash issue in conv+sum fusion for MKLDNN on caffe2 (#20139)
Summary:
isConvFusion(...) is only valid for Conv ops; calling it on a non-Conv op causes a crash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20139

Differential Revision: D15280604

Pulled By: yinghai

fbshipit-source-id: eb45be11990b3bf7c5b45f02ebb6018444ab5357
2019-05-09 13:53:46 -07:00
8c3a7bb57f Move librosa and psutil installation from CI script to docker images build script (#20299)
Summary:
pip install librosa randomly coredumps, causing CI flakiness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20299

Differential Revision: D15276270

Pulled By: bddppq

fbshipit-source-id: 9105106f41aaacf620751290b016359ef7d665b3
2019-05-09 13:48:29 -07:00
e870b11ae6 Revert D15275731: Remote SourceLocation
Differential Revision:
D15275731

Original commit changeset: f4da178c3137

fbshipit-source-id: 830b79735eb2dadc4795b5aae407826bf20ef121
2019-05-09 13:07:11 -07:00
eca91de5d2 Remote SourceLocation (#20300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20300
ghimport-source-id: 06f606c4db3b70b1d2ed9f6ed4542c3f703c4e17

Differential Revision: D15275731

Pulled By: zdevito

fbshipit-source-id: f4da178c31372c2264feb9f99476b9c9aa66c1f2
2019-05-09 11:48:29 -07:00
726661b152 profiler: improve repr for averaged events (#20281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20281

This is how it looks like now:

```
<FunctionEventAvg key=mm self_cpu_time=11.404s cpu_time=2.895ms
cuda_time=0.000us input_shapes=[[26, 4096], [4096, 1024]]>
```

Before I forgot to update the repr for these when updated it for not
averaged events

Differential Revision: D15262862

fbshipit-source-id: a9e5b32c347b31118f98b4b5bf2bf46c1cc6d0d2
2019-05-09 11:34:52 -07:00
39af9563e2 Re-enable CUDA tests for C++ API (#20238)
Summary:
CUDA tests for C++ API were not running on the CI due to a missing character in https://github.com/pytorch/pytorch/pull/11554. This PR fixes the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20238

Differential Revision: D15279945

Pulled By: yf225

fbshipit-source-id: 1fa7439cd40b6f2fba6792eb790bacf0d67d0054
2019-05-09 11:06:51 -07:00
fe714862dd Extend testAvailableArgTypes (#20185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20185

This test case now also tests that the argument type works correctly in kernels that
 - don't return outputs
 - return multiple outputs

Reviewed By: li-roy

Differential Revision: D15227621

fbshipit-source-id: 83db7536e9065e0f8c5032d6b96e970bbaf718b3
2019-05-09 10:54:16 -07:00
e0166f4670 Allow Dict type in c10 operators (#20184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20184

- Add support for Dict<Key, Value> arguments and returns to c10 operators
- Add support for std::unordered_map<Key, Value> to the legacy API (but not to c10 kernels)

Reviewed By: li-roy

Differential Revision: D15227620

fbshipit-source-id: c1ea6c12165e07b74272cb48c6021bdb5c2d7922
2019-05-09 10:54:12 -07:00
c92129033a Dict (#19976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19976

Implement a Dict type that allows us to abstract away from the concrete implementation used.
The API is similar to std::unordered_map, but behind the scenes we can switch to any map implementation we like. ska::flat_hash_map, google dense map, or any future map implementation with better performance.
Switching such an implementation choice does not have to break backwards compatibility of kernel code using the Dict type.

Reviewed By: li-roy

Differential Revision: D15156384

fbshipit-source-id: b9313ec4dd9acb3b6a0035345b6ba4f2a437d1e5
2019-05-09 10:54:07 -07:00
2019f6cd51 Add unit test to ensure no gradients sync when calling ddp.module(input) (#20282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20282

Add a unit test to ensure no gradient sync happens when calling ddp.module(input), i.e., without invoking prepare_for_backward.

PyText depends on DDP for data-parallel distributed training. To accumulate gradients locally before syncing, it calls orig_model.forward instead of ddp_model.forward. Add a unit test so that future changes don't break this assumption.

Reviewed By: pietern, mrshenli

Differential Revision: D15263155

fbshipit-source-id: 7734e174f507690fb23ea6c52dffff4a93f9b151
2019-05-09 10:15:19 -07:00
899bddeeb6 fix typo in adaptive methods annotation (#20306)
Summary:
fixes #20215
The confusing behavior was caused by typos in the type annotations :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20306

Differential Revision: D15276216

Pulled By: ailzhang

fbshipit-source-id: 1b0c9635a72a05c9b537f80d85b117b5077fbec7
2019-05-09 09:29:37 -07:00
83a80d2b31 Add test/test_namedtensor.py (#20168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20168
ghimport-source-id: 78bd3c4b6bc87c216ce33dba13b61feb87e5fe53

Reviewed By: gchanan

Differential Revision: D15278222

Pulled By: zou3519

fbshipit-source-id: 3bcdb1cb654400350d42464dd9e0d5e0a7116e1e
2019-05-09 09:09:22 -07:00
199fa12dee Add namedtensor build and test to the CI (#20163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20163
ghimport-source-id: 6a821a708a20dbfa84c31a8af067d451c59b3964

Reviewed By: gchanan

Differential Revision: D15278216

Pulled By: zou3519

fbshipit-source-id: 0bc28929f8b3ce317de06f1fc275e586c649785c
2019-05-09 09:09:19 -07:00
e01a5bf28b Add USE_NAMEDTENSOR compilation flag. (#20162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20162
ghimport-source-id: 0efcd67f04aa087e1dd5faeee550daa2f13ef1a5

Reviewed By: gchanan

Differential Revision: D15278211

Pulled By: zou3519

fbshipit-source-id: 6fee981915d83e820fe8b50a8f59da22a428a9bf
2019-05-09 09:09:16 -07:00
f23fb66e6e Fix in file position logic: file descriptor and Python-side handle (#20270)
Summary:
This addresses #18436

The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)

This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270

Differential Revision: D15275902

Pulled By: soumith

fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
2019-05-09 08:20:01 -07:00
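The descriptor-position dance described above can be sketched with stdlib calls (illustrative only; the actual fix lives in the C++ serialization code):

```python
import os
import tempfile

def read_at_python_offset(fd, py_handle_pos, nbytes):
    """Read via the raw fd at the Python-side handle's offset, restoring
    the fd's original position afterwards, and report the new offset the
    Python handle should be advanced to."""
    saved = os.lseek(fd, 0, os.SEEK_CUR)        # remember where the fd was
    os.lseek(fd, py_handle_pos, os.SEEK_SET)    # jump to Python handle offset
    data = os.read(fd, nbytes)
    new_py_pos = os.lseek(fd, 0, os.SEEK_CUR)   # new Python-side position
    os.lseek(fd, saved, os.SEEK_SET)            # restore original fd position
    return data, new_py_pos

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    data, pos = read_at_python_offset(f.fileno(), 2, 4)
    print(data, pos)  # b'2345' 6
```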
c406bf20a0 error instead of crashing on attempt to subclass typed tensors (#20283)
Summary:
https://github.com/pytorch/pytorch/issues/20052

typed tensors (e.g. torch.FloatTensor) can't be subclassed. Was causing
crashes and other errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20283

Differential Revision: D15278138

Pulled By: nairbv

fbshipit-source-id: 8493eac4d34dfb76b054362bf0acec02146cd0e2
2019-05-09 07:10:38 -07:00
1e35ef07e9 Switch off USE_DISTRIBUTED on default for MSVC (#20302)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20250
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20302

Differential Revision: D15277733

Pulled By: ezyang

fbshipit-source-id: a8915939d033a04f962908d19bbad840b6234e27
2019-05-09 06:29:31 -07:00
2aea5b6335 Fixed Softmax doc to specify dimension to prevent warning in 1.1.0. (#20310)
Summary:
See Issue #20301

Specifying dim in docstring example to prevent UserWarning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20310

Differential Revision: D15277734

Pulled By: ezyang

fbshipit-source-id: 2e8b748dbe743675a5a538ccbe97713aad02e8ac
2019-05-09 06:21:57 -07:00
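For reference, what `dim` selects can be sketched in plain Python (illustrative only, not torch.nn.Softmax itself):

```python
import math

def softmax(rows, dim):
    """Softmax over a 2-D list of lists: dim=1 normalizes each row,
    dim=0 each column."""
    if dim == 0:
        cols = [list(c) for c in zip(*rows)]       # transpose
        return [list(r) for r in zip(*softmax(cols, dim=1))]
    out = []
    for row in rows:
        m = max(row)                               # subtract max for stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

probs = softmax([[1.0, 1.0], [0.0, 0.0]], dim=1)
print(probs)  # each row sums to 1 -> [[0.5, 0.5], [0.5, 0.5]]
```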
0087069dce Use torch::get/set_num_threads without additional includes beyond torch/torch.h (#20176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20130.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20176

Differential Revision: D15275036

Pulled By: yf225

fbshipit-source-id: 0f04e1fbfed18c07030b20e92e957ef5f2b5707d
2019-05-09 06:08:27 -07:00
916ef76817 Sort User Defined Classes (#19706)
Summary:
Schema matching for sort is a little tricky because you have to check whether the class defines the __lt__ method correctly, and you have to propagate whatever effects happen in __lt__ to the sorting op as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19706

Differential Revision: D15244366

Pulled By: eellison

fbshipit-source-id: 73b3e36462c6cc40f9d8cb235b44499a67d3149e
2019-05-09 00:07:51 -07:00
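The `__lt__` contract that schema matching checks for is the same one plain Python sorting relies on (plain CPython example, not TorchScript itself):

```python
class Pair:
    """A user-defined class that list.sort() can order because it
    defines __lt__."""

    def __init__(self, key, label):
        self.key, self.label = key, label

    def __lt__(self, other):
        return self.key < other.key   # sort() only needs __lt__

items = [Pair(3, "c"), Pair(1, "a"), Pair(2, "b")]
items.sort()                          # dispatches to Pair.__lt__
print([p.label for p in items])       # ['a', 'b', 'c']
```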
1102f4c56e Bump op_version_set (#19812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19812
ghimport-source-id: 7cdb24c6a7501c6ec5f0eae07325512746f5abb9

Differential Revision: D15102803

Pulled By: zdevito

fbshipit-source-id: bedf0bd6e1170fa294c65c87df75b82d8694f89c
2019-05-08 23:21:22 -07:00
3d4d7b9082 Refactor ChunkDataReader API + fix missing headers (#19485)
Summary:
This PR restricts the BatchType template argument of ChunkDataReader to STL
vectors only. Internally, ChunkDataReader was assuming BatchType was a
vector, but the user could pass any type as the template argument,
leading to compilation issues when building C++ extensions.

In addition to the proposed API change, this PR adds missing include
headers to chunk.h. The current implementation works, but if users try
to create C++ extensions that implement new ChunkDataReaders to be used
alongside the existing ChunkDataset, the build will fail due to the
missing headers.

In terms of functionality, nothing has changed. This PR simply makes the
implementation slightly more robust for future extensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19485

Differential Revision: D15261725

Pulled By: soumith

fbshipit-source-id: 38c9465d665392ae6a2d12c5a520a4f501e1a6ca
2019-05-08 22:20:19 -07:00
bed1d7d3ff Eliminate some const_cast's.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20279

Differential Revision: D15261884

fbshipit-source-id: 8daf76a7031ca5a648995218fd14a9aa143612dc
2019-05-08 21:51:24 -07:00
d6815e1e27 Only record grad_fn in C++ Scatter and Gather when required so (#20286)
Summary:
C++ `Scatter` and `Gather` always set autograd history for input data tensors regardless of whether they require grad. This hits an assertion failure in `set_history(Tensor, shared_ptr<Function> grad_fn)`,
where `grad_fn` cannot be nullptr. After this PR, C++ `Scatter` and `Gather` only record `grad_fn` when required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20286

Differential Revision: D15266610

Pulled By: mrshenli

fbshipit-source-id: 641df0ea36e7c922b5820c8dc3f83e2a050412b5
2019-05-08 21:04:21 -07:00
2179d5b32b Dynamic quantized full LSTM module (#20249)
Summary:
Previously we only had a Python wrapper for `torch.quantized_lstm_cell`. We had the op `torch.quantized_lstm`, but it didn't have a wrapper. This PR adds the wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20249

Reviewed By: driazati

Differential Revision: D15250023

Pulled By: jamesr66a

fbshipit-source-id: f05ad784d903e0ef3a62633c8bf80bad79de48ae
2019-05-08 19:46:59 -07:00
7bc8562a9a Enable ONNX constant folding in test_pytorch_onnx_caffe2.py tests. (#20290)
Summary:
This is a step towards enabling the ONNX constant folding pass by default in the PT->ONNX export. In this change we have enabled test points in `test/onnx/test_pytorch_onnx_caffe2.py` to run with the constant folding pass enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20290

Reviewed By: zrphercule

Differential Revision: D15271674

Pulled By: houseroad

fbshipit-source-id: 9e59ab46ae74b4ad8dea1a2200ecc1f3eb8aad75
2019-05-08 18:47:03 -07:00
3ee97183b0 ScaleBlobs Operator (#19660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19660

Implementation of the aggregated Scale operator.
The operator takes a list of tensors as input and scales all of them by the float argument value.
The tensor sizes can differ, so bookkeeping of the sizes and pointers to the tensors is
necessary for the GPU version of the kernel.

Reviewed By: BIT-silence

Differential Revision: D14984233

fbshipit-source-id: 37cc97159a4f2c38cd6fff4f5710ab7d3a773611
2019-05-08 17:57:33 -07:00
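The operator's semantics can be sketched in plain Python (the real GPU kernel additionally has to bookkeep sizes and pointers per tensor, as noted above):

```python
def scale_blobs(blobs, scale):
    """Apply one scalar in-place to every tensor in a list;
    the tensors may have different sizes."""
    for blob in blobs:
        for i, v in enumerate(blob):
            blob[i] = v * scale
    return blobs

blobs = scale_blobs([[1.0, 2.0], [3.0]], 0.5)
print(blobs)  # [[0.5, 1.0], [1.5]]
```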
4d676d53a6 split canonicalize_ops, make a decompose pass (#19988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19988
ghimport-source-id: 1dbf39e07099fa24ef9a6c0221312bf01a8011b7

Differential Revision: D15190355

Pulled By: wanchaol

fbshipit-source-id: 83f2b6557efd758810ccb4a4229d71fdebfd06e0
2019-05-08 17:21:59 -07:00
33b4afe3bb dont make alias for none value (#20112)
Summary:
Don't make an alias value for a value that is known to be None. This was preventing constant propagation from running the `out is None` check in nn.functional.normalize, and thus preventing the if statement from being inlined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20112

Differential Revision: D15267328

Pulled By: eellison

fbshipit-source-id: 5b878b0dc50944c2e7a2f583ea483dad9d6bbec3
2019-05-08 17:10:19 -07:00
8ebb86dd3a Support torch.save for saving values during execution (#18154)
Summary:
This PR makes `torch.save` call out to the pickler, which saves a tensor in the same format that `torch.save()` does. The file looks like `| pickle archive 1 (includes sizes, strides, requires_grad, etc...) | pickle archive 2 (list of tensor keys) | tensor binary data |` and can be read back in with `torch.load(my_file, pickle_module=torch.jit._pickle)`.

Fixes #18003

Unpickling in the JIT for things such as model parallelism will be a follow-up PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18154

Pulled By: driazati

Differential Revision: D15015160

fbshipit-source-id: ef76a44b8c243f4794cd7e245ec8305e965bc59f
2019-05-08 16:52:53 -07:00
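The three-part layout can be sketched with stdlib pickle (hypothetical helper names; the real implementation writes the archives from C++):

```python
import io
import pickle

def save_sketch(metadata, tensors):
    """Two pickle archives followed by raw tensor bytes, mirroring the
    layout described above."""
    buf = io.BytesIO()
    pickle.dump(metadata, buf)              # archive 1: sizes/strides/etc.
    pickle.dump(list(tensors.keys()), buf)  # archive 2: tensor keys
    for data in tensors.values():           # raw binary payloads
        buf.write(data)
    return buf.getvalue()

def load_sketch(blob, sizes):
    buf = io.BytesIO(blob)
    metadata = pickle.load(buf)             # unpickling stops at each archive boundary
    keys = pickle.load(buf)
    tensors = {k: buf.read(sizes[k]) for k in keys}
    return metadata, tensors

blob = save_sketch({"w": {"shape": (2,)}}, {"w": b"\x01\x02"})
meta, tensors = load_sketch(blob, {"w": 2})
print(meta, tensors)  # {'w': {'shape': (2,)}} {'w': b'\x01\x02'}
```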
27c82c03c5 Unify code path for native mkldnn conv and reorder on-the-fly mkldnn conv (#19963)
Summary:
Also, the current mkldnn primitive code recreates the computation every time, which causes tiny convolutions to spend a significant portion of their time on repeated codegen. ideep has implemented an LRU cache to save the computation, so this change will help improve performance for tiny convolutions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19963

Differential Revision: D15156527

Pulled By: bddppq

fbshipit-source-id: 6a8fbd10a213ec22cdeaff1a2bdb0d09905d1fcd
2019-05-08 15:30:05 -07:00
b3bce01e26 Have add_video use NamedTemporaryFile directly (#20223)
Summary:
address comment in #16196
https://github.com/pytorch/pytorch/pull/16196/files#r278676986

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20223

Reviewed By: natalialunova

Differential Revision: D15261528

Pulled By: orionr

fbshipit-source-id: 1aebcc6cb1c9313d890c5b506973855ebc63fb3b
2019-05-08 15:00:44 -07:00
35de90e324 Canonicalize order of If and Loop outputs (#20015)
Summary:
Canonicalize the ordering of outputs of If and Loop nodes based on their first usage. Previously we canonicalized output order by sorting on variable name, but this breaks down with outputs added in an early-return pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20015

Differential Revision: D15266066

Pulled By: eellison

fbshipit-source-id: ba5340c068a68b1ffc73f056db194b92d3274dc4
2019-05-08 14:52:07 -07:00
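The first-usage ordering can be sketched in plain Python (hypothetical helper; the real pass operates on JIT IR nodes and their use lists):

```python
def canonicalize_outputs(outputs, uses):
    """Order block outputs by the position of their first use,
    instead of by variable name (which breaks once an early-return
    pass introduces fresh names)."""
    first_use = {o: min(i for i, u in enumerate(uses) if u == o)
                 for o in outputs}
    return sorted(outputs, key=lambda o: first_use[o])

order = canonicalize_outputs(["b", "a"], uses=["a", "c", "b", "a"])
print(order)  # ['a', 'b'] -- 'a' is used first
```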
1ab33fce9a Disable worker_kill & holder_iter_reference combination in test_proper_exit (#20172)
Summary:
cc nairbv
All failures I have seen are of this combination, so let's just disable it for all cases. After #20063 I saw it fail for py3 once.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20172

Differential Revision: D15266527

Pulled By: nairbv

fbshipit-source-id: afb9389dfc54a0878d52975ffa37a0fd2aa3a735
2019-05-08 14:39:47 -07:00
7edf9a25e8 Clarify API and add examples for all methods (#20008)
Summary:
As part of supporting writing data in a TensorBoard-readable format, we show more examples of how to use the functions, in addition to the API docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20008

Reviewed By: natalialunova

Differential Revision: D15261502

Pulled By: orionr

fbshipit-source-id: 16611695a27e74bfcdf311e7cad40196e0947038
2019-05-08 14:06:10 -07:00
4a086f700f Quantized FCRelu operator (#19750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19750

Add the FCRelu operator.

Differential Revision: D15079097

fbshipit-source-id: b8e8a591158ad2355bf7acd14476993bbc9beae6
2019-05-08 14:00:40 -07:00
e8cdfb5d23 Use QTensor with quantized FC operator (#19541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19541

For the quantized FC operator, replace the tuple (Tensor, scale, zero_point) with QTensor.

Differential Revision: D14900407

fbshipit-source-id: 164df38f3564e0a68af21b9fedaba98a44ca1453
2019-05-08 14:00:37 -07:00
8defcbfcf4 Enable caffe2 softmax tests with ROCm 2.4 (#20280)
Summary:
cc xw285cornell petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20280

Reviewed By: xw285cornell

Differential Revision: D15262695

Pulled By: bddppq

fbshipit-source-id: d72490ff599cdab0331230bc9b12075085386319
2019-05-08 13:29:11 -07:00
1deb8dde58 Quantized FC operator (#19497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19497

Implements a basic quantized FC (uint8 * int8 -> uint8) with FBGEMM APIs.

Related document:
https://fb.quip.com/rP5tAx56ApMM
https://fb.quip.com/MU7aAbzGDesu

Work Item List:
1. [DONE] currently we use prepack routines inside Quantized FC operator. Will separate it as a standalone operator soon.
2. [DONE] rebase to D14817809 and D14994781 (cpp custom types).
3. [DONE] correctness unit test.

4. [To Do] rebase to QTensor. Similar to D14565413, this will be implemented in the next Diff.
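
The uint8 arithmetic behind such an operator rests on plain affine quantization, which can be sketched as follows (the scale and zero point below are illustrative values, not taken from the diff):

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: real value -> uint8 code.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    # Inverse mapping: uint8 code -> approximate real value.
    return (q - zero_point) * scale

scale, zero_point = 0.1, 128
q = quantize(3.2, scale, zero_point)
x = dequantize(q, scale, zero_point)
print(q)  # 160
```
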

Differential Revision: D14761865

fbshipit-source-id: 031a39915fecd947afb4dd2719112b4ddc1082d3
2019-05-08 13:17:51 -07:00
7b733e4fc1 Rebase conflict fix for isFusableDevice (#20251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20251
ghimport-source-id: 0c8c1847a7979fcd77e4f6618730b170b6b8ce25

Differential Revision: D15262850

Pulled By: bwasti

fbshipit-source-id: 17ecc340a310ddbcce141cfa3ee0efa9660194d2
2019-05-08 12:14:12 -07:00
c931d7e9d2 SubgraphRewriter: Add a support for arbitrary replacement graphs in subgraph rewriter. (#20084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20084
ghimport-source-id: 91b3b0b66da00c6592a2d57c8f2a88a73c019d1a

Differential Revision: D15190191

Pulled By: ZolotukhinM

fbshipit-source-id: d57ba6b6790ea2fd277b2feb3f4a58895ed15486
2019-05-08 11:50:46 -07:00
b3324d0fe3 SubgraphRewriter: Expose runOnGraph and use it in tests. (#20083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20083
ghimport-source-id: e4d425775c2a2fb5ed334727e902a91f744b697c

Differential Revision: D15190192

Pulled By: ZolotukhinM

fbshipit-source-id: 5fbcd61fa631d8f22b5016754f8d1a46eefb19c5
2019-05-08 11:50:43 -07:00
8a6072c3bd SubgraphRewriter: Rename pattern fusion to subgraph rewrite. (#20082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20082
ghimport-source-id: f0594f4ad918288fb3158b4ecfa8010cf09dd0c2

Differential Revision: D15190193

Pulled By: ZolotukhinM

fbshipit-source-id: 81b026398c94f2fbf7487cafbb86b7364a78d827
2019-05-08 11:22:29 -07:00
1a85e57334 flake fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20278

Differential Revision: D15261153

Pulled By: eellison

fbshipit-source-id: 2b210d50bc49c7a91605f1bf957890445077d21f
2019-05-08 10:51:10 -07:00
2a104f7383 Port ATen/native to ATen/Parallel (#20043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20043
ghimport-source-id: 8003ef8ca335d4c4717a886a3a75fd78ec53ade5

Differential Revision: D15248505

Pulled By: ilia-cher

fbshipit-source-id: 7be500ed8bfb23cc36f1dd7108e344319e3e5332
2019-05-08 10:33:43 -07:00
a7db3a7591 Error out if git log fails in setup_ci_environment (#20231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20231
ghimport-source-id: f58c44a52c719bf3c5cdcaa1f1e9624466423ace

Reviewed By: ezyang

Differential Revision: D15244497

Pulled By: zou3519

fbshipit-source-id: 87ca362aaa6a9040d32080b208272dacf7e45f63
2019-05-08 10:05:42 -07:00
0626ea4300 Run 'checkout' before 'setup ci environment' on pytorch linux tests (#20213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20213
ghimport-source-id: ea395ee1151207f37e030ed0126564ff7a73bac8

Reviewed By: ezyang

Differential Revision: D15244428

Pulled By: zou3519

fbshipit-source-id: 45ba2a80a2a9d056f806b5960bfe20f3c60f1e33
2019-05-08 10:05:39 -07:00
cd72be20e0 Update ROCm 2.4 (#20253)
Summary:
xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20253

Reviewed By: ezyang

Differential Revision: D15256826

Pulled By: bddppq

fbshipit-source-id: 405c21fc727d8145c4d3ca4fe8d84804569ebe53
2019-05-08 09:35:40 -07:00
ede38bd743 Port THNN to ATen/Parallel (#20032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20032
ghimport-source-id: f8e1a4c7726585f2d1c894b64e3687ede6177282

Differential Revision: D15247960

Pulled By: ilia-cher

fbshipit-source-id: de784c0002cbe074c87f912a5824e8a75683dbf8
2019-05-08 09:31:39 -07:00
fa6a00f313 Fix memory leak in torch._dirichlet_grad() (#20244)
Summary:
Fixes https://github.com/pyro-ppl/pyro/issues/1853

This fixes a memory leak in `torch._dirichlet_grad()`. This function is used for reparametrized gradients for the `Dirichlet` and `Beta` distributions.

- [x] Could a reviewer please confirm that `freeCopyTo()` is being used correctly and doesn't need an additional `decref()`? The author is unfamiliar with PyTorch C++ memory utilities. Help appreciated.

- ran locally and confirmed leak is fixed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20244

Differential Revision: D15259008

Pulled By: ezyang

fbshipit-source-id: 222ec7d80ddd97bcdd7d54549f3e756575e8402e
2019-05-08 07:45:39 -07:00
10715ffc30 Mention issue number in the JIT workaround comments (#20222)
Summary:
This just updates the `JIT` comments with the issue number #20215.  Hopefully this will stop the proliferation of the workaround. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20222

Differential Revision: D15259007

Pulled By: ezyang

fbshipit-source-id: 5060a351aa618c6dae49d0b7a6ac9b0f57f2490a
2019-05-08 07:34:30 -07:00
a5ff09782e Fix missing files for upload jobs (#20265)
Summary:
The earlier fix to extract scripts missed an `attach_workspace` step that was used to make the built binaries available to the nightly build upload jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20265

Differential Revision: D15259080

Pulled By: soumith

fbshipit-source-id: bf835c2cd76976b4563798ee348f7db83c7a79c1
2019-05-08 07:02:41 -07:00
2db9066a41 Fix formatting for note in eig. (#19743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19743
ghimport-source-id: fcb5f1aa3ee3d71e06ac1b8fbe6d6859a3547d63

Reviewed By: zou3519

Differential Revision: D15258642

Pulled By: ezyang

fbshipit-source-id: 7091fc3e7c829542a65ae3a490912d8d13aadfb3
2019-05-08 06:37:21 -07:00
d6f62b70f3 Fix cuda and cudnn libraries search process on Windows (#20205)
Summary:
Fixes #20202
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20205

Differential Revision: D15258626

Pulled By: ezyang

fbshipit-source-id: 855ad457a8bb7a46accc7cf6ec5cb09e98f6e770
2019-05-08 06:08:47 -07:00
d24c0aa82f Remove explicit checks for parallelism from TH (#20002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20002
ghimport-source-id: b2037d3226d52ac672578700f77aca215eb309a0

Differential Revision: D15232028

Pulled By: ilia-cher

fbshipit-source-id: 2c818ba819c95e1d709e0cea8c4f7816fddbcfc1
2019-05-08 02:33:31 -07:00
0ebe252c9c Port TH library to ATen/Parallel (#19105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19105
ghimport-source-id: db3e26f89d098e86215c48e464ace615193f5772

Differential Revision: D14947557

Pulled By: ilia-cher

fbshipit-source-id: 7e987e74c034646ba818f02e7bd711aba2ee3364
2019-05-08 01:07:17 -07:00
26dd65eaf8 Namespace isolation for classes (#19903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19903
ghimport-source-id: deadf59f469ad620d0ee10b089dfc9bb92171710

Differential Revision: D15118978

Pulled By: suo

fbshipit-source-id: f2b487fd65520d1b7f45cb74145634d334ef1614
2019-05-07 22:48:31 -07:00
e41aa0ed2f fix parsing bugs (#20246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20246
ghimport-source-id: 611ea44dc6c1d31cee99fe3e89628be072a4a381

Differential Revision: D15247081

Pulled By: zdevito

fbshipit-source-id: 4b836ce6b665c26bb6eb5347206c55d12807fae6
2019-05-07 19:35:51 -07:00
21a3895c7d Extract repeated scripts into files (#19674)
Summary:
The current pytorch config.yml is causing some backend performance
problems on CircleCI, due to the size of the file when all of the YAML
anchors have been expanded. You can view the "processed" config as our
internal system deal with it by running `circleci config process`.

    circleci config process .circleci/config.yml | wc -c

    Before: 2833769 bytes
    After:   558252 bytes (~80% less)

Add a new job, `setup`, that has two functions:
- Assert that config.yml is up to date
- Put the .circleci/scripts directory into a workspace, so that
downstream jobs can easily access it.

The `setup` job becomes the parent of all jobs in the workflow. This
allows us to fail fast if config is invalid. It might be a good place to
add other, quick, lint checks to help fail the build faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19674

Differential Revision: D15252864

Pulled By: pjh5

fbshipit-source-id: 0778c7b8f95e7f3f33ac92fbb8862377fc9fb0ac
2019-05-07 19:04:00 -07:00
42e9a619b3 add decay parameter in ref_adagrad (#15329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15329

Add a decay parameter to match the C++ Adagrad implementation.
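
A reference Adagrad step with weight decay can be sketched as follows (a hand-written sketch of the standard update rule, not the actual Caffe2 reference code):

```python
import math

def adagrad_step(param, grad, moment, lr=0.1, epsilon=1e-8, decay=0.0):
    """One Adagrad update with L2 weight decay folded into the gradient."""
    out_p, out_m = [], []
    for p, g, m in zip(param, grad, moment):
        g = g + decay * p                       # weight decay term
        m = m + g * g                           # accumulate squared gradients
        p = p - lr * g / (math.sqrt(m) + epsilon)
        out_p.append(p)
        out_m.append(m)
    return out_p, out_m

p, m = adagrad_step([1.0], [0.5], [0.0], lr=0.1, decay=0.01)
```
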

Reviewed By: chocjy

Differential Revision: D13300991

fbshipit-source-id: db734df0202d8f5fd156f2742207d0b5a3aa7348
2019-05-07 18:58:58 -07:00
f0b5ad8919 add str builtin support (#20188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20188
ghimport-source-id: 390b889f67155eb6168a96d05307ec4e86f5c631

Differential Revision: D15228812

Pulled By: zdevito

fbshipit-source-id: 789efa2e0b2f9518ed97ac877b59637e5b0ebcf4
2019-05-07 18:51:14 -07:00
9fcf585475 Autograd profile recording in c10 custom ops (#20175)
Summary:
This ensures that custom operators registered through c10::RegisterOperators are recorded in autograd profile traces.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20175

Differential Revision: D15221311

Pulled By: jamesr66a

fbshipit-source-id: 9452b24272c2399c20a49af85b62d34cabe6e27a
2019-05-07 18:41:36 -07:00
fb9d9fbd4e smoke test for add_graph (#20007)
Summary:
Do tests with common models from torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20007

Differential Revision: D15251754

Pulled By: orionr

fbshipit-source-id: 9dc09bd407b3ccaaa310d2f4a8d53d5a7d12469d
2019-05-07 18:29:25 -07:00
3ac4d92824 tweak scripts/build_android.sh for ABI and header install (#20152)
Summary:
We now can build libtorch for Android.
This patch aims to provide two improvements to the build
- Make the architecture overridable by providing an environment variable `ANDROID_ABI`.
- Use `--target install` when calling cmake to actually get the header files nicely in one place.

I ran the script without options to see if the caffe2 builds are affected (in particular by the install step), but they seem to run OK and probably only produce a few files in build_android/install.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20152

Differential Revision: D15249020

Pulled By: pjh5

fbshipit-source-id: bc89f1dcadce36f63dc93f9249cba90a7fc9e93d
2019-05-07 17:44:07 -07:00
faf2c3ac26 Standard gamma's export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20126

Reviewed By: zrphercule

Differential Revision: D15243663

Pulled By: houseroad

fbshipit-source-id: 7f5f63f37462a844b03b98783c10e6c21f608a52
2019-05-07 17:37:25 -07:00
48e649b803 Automatic update of fbcode/onnx to 5bde6371620b76302864bce90f521d72eda95d0e (#20232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20232

Previous import was 7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef

Included changes:
- **[5bde6371](https://github.com/onnx/onnx/commit/5bde6371)**: Add shape inference for legacy auto_pad modes (#1988) <stevenlix>
- **[6c9b3407](https://github.com/onnx/onnx/commit/6c9b3407)**: Move Quantization working group to completed state (#1980) <Prasanth Pulavarthi>
- **[8eba124e](https://github.com/onnx/onnx/commit/8eba124e)**: Define the IR acronym (#1985) <Senja Filipi>

Reviewed By: bddppq

Differential Revision: D15244357

fbshipit-source-id: e1a4eaee5de44f9eb944a290afac7a678b50b033
2019-05-07 17:32:48 -07:00
6606ac5d41 Support operator overloading for UDT (#20033)
Summary:
Support operator overloading for user-defined types, which includes desugaring `a + b` and Python builtin functions that call into a method if it is defined, like `len(x)`.

See https://rszalski.github.io/magicmethods/ for list of magic methods.
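
The desugaring being supported follows the standard Python magic-method protocol, e.g.:

```python
class Vec:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # `a + b` desugars to a.__add__(b)
        return Vec(x + y for x, y in zip(self.data, other.data))

    def __len__(self):
        # `len(x)` desugars to x.__len__()
        return len(self.data)

v = Vec([1, 2]) + Vec([3, 4])
print(v.data, len(v))  # [4, 6] 2
```
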
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20033

Reviewed By: driazati

Differential Revision: D15246573

Pulled By: eellison

fbshipit-source-id: 03d45dd524ea2a3b40db36843d6067bede27b30d
2019-05-07 17:28:06 -07:00
9c62280ea8 Remove brew libomp from binary mac machines (#20228)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/20030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20228

Differential Revision: D15246258

Pulled By: pjh5

fbshipit-source-id: f57033af74b678566be02cdf5700b5ae6e154d4a
2019-05-07 17:09:40 -07:00
eecf52b444 Fix in benchmark_test_generator (#20237)
Summary:
Add missing import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20237

Differential Revision: D15245957

Pulled By: ilia-cher

fbshipit-source-id: 0f71aa08eb9ecac32002a1644838d06ab9faa37c
2019-05-07 17:03:25 -07:00
8a375189ea Remove flake8 E303 (too many blank lines) (#20225)
Summary:
Similar to too few blank lines, I feel like this is not important enough to warrant breaking signal for all linters when it's violated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20225

Differential Revision: D15243480

Pulled By: suo

fbshipit-source-id: 37cdc18daf09e07081e42b69c72d331d81660217
2019-05-07 16:29:15 -07:00
ba3e6de4c2 Fix test_namedtuple_return (#20212)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/20198

The randomly created input matrix might be rank-deficient or not positive semidefinite.

Not tested yet, will look at CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20212

Differential Revision: D15241331

Pulled By: zou3519

fbshipit-source-id: b71cdc5d5b622b5423a43b86d75cfaf3d9484100
2019-05-07 15:58:25 -07:00
785583a435 Use ignore=dirty in submodules. (#20135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20135
ghimport-source-id: 73a07e07ed9485f80374262de2fb9b87e687a47a

Differential Revision: D15214187

Pulled By: zdevito

fbshipit-source-id: 2f2272f0ee7dad3935e6c31897a0b635b4e66133
2019-05-07 15:41:19 -07:00
864cfbc216 PyTorch Profiler Shape aggregation support (#20035)
Summary:
This is useful when you would like to understand performance
bottlenecks of your model. One can use the shape analysis in order to
fit the model to a roofline model of their hardware.

Please note that this feature can potentially skew profiling
results, and timing for non-nested events will become wrong; one
should only rely on timing for the bottom-most events when shape
analysis is used. For the case where people don't need shapes,
profiling should not be affected: in that case we don't collect
shapes, which is the default behavior, and this diff doesn't change it.

One of the next steps
could be, for example, choosing the best candidates for quantization.
In the scope of this diff I am just adding optional shape collection
into the Event class. On top of that, there is minor functionality in
Python for grouping by shapes.

In the output tables shapes are truncated, but for grouping the full
shape string is used as the key.
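
That shape-based grouping amounts to keying event aggregation on the full input-shape string, which can be sketched as follows (hypothetical event records, not the real profiler Event objects):

```python
from collections import defaultdict

# (op name, input shapes string, self CPU time in ms) — illustrative data.
events = [
    ("addmm", "[[30], [128, 20], [20, 30]]", 9.199),
    ("addmm", "[[40], [128, 30], [30, 40]]", 3.621),
    ("addmm", "[[30], [128, 20], [20, 30]]", 9.105),
]

# Aggregate total time per (op name, full input-shape string) key.
totals = defaultdict(float)
for name, shapes, ms in events:
    totals[(name, shapes)] += ms

for key, total in sorted(totals.items()):
    print(key, round(total, 3))
```
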

Here is an example output:

test_profiler_shapes (test_autograd.TestAutograd) ...
```
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
unsigned short      2.30%            305.031us        2.30%            305.031us        305.031us        NaN              0.000us          0.000us          1                [[30, 20]]
addmm               69.40%           9.199ms          69.40%           9.199ms          9.199ms          NaN              0.000us          0.000us          1                [[30], [128, 20], [20, 30], [], []]
unsigned short      0.98%            129.326us        0.98%            129.326us        129.326us        NaN              0.000us          0.000us          1                [[40, 30]]
addmm               27.32%           3.621ms          27.32%           3.621ms          3.621ms          NaN              0.000us          0.000us          1                [[40], [128, 30], [30, 40], [], []]
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 13.255ms
CUDA time total: 0.000us

------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
unsigned short      2.30%            305.031us        2.30%            305.031us        305.031us        NaN              0.000us          0.000us          1                [[30, 20]]
addmm               69.40%           9.199ms          69.40%           9.199ms          9.199ms          NaN              0.000us          0.000us          1                [[30], [128, 20], [20, 30], [], []]
unsigned short      0.98%            129.326us        0.98%            129.326us        129.326us        NaN              0.000us          0.000us          1                [[40, 30]]
addmm               27.32%           3.621ms          27.32%           3.621ms          3.621ms          NaN              0.000us          0.000us          1                [[40], [128, 30], [30, 40], [], []]
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 13.255ms
CUDA time total: 0.000us
```

Also added this for older aggregation test:

```
test_profiler_aggregation_lstm (test_autograd.TestAutograd) ...
======================================================================================================================================================================================================
TEST
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
lstm                     0.69%            4.606ms          5.30%            35.507ms         35.507ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.67%            4.521ms          5.27%            35.340ms         35.340ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.66%            4.399ms          5.02%            33.638ms         33.638ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.65%            4.354ms          4.92%            32.958ms         32.958ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.65%            4.351ms          4.96%            33.241ms         33.241ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.65%            4.323ms          5.10%            34.163ms         34.163ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.304ms          4.92%            32.938ms         32.938ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.300ms          5.10%            34.172ms         34.172ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.292ms          5.05%            33.828ms         33.828ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.263ms          4.98%            33.357ms         33.357ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 670.120ms
CUDA time total: 0.000us

-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
sigmoid                  15.32%           102.647ms        15.32%           102.647ms        171.078us        NaN              0.000us          0.000us          600              [[3, 20]]
mul                      15.20%           101.854ms        15.20%           101.854ms        169.757us        NaN              0.000us          0.000us          600              [[3, 20], [3, 20]]
lstm                     12.74%           85.355ms         100.00%          670.120ms        33.506ms         NaN              0.000us          0.000us          20               [[5, 3, 10]]
addmm                    11.16%           74.808ms         11.16%           74.808ms         249.361us        NaN              0.000us          0.000us          300              [[80], [3, 20], [20, 80], [], []]
tanh                     9.89%            66.247ms         9.89%            66.247ms         165.617us        NaN              0.000us          0.000us          400              [[3, 20]]
split                    6.42%            43.019ms         6.42%            43.019ms         215.095us        NaN              0.000us          0.000us          200              [[3, 80]]
add                      5.67%            38.020ms         5.67%            38.020ms         190.101us        NaN              0.000us          0.000us          200              [[3, 80], [3, 80], []]
add                      4.81%            32.225ms         4.81%            32.225ms         161.124us        NaN              0.000us          0.000us          200              [[3, 20], [3, 20], []]
addmm                    3.79%            25.380ms         3.79%            25.380ms         253.796us        NaN              0.000us          0.000us          100              [[80], [3, 10], [10, 80], [], []]
unsigned short           3.72%            24.925ms         3.72%            24.925ms         83.083us         NaN              0.000us          0.000us          300              [[80, 20]]
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 670.120ms
CUDA time total: 0.000us

Total time based on python measurements:  691.366ms
CPU time measurement python side overhead: 3.17%
ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20035

Differential Revision: D15174987

Pulled By: salexspb

fbshipit-source-id: 9600c5d1d1a4c2cba08b320fed9da155d8284ab9
2019-05-07 14:47:01 -07:00
831bd1c27d support onnx export rnn with batch_first=True (#19766)
Summary:
Also fixed the test_pytorch_onnx_caffe2.py rnn tests, which were not really testing batch_first=True.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19766

Reviewed By: zrphercule

Differential Revision: D15220950

Pulled By: houseroad

fbshipit-source-id: 96833af7c569f1f939174ba672704f7af87d69f8
2019-05-07 14:19:25 -07:00
e58817fed9 Make graph->param_node()->next() the first node (#19788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19788
ghimport-source-id: fec4b7ea6c4cdb6bf3624262ea4e37f2641d4a6f

Differential Revision: D15094260

Pulled By: zdevito

fbshipit-source-id: b415f029afe4163e9d0bd97a4e0c56c9e625c765
2019-05-07 14:03:02 -07:00
bc5398451e Enable ROCm multi-gpu with Gloo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18640

Differential Revision: D15185822

Pulled By: bddppq

fbshipit-source-id: 1b49ab3fb0f251cfc7ef3ddd62033ae0065a4ec3
2019-05-07 09:55:47 -07:00
cf55670bdd Add proper __repr__ to LogSoftMax (#20018)
Summary:
Fixes #19961
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20018

Differential Revision: D15218171

Pulled By: ezyang

fbshipit-source-id: 36bdf44d3b7a6df6a6ec5275a74741d4b057d3b4
2019-05-07 08:38:09 -07:00
b8256280ce Working on component-wise transformations that mimic torch.cat and torch.stack (#11868)
Summary:
As discussed in #11755 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11868

Differential Revision: D10032248

Pulled By: ezyang

fbshipit-source-id: d3a81c19f65a3e716f7f1cfc0a42b86c32fc484c
2019-05-07 07:49:29 -07:00
f7a7868820 add process_group in convert_sync_batchnorm (#19240)
Summary:
In line 508, convert_sync_batchnorm is called recursively to convert BatchNorm layers to SyncBatchNorm, so the process_group should also be passed into the recursive call.
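
The fix corresponds to threading the group through the recursion, which can be sketched with plain stand-in classes (BN/SyncBN/Container below are hypothetical substitutes for the real torch modules):

```python
class BN:  # stand-in for nn.BatchNorm*
    pass

class SyncBN:  # stand-in for nn.SyncBatchNorm
    def __init__(self, process_group=None):
        self.process_group = process_group

class Container:  # stand-in for a module with named children
    def __init__(self, **children):
        self.children = children

def convert(module, process_group=None):
    if isinstance(module, BN):
        return SyncBN(process_group)
    if isinstance(module, Container):
        # The bug being fixed: the recursive call must also forward
        # process_group so nested BN layers receive it.
        module.children = {k: convert(v, process_group)
                           for k, v in module.children.items()}
    return module

model = convert(Container(block=Container(bn=BN())), process_group="pg0")
assert model.children["block"].children["bn"].process_group == "pg0"
```
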
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19240

Differential Revision: D15240318

Pulled By: ezyang

fbshipit-source-id: 0fc9e856392824814991e5e9e8f9513d57f311af
2019-05-07 06:51:18 -07:00
2356fac9a5 Add DirichletFullyConnectedActor to Soft Actor-Critic
Summary: This can be used for problems where the action vector must sum to 1

Reviewed By: kittipatv

Differential Revision: D15206348

fbshipit-source-id: 665fbed893d8c52d451a12d3bb2e73b2638b7963
2019-05-06 23:52:35 -07:00
4ca325df87 Add Custom graph fusion (#18588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18588
ghimport-source-id: f40df177af8b87c73f04bf337f478a62133284cf

Differential Revision: D14901297

Pulled By: bwasti

fbshipit-source-id: 1b6371a5175b3d63dad542b7cc22cb82e8c6cfd0
2019-05-06 23:15:16 -07:00
19e6886576 Intra-op parallel microbenchmarks for PT (#19997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19997
ghimport-source-id: 420d4a68a1ef879beee2734adba8abb575e0b0ab

Differential Revision: D15231375

Pulled By: ilia-cher

fbshipit-source-id: ce7248ea2ebb54d25c9d831c6e3f23f3534557dd
2019-05-06 20:21:45 -07:00
481b6d0268 Allow a non-OpenMP based build (#19749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19749
ghimport-source-id: a6636c0acddbdc5fd5b0dcb20b9f80cbdb9159b9

Differential Revision: D15141993

Pulled By: ilia-cher

fbshipit-source-id: 96085608398b2a4c97c68b2948f5184d07f9ad3d
2019-05-06 19:34:48 -07:00
8c97f0b19e Initialize Caffe2 only when running Caffe2 benchmarks (#19980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19980
ghimport-source-id: ca31ca25b88a1c6219e4a32483f70738a8fdbf88

Differential Revision: D15229797

Pulled By: ilia-cher

fbshipit-source-id: 0b23dbdba0c0f60932a75d8b1900c54285f5a8e4
2019-05-06 19:17:23 -07:00
0c7e98b765 Support for non-contiguous tensors and arbitrary dtypes in PT benchmarks (#19993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19993
ghimport-source-id: 4cf51b61bb83b72883148ab0faa0c75c3cef7635

Differential Revision: D15230363

Pulled By: ilia-cher

fbshipit-source-id: a3ab591d6fd24e874958401e63eaec56bda19a5c
2019-05-06 19:12:09 -07:00
839343a482 Add USE_CUDA macro in THD DataChannelGloo for non-GPU use case (#20186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20186

We're missing two USE_CUDA macro for GPU-related code in THD's DataChannelGloo.
Also adding GlooCache back to compilation.

Differential Revision: D15227502

fbshipit-source-id: f260e1cb294d662ba0c170931913b64287d62344
2019-05-06 19:06:58 -07:00
8ca10d35e5 Add torch.nn.quantized.functional namespace (#20042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20042

Exposing torch.ops.quantized as torch.nn.quantized.functional

Differential Revision: D15178099

fbshipit-source-id: 8d65134bd727296f2750bbd2b54df0b99fc84b33
2019-05-06 18:49:58 -07:00
838ada3a62 Update logic for folding onnx::Constant nodes. (#20109)
Summary:
Currently, the constant folding pass during ONNX conversion removes all onnx::Constant nodes that are parents of nodes that are folded. This is a problem when such a parent onnx::Constant node has other subscribers downstream. This change updates the removal logic to remove only those onnx::Constant nodes that do not have other subscribers downstream.
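
The updated removal condition amounts to a use-count check, sketched here on a toy graph representation (not the actual JIT IR):

```python
def prune_constants(constants, uses):
    """Keep a constant if any remaining (unfolded) node still consumes it.

    `uses` maps a constant name to the set of downstream consumers that
    were NOT folded away; constants with an empty set can be removed.
    """
    return [c for c in constants if uses.get(c)]

# "w" still feeds an unfolded node downstream, so it must survive;
# "b" only fed folded nodes, so it can be removed.
kept = prune_constants(["w", "b"], {"w": {"matmul_1"}, "b": set()})
print(kept)  # ['w']
```
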
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20109

Reviewed By: zrphercule

Differential Revision: D15220392

Pulled By: houseroad

fbshipit-source-id: 150788654ea1c84262becaffd6de152114bf76c0
2019-05-06 17:51:58 -07:00
877b7c1b8d Fix NameError with PYTORCH_JIT=0 (#20120)
Summary:
Right now using `PYTORCH_JIT=0` gives this error:

`NameError: name '_CachedForward' is not defined`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20120

Pulled By: driazati

Differential Revision: D15210046

fbshipit-source-id: 493716d38e0078bfe96fab3dc624ec029988cf1c
2019-05-06 17:41:09 -07:00
1cfe15ef2a temporarily disable devtoolset7 nightlies (#20174)
Summary:
toward issue #20066
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20174

Differential Revision: D15229018

Pulled By: kostmo

fbshipit-source-id: 350c16a27c6530fe7d1d36a2dc11cb5008cf30e5
2019-05-06 17:15:10 -07:00
4211f674f0 Cleanup includes in c10/core/CPUAllocator.cpp. (#19885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19885
ghimport-source-id: 1f7d228ac8d1dd9aeb70446a6baf63f62f195663

Differential Revision: D15118516

Pulled By: ZolotukhinM

fbshipit-source-id: bdaf56d97db9e70cbd36ca03349f6eabfbac2668
2019-05-06 16:06:19 -07:00
8fbde94664 lower batchmm to non-diff optimization (#19987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19987
ghimport-source-id: ca4c38312bd56d8a71f1925297deee7f64f573d3

Differential Revision: D15190356

Pulled By: wanchaol

fbshipit-source-id: 761edb08c670fcbc24a06a5b11ceddf311f75884
2019-05-06 15:58:33 -07:00
0c5dc965a4 Add logging import and failing MLP (#20115)
Summary:
Add a logging import and a failing MLP model that confirms we don't fail `add_graph` when graph optimization fails.

This addresses part of https://github.com/pytorch/pytorch/issues/18903

cc lanpa ezyang natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20115

Reviewed By: natalialunova

Differential Revision: D15206765

Pulled By: orionr

fbshipit-source-id: c40b7e2671ef845a1529a2910ba030159f53f393
2019-05-06 15:44:59 -07:00
035966d538 Add options to Operator to enable registration of alias analysis passes (#19382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19382
ghimport-source-id: aeaad3b84ea20dd95b38635ca28c5ff657187909

Differential Revision: D14990873

Pulled By: bwasti

fbshipit-source-id: e1292ac8358ca8ff5bad8d8aeaddf06c23e66067
2019-05-06 15:40:13 -07:00
5c9ab6f411 Specialize Optional[T] to T (or subtype for Tensor) or None when executing graph (#18407)
Summary:
This patch specializes `Optional[Tensor]` graph inputs to either a `DimensionedTensorType` (if a Tensor is passed) or `NoneType`. Other `Optional[T]` are specialized to `T` or `None`.

- For unwrapping (checked and unchecked) we need to keep the output type, as IR code that follows unwrapping may not work with NoneType (just as it doesn't deal with Optional). While it would not be hit during execution, it will run against the (legitimate) assumptions of the analysis passes.
- Function lookup currently will not match NoneType when it expects Optional (I'm not entirely sure why this doesn't lead to unhappiness currently, but hey); I amend this at the level of the function matching code (`operator.cpp`), but see Adam's comments. We would run into trouble if we needed to select between functions whose signatures only differ in Optional types with different subtypes, but we would have the same problem when calling them directly, so I think this is OK.

- It would enable throwing away branches we can't hit. This also reduces the "blockiness" of the graph, so it may be easier to apply optimizations (e.g. fusing things inside `if t is None: ...` and outside the `if`).
- Arguments passed into `Optional[Tensor]` arguments will get shape information, which is very handy.
- It gets rid of the problem that tensors passed into Optional arguments erroneously get requires_grad set (#18270), though that also affects lists, which aren't fixed here.
- `Optional[List[int]]` is needed for #18697.

- We're changing typing in a more subtle way than the `TensorType`->`DimensionedTensorType`.
- In particular, specializing to NoneType loses the Type information captured in the `OptionalType` element type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18407

Reviewed By: zdevito

Differential Revision: D15216808

Pulled By: eellison

fbshipit-source-id: 01f1a7643deaf4962c3f55eff2070d54b0e54b69
2019-05-06 15:35:03 -07:00
47f5be164a allow classes to be used in their own methods (#20106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20106
ghimport-source-id: e85bebfae59ad7720cfa64c5f88d5d5fa742c221

Reviewed By: shannonzhu

Differential Revision: D15202261

Pulled By: suo

fbshipit-source-id: ae01b868c0939cecf650bd2b5ad8bb94312eaef7
2019-05-06 15:25:52 -07:00
e37d9c8168 fix compilation order for class methods (#20094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20094
ghimport-source-id: cea2887831fcda83d50adfd2ee8414dedbebe897

Reviewed By: shannonzhu

Differential Revision: D15197079

Pulled By: suo

fbshipit-source-id: b36d5f8c666372e82f6e47a3be70f6dc835cc6ab
2019-05-06 15:25:48 -07:00
0aa7407dd0 Rearrange stopping condition in CompositeReader (#20062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20062

Previously, the batch counter was incremented even if none of the readers had data. In this diff:
1) The limiter is applied to the last reader so that the batch counter is not incremented unless the first N-1 readers have data.
2) The stop blob of the last reader is used as the stop blob of the task so that it is checked before the counter is incremented.

Reviewed By: xianjiec

Differential Revision: D15099761

fbshipit-source-id: 47ed6c728118fe453cf57ac3457085867939485b
2019-05-06 15:06:32 -07:00
b3c35e5202 Export randn_like in ONNX exporter (#20093)
Summary:
As a workaround for the dynamic shape case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20093

Reviewed By: zrphercule

Differential Revision: D15220661

Pulled By: houseroad

fbshipit-source-id: de271fce542be380bd49a3c74032c61f9aed3b67
2019-05-06 14:54:46 -07:00
26f5275644 Index into a tuple with non constant integer (#20081)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/16962

This needs fixing because we turn lists into tuples when we constant-ify a module, so indexing into a Tuple of a single type with a non-constant integer is quite common.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20081

Differential Revision: D15205893

Pulled By: eellison

fbshipit-source-id: 61d74ee071ad0aad98e46fe807d6f6cc5f6abd2f
2019-05-06 14:23:16 -07:00
722eb48ff2 Cleanup includes in torch/csrc/* (#19924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19924
ghimport-source-id: f7248b16c8e263a7d0ba7975b1fc0b00cb2cf2c0

Differential Revision: D15125018

Pulled By: ZolotukhinM

fbshipit-source-id: 322c7ca53e38ef8b43b5ac5bd747b28bc10379f1
2019-05-06 14:03:18 -07:00
6ca38d9840 Cleanup includes in torch/csrc/autograd/* (#19923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19923
ghimport-source-id: 54debdd21ca0f4230b1915905673de274807a2e5

Differential Revision: D15125016

Pulled By: ZolotukhinM

fbshipit-source-id: 8d54f436e4508067089a1d05ce192093220aa1bb
2019-05-06 13:48:42 -07:00
8b46938355 Cleanup includes in torch/csrc/jit/* (#19922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19922
ghimport-source-id: 0434c46bf75621ff79ea27a18a2475e7f13e2487

Differential Revision: D15125015

Pulled By: ZolotukhinM

fbshipit-source-id: 5685edfc94067f62e363a85e9badb7f757b1d321
2019-05-06 13:40:26 -07:00
edb376eceb Cleanup includes in torch/csrc/jit/script/* (#19921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19921
ghimport-source-id: 12a4553a4a081e8a41f4ed432b4ce3dc14e4699f

Differential Revision: D15125017

Pulled By: ZolotukhinM

fbshipit-source-id: f7285bd1e0745dadb9cd353a5fa8a09728012a59
2019-05-06 13:24:22 -07:00
17268a9225 Add print function for QTensor (#19513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19513

Add support for printing a QTensor in python frontend

Differential Revision: D15017168

fbshipit-source-id: 312d1f18e6ca3c9eb4a5b8bb1c64f7cc8bc1dcf5
2019-05-06 13:12:43 -07:00
0de4b9e97e Improve nn.ActivationCls repr of inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20127

Differential Revision: D15212928

Pulled By: soumith

fbshipit-source-id: f2e3ccb51315a11043d685bc6bf415ea039eeaa3
2019-05-06 13:04:23 -07:00
a8387b7779 Delete TensorImpl::GetDevice() (#20025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20025

Delete TensorImpl::GetDevice() and clean all its call sites.

Reviewed By: ezyang

Differential Revision: D15170917

fbshipit-source-id: b6862b74aa036198544f79d18a8c0f995cb0ca7b
2019-05-06 12:44:23 -07:00
9005a2c0fc disable flaky test_proper_exit again, still occasionally failing (#20063)
Summary:
test was disabled for being flaky, re-enabled in https://github.com/pytorch/pytorch/pull/19421 but still occasionally failing:

https://circleci.com/gh/pytorch/pytorch/1520165?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

```
Apr 29 19:51:58 ======================================================================
Apr 29 19:51:58 FAIL: test_proper_exit (__main__.TestDataLoader)
Apr 29 19:51:58 There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore
Apr 29 19:51:58 ----------------------------------------------------------------------
Apr 29 19:51:58 Traceback (most recent call last):
Apr 29 19:51:58   File "/var/lib/jenkins/workspace/test/common_utils.py", line 129, in wrapper
Apr 29 19:51:58     fn(*args, **kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 847, in test_proper_exit
Apr 29 19:51:58     self.fail(fail_msg + ', and had exception {}'.format(loader_p.exception))
Apr 29 19:51:58 AssertionError: test_proper_exit with use_workers=True, pin_memory=False, hold_iter_reference=False, exit_method=worker_kill: loader process did not terminate, and had exception Traceback (most recent call last):
Apr 29 19:51:58   File "test_dataloader.py", line 227, in run
Apr 29 19:51:58     super(ErrorTrackingProcess, self).run()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
Apr 29 19:51:58     self._target(*self._args, **self._kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 424, in _test_proper_exit
Apr 29 19:51:58     for i, _ in enumerate(it):
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 545, in __next__
Apr 29 19:51:58     idx, batch = self._get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 522, in _get_batch
Apr 29 19:51:58     success, data = self._try_get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 480, in _try_get_batch
Apr 29 19:51:58     data = self.data_queue.get(timeout=timeout)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Apr 29 19:51:58     res = self._recv()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Apr 29 19:51:58     return pickle.loads(buf)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Apr 29 19:51:58     return Unpickler(file).load()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Apr 29 19:51:58     dispatch[key](self)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Apr 29 19:51:58     value = func(*args)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Apr 29 19:51:58     fd = multiprocessing.reduction.rebuild_handle(df)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
Apr 29 19:51:58     conn = Client(address, authkey=current_process().authkey)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 169, in Client
Apr 29 19:51:58     c = SocketClient(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
Apr 29 19:51:58     s.connect(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/socket.py", line 224, in meth
Apr 29 19:51:58     return getattr(self._sock,name)(*args)
Apr 29 19:51:58 error: [Errno 111] Connection refused

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20063

Differential Revision: D15218223

Pulled By: nairbv

fbshipit-source-id: 32018c4220f7cb9372ef138631fc3a79759265e1
2019-05-06 08:34:27 -07:00
8818005a91 Fix warnings coming from bernoulli and dirichlet kernels (#19933)
Summary:
Fixes the following warnings in CI:
```
/tmp/pip-req-build-un327jey/aten/src/ATen/native/cuda/Distributions.cu(123): warning: pointless comparison of unsigned integer with zero
              detected during instantiation of "void <unnamed>::bernoulli_tensor_cuda_kernel<scalar_t,prob_t>(at::Tensor &, const at::Tensor &, std::pair<uint64_t, uint64_t>) [with scalar_t=uint8_t, prob_t=uint8_t]"
    (238): here
```
```
/tmp/pip-req-build-un327jey/aten/src/ATen/native/cuda/Distributions.cu:220:116: warning: 'c10::ScalarType detail::scalar_type(const at::Type&)' is deprecated [-Wdeprecated-declarations]
    /tmp/pip-req-build-un327jey/aten/src/ATen/Dispatch.h:46:1: note: declared here
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19933

Differential Revision: D15218177

Pulled By: ezyang

fbshipit-source-id: 3994f7130da1ab04d2e8c3c4fe97f722b7502b41
2019-05-06 08:29:23 -07:00
3a318074e5 Fix examples in jit#user-defined-types documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20003

Differential Revision: D15218172

Pulled By: ezyang

fbshipit-source-id: 07dfebafa337252c2733c018ecaabb0d9902e07d
2019-05-06 08:20:00 -07:00
f29858ff14 Resolve host_define.h warnings (#19917)
Summary:
Eigen was updated with the commit needed to get rid of this warning that plagued the CI. This PR bumps third_party/eigen to that commit head.
```
warning: #warning "host_defines.h is an internal header file and must not be used directly.  This file will be removed in a future CUDA release.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19917

Differential Revision: D15218183

Pulled By: ezyang

fbshipit-source-id: 653c7d61ea401a7d4469c2009612dc43cc70122d
2019-05-06 08:19:57 -07:00
cc06e2f947 fix build with python-2.7.5 (#20137)
Summary:
pytorch failed to build with the following error, complaining about the first regex match
It may be caused by a bug in python 2.7.5
This change proposed is a workaround for building pytorch with python 2.7.5
Since the '*' star notation is greedy in python regex, the new expression shall produce the identical result with the old one.

```
Traceback (most recent call last):
  File "/data2/nihuini/pytorch/cmake/../aten/src/ATen/gen.py", line 14, in <module>
    import preprocess_declarations
  File "/data2/nihuini/pytorch/aten/src/ATen/preprocess_declarations.py", line 3, in <module>
    from function_wrapper import TYPE_FORMAL_GENERIC
  File "/data2/nihuini/pytorch/aten/src/ATen/function_wrapper.py", line 5, in <module>
    from code_template import CodeTemplate
  File "/data2/nihuini/pytorch/aten/src/ATen/code_template.py", line 13, in <module>
    class CodeTemplate(object):
  File "/data2/nihuini/pytorch/aten/src/ATen/code_template.py", line 23, in CodeTemplate
    subtitution = re.compile(substitution_str, re.MULTILINE)
  File "/usr/lib64/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib64/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
--
CMake Error at cmake/Codegen.cmake:162 (message):
  Failed to get generated_cpp list
Call Stack (most recent call first):
  caffe2/CMakeLists.txt:2 (include)

```
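As a sanity check on the greediness claim above, here is a hedged standalone example (not the actual pattern from `code_template.py`): a `*` repetition is greedy, so a pattern rewritten from an optional one-or-more group to a plain `*` matches exactly the same text on every input.

```python
import re

# '*' is greedy: r"a*" consumes as many 'a's as possible, which makes it
# behave identically to the equivalent optional group r"(?:a+)?".
samples = ["", "a", "aaa", "baa"]
for s in samples:
    m1 = re.match(r"a*", s).group()
    m2 = re.match(r"(?:a+)?", s).group()
    assert m1 == m2
print("patterns agree on all samples")
```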
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20137

Differential Revision: D15218122

Pulled By: ezyang

fbshipit-source-id: 10b618ff92a04e9074f5d83e31411fc2341e0cf8
2019-05-06 08:08:41 -07:00
61f1242b7f Formula typo fix (#20110)
Summary:
T_{cur + 1} -> T_{cur} + 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20110

Differential Revision: D15218135

Pulled By: ezyang

fbshipit-source-id: fb914d977cac447867921510bf57b59e62e4f68c
2019-05-06 08:08:37 -07:00
343c1c21f2 update nn.init.calculate_gain doc example
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20131

Differential Revision: D15218126

Pulled By: ezyang

fbshipit-source-id: 164c9d1573cd4d2c3689fb83b952e71862d4f1f2
2019-05-06 08:08:34 -07:00
1dfeffbff5 Expose test utils (#20114)
Summary:
Some functions were not decorated with `CAFFE2_API`, which makes them unusable when creating unit tests for custom ops outside the Caffe2 repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20114

Differential Revision: D15217490

Pulled By: ezyang

fbshipit-source-id: dda3910ad24e566567607deaac705a34ec8e7b8d
2019-05-06 07:06:04 -07:00
f2c715cbe1 Fix the spelling of "context"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20055

Differential Revision: D15217488

Pulled By: ezyang

fbshipit-source-id: bb2b57b5e749357b47a01c6c3e73addf3c5418c7
2019-05-06 06:54:30 -07:00
f6609daad7 Fix the warning if the wrong gcc is used with nvcc (#20158)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20158

Differential Revision: D15217665

Pulled By: ezyang

fbshipit-source-id: 951bb640cc8bea705cb2f39ca2024e3c5084cd3b
2019-05-06 06:38:38 -07:00
57948414ac Fix small typo T_mul->T_mult
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20148

Differential Revision: D15217485

Pulled By: ezyang

fbshipit-source-id: cb183cdc2eb3e42c685ef024742a18745923d283
2019-05-06 06:32:33 -07:00
71d23ebc13 #20143 TripletMarginLoss example isn't clear (#20145)
Summary:
Pull request for "TripletMarginLoss example isn't clear" (#20143).
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20145

Differential Revision: D15217491

Pulled By: ezyang

fbshipit-source-id: 46e30496e9f0dd830a2eec42c5775209a773cc48
2019-05-06 06:32:30 -07:00
23ba0561c3 Add Gate Policy GateLearningRateOp (#20044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20044

We do not have a gating functor; this diff adds one. It leverages the existing learning rate op because there are other policies that will need to be used together as a union.
* Since there are other policies in LearningRateOp that will be used as a union, I chose to add it as a LearningRateOp.
   * constantwarmup cannot express a step function that is nonzero first and zero later.

* There are multiple uses for it:
    * e.g. as a gating blob generator that is useful for turning features off.
   * e.g. as a learning rate switcher at a certain iteration.
* For generalizability, no restriction or constraint is applied to the range of the values.

* See the figure below for an illustration.

{F157366621}

Reviewed By: ccheng16

Differential Revision: D15178229

fbshipit-source-id: 1e66e9a4bc1bfb946a57f8aefc97d8170f6be731
2019-05-05 20:11:04 -07:00
863818e05a Allow overwriting kernels (#19777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19777

When used from IPython/Jupyter/Bento, the cell doing kernel registration might be executed multiple times.
Also, there might be a kernel library that wants to overwrite one of the existing kernels.
Let's allow this.

Reviewed By: dzhulgakov

Differential Revision: D15090318

fbshipit-source-id: 09f842e8fd36646053c5c2f11325de4d31105b0c
2019-05-05 01:06:26 -07:00
470af2357d Refactorings to prepare for overwritable kernels (#19776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19776

The diff stacked on top of this enables overwriting of kernels, but before that, we need to do some refactorings.
This diff:
- hides the deregistration logic behind an RAII class so we can keep more information about what exactly to deregister without the API user knowing about it.
- splits KernelRegistration from SchemaRegistration by taking Dispatcher::OperatorDef and moving it to a different file. This is more readable, especially since kernel registration will become more complex in the next diff.
- moves LeftRight synchronization out of DispatchTable to Operator because a mutex will be added to Operator in the next diff and related synchronization primitives shouldn't live on different abstraction levels.

Reviewed By: dzhulgakov

Differential Revision: D15090322

fbshipit-source-id: 2e51a192075163f0d496956d9e54b9aaf26b2369
2019-05-05 01:06:23 -07:00
fc00bfd12e Update MultiheadAttention documentation (#20071)
Summary:
Add documentation for add_bias_kv, add_zero_attn, and attn_mask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20071

Differential Revision: D15213034

Pulled By: zhangguanheng66

fbshipit-source-id: c3db4b9e8527863420ba3ce6abf6098d3b0fb7a7
2019-05-04 13:55:41 -07:00
ecdeef37df Fix math rendering of CTC loss docs and fix error of lint in test_torch.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19662

Differential Revision: D15063326

Pulled By: soumith

fbshipit-source-id: 71e79dee19c6259e27393d6b3d5ca3f8edfd6afa
2019-05-03 22:30:41 -07:00
7ddd5d06ed trace multiple methods (#19905)
Summary:
This PR adds a new trace API `trace_module` that will allow us to trace multiple methods as a part of a single `ScriptModule`

See the example below.

```python
        class Net(nn.Module):
            def __init__(self):
                super(Net, self).__init__()
                self.conv = nn.Conv2d(1, 1, 3)

            def forward(self, x):
                return self.conv(x)

            def weighted_kernel_sum(self, weight):
                return weight * self.conv.weight

        example_weight = torch.rand(1, 1, 3, 3)
        example_forward_input = torch.rand(1, 1, 3, 3)
        n = Net()
        inputs = {'forward' : example_forward_input, 'weighted_kernel_sum' : example_weight}
        module = torch.jit.trace_module(n, inputs)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19905

Differential Revision: D15200007

Pulled By: Krovatkin

fbshipit-source-id: 0354d973fe40cb6e58b395bd866df14e0fc29d5b
2019-05-03 16:21:58 -07:00
7ad04ad28d DOC: Update web documentation of geometric_ to be consistent with Tensor behaviour (#20091)
Summary:
Fix #19940 by updating web doc to reflect Tensor behaviour which will reflect [here](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.geometric_)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20091

Differential Revision: D15196734

Pulled By: soumith

fbshipit-source-id: a1b8aff9599f170e76a9cbca5112b5a9488bc36c
2019-05-03 15:39:10 -07:00
271f005eeb Add elementwise_affine for LayerNormGradientOp (#19982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19982

Add elementwise_affine for LayerNormGradientOp

Reviewed By: houseroad

Differential Revision: D15157493

fbshipit-source-id: 7465f2c1d4df4649b4903b93483c4861e9c7afa9
2019-05-03 15:33:46 -07:00
440aac082a The deprecated API allows std::vector arguments (#19784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19784

For backwards compatibility, we allow vector arguments if a kernel is registered with the deprecated API.

Reviewed By: dzhulgakov

Differential Revision: D15091972

fbshipit-source-id: 4db3e3a262e605504b05c42d40046011408501d2
2019-05-03 12:18:09 -07:00
e0bd7cc821 Change the export of _dim_arange in ONNX (#20078)
Summary:
Previously this used an ATen op; now it is fully switched to pure ONNX/Caffe2 ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20078

Reviewed By: zrphercule

Differential Revision: D15188774

Pulled By: houseroad

fbshipit-source-id: 8ae3094369497e2f3ebf478cda222b73de2a995e
2019-05-03 11:07:05 -07:00
840680bbf3 Reduce overhead of OnnxifiOp (#20085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20085

Reduce the overhead of OnnxifiOp from ~40us avg (~100us max) to ~25us avg (~40us max).

Reviewed By: bertmaher

Differential Revision: D15191071

fbshipit-source-id: 830d2bd08b19567a133d43c8a7b996032629d326
2019-05-03 10:57:19 -07:00
c7c02724cd CMakeLists changes to enable libtorch for Android (#19762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19762
ghimport-source-id: 287aa7fea4efd38994e14d794123eb2046b91fc0

Differential Revision: D15087653

Pulled By: ljk53

fbshipit-source-id: 4498ff9f7f7903c3e25541184302b811267958e9
2019-05-03 09:28:53 -07:00
0e77c0f5de Add ninja to PyTorch README.md file. (#20079)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17572
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20079

Differential Revision: D15195388

Pulled By: zhangguanheng66

fbshipit-source-id: b7448b482f07e96753f727664416a5d0e85602b4
2019-05-03 07:13:51 -07:00
767c82e151 Initialize last_epoch in _LRScheduler.__init__() (#20059)
Summary:
Class attributes should preferably be explicitly initialized within
the __init__() call. Otherwise, overriding step() is
prone to bugs.

This patch partially reverts #7889
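The failure mode being fixed can be illustrated with a generic, non-PyTorch sketch (hypothetical `LazyBase`/`SafeBase` classes, not the actual `_LRScheduler` code): if a base class creates an attribute lazily inside `step()` rather than in `__init__()`, a subclass that overrides `step()` without calling `super()` never gets the attribute.

```python
# Generic illustration of why attributes belong in __init__().

class LazyBase:
    def step(self):
        # Attribute only comes into existence when this method runs.
        self.last_epoch = getattr(self, "last_epoch", -1) + 1

class Buggy(LazyBase):
    def step(self):
        self.last_epoch += 1  # AttributeError: attribute was never created

class SafeBase:
    def __init__(self):
        self.last_epoch = -1  # explicit initialization in __init__

class Fine(SafeBase):
    def step(self):
        self.last_epoch += 1  # works: the attribute always exists

try:
    Buggy().step()
except AttributeError as e:
    print("buggy subclass:", e)

f = Fine()
f.step()
print("safe subclass:", f.last_epoch)  # 0
```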
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20059

Differential Revision: D15195747

Pulled By: soumith

fbshipit-source-id: 3d1a51d8c725d6f14e3e91ee94c7bc7a7d6c1713
2019-05-02 22:38:12 -07:00
af87cfd7f9 Remove in-memory scalars and add comments (#20038)
Summary:
This takes care of some outstanding review comments for https://github.com/pytorch/pytorch/pull/16196/

Specifically:
1. Add comment about kind
2. Add comment about GraphPy
3. Remove ONNX version comment
4. Remove scalar_dict from SummaryWriter and all history functions

cc lanpa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20038

Reviewed By: natalialunova

Differential Revision: D15177257

Pulled By: orionr

fbshipit-source-id: 218aa799d8b7dbb58f422a331236bba4959347de
2019-05-02 22:26:28 -07:00
fb40e58f24 Remove deprecated tensor constructors in torch.distributions (#19979)
Summary:
This removes the deprecated `tensor.new_*` constructors (see #16770) from the `torch.distributions` module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19979

Differential Revision: D15195618

Pulled By: soumith

fbshipit-source-id: 46b519bfd32017265e90bd5c53f12cfe4a138021
2019-05-02 20:45:02 -07:00
792bc56ec2 Update README.md (#20088)
Summary:
Sometimes people need to checkout an older version and build PyTorch. In that case, they need to do `git submodule sync` and maybe `git submodule update --init` as mentioned [here](https://github.com/pytorch/pytorch/issues/20074).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20088

Differential Revision: D15195729

Pulled By: soumith

fbshipit-source-id: 73232b801e5524cdba462dd504fb973d95d0498c
2019-05-02 20:40:03 -07:00
fb8792e2b6 Remove torch/jit from xplat build (#19967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19967

-

Reviewed By: dreiss, dzhulgakov

Differential Revision: D15150843

fbshipit-source-id: af7d6902934883be9d8021b3601de2fe1f3bf806
2019-05-02 15:31:06 -07:00
1ecbf0d213 Split MethodValue and FunctionValue (#19985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19985
ghimport-source-id: 6a032fd47f2f2e5ccd59fbd70bd12d0b9a815685

Differential Revision: D15160042

Pulled By: zdevito

fbshipit-source-id: a4ed3750ceea44d7d4c7d970b440840eb0ea5936
2019-05-02 14:37:19 -07:00
2ec2287cce Fix smoke tests on binary builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20069

Differential Revision: D15188864

Pulled By: pjh5

fbshipit-source-id: 692c5ecf1a4f5a560cf8fbcfe3634b809f184d72
2019-05-02 14:24:30 -07:00
99c548e223 Make IValue("string") do the right thing (#20027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20027

Before this change, any pointer was converted to bool, i.e.

    IValue("string") == IValue(true)

After this change, it does the right thing and creates a string.

Reviewed By: dzhulgakov

Differential Revision: D15172409

fbshipit-source-id: 8167dd780005f9bceef4fe3c751f752e42ceeb20
2019-05-02 12:36:13 -07:00
38e630a03f Extract feature length information from ClipRangesGatherSigridHash (#19704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19704

Extract feature length information from ClipRangesGatherSigridHash

Reviewed By: ipiszy

Differential Revision: D15075047

fbshipit-source-id: f70f61b7910515df5c47a042ac7afa523dbdb02a
2019-05-02 12:31:01 -07:00
6f7a315a71 Allow onnx export for maxpool with dilations (#18721)
Summary:
The MaxPool operator now supports the 'dilations' attribute as of this commit:
b22041c3f1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18721

Reviewed By: zrphercule

Differential Revision: D15152400

Pulled By: houseroad

fbshipit-source-id: e8f5ab35c5c2c3a540a22f7cf7bb453d892d0400
2019-05-02 11:26:57 -07:00
75868683dc Removing CUDA 8.0 nightlies (#20068)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/20067
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20068

Differential Revision: D15184571

Pulled By: pjh5

fbshipit-source-id: a37846a23ac7b414f9a3741f37a2db5bb61c93a9
2019-05-02 10:53:56 -07:00
002009b5a9 Automatic update of fbcode/onnx to 7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef (#20012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20012

Previous import was f1311e74ec8a91cbf86094cd6f10157cbf00c536

Included changes:
- **[7d7bc83d](https://github.com/onnx/onnx/commit/7d7bc83d)**: fix shape inference (#1984) <Ashwini Khade>
- **[68630bbd](https://github.com/onnx/onnx/commit/68630bbd)**: fixing some of Mod test cases (#1962) <Jeff Saremi>

Reviewed By: zrphercule

Differential Revision: D15160934

fbshipit-source-id: c53aff401f56b2febeb6c4ee302670eb12b9b495
2019-05-01 22:15:30 -07:00
725ef26f34 Add support for tensor targets in for-in (#19380)
Summary:
Fixes #19314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19380

Differential Revision: D15167858

Pulled By: Krovatkin

fbshipit-source-id: e87261bbf3e6f8df0601df80280eb3dba42798cd
2019-05-01 17:57:56 -07:00
46589a8a32 fix labs warning in THTensorMoreMath.cpp (#19760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19760
ghimport-source-id: d0aabee8c7b881710292d468dca1537c5ea761b5

Differential Revision: D15087655

Pulled By: ljk53

fbshipit-source-id: ac133dfc2301c2a86c41b4b8f1483d7d23824e1e
2019-05-01 16:35:42 -07:00
ff70c364a4 fix labs warning in THVectorDefault.cpp (#19999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19999
ghimport-source-id: 81157c60f0a9e5adbf12af8fc3a6e4352826b859

Differential Revision: D15169025

Pulled By: ljk53

fbshipit-source-id: 8e6f8df6dec6d21d6c7e743e974f4fcfff7cdeb5
2019-05-01 16:35:39 -07:00
18cb098588 Remove warnings on new_* constructors (#20026)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#20026 Remove warnings on new_* constructors**

Revert of #16770, fixes #19995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20026

Pulled By: driazati

Differential Revision: D15171691

fbshipit-source-id: 057c3b4a9fd6086ca240007e5404a286080f04b6
2019-05-01 16:35:36 -07:00
8987ae314e Remove try/catch in constant propagation (#19686)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19686 [jit] Remove try/catch in constant propagation**

The try-catch here gets tripped pretty often when constant prop is run which screws up `catch throw` in gdb.](https://our.intern.facebook.com/intern/diff/15170134/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19686

Pulled By: driazati

Differential Revision: D15170134

fbshipit-source-id: 93688561126f3ab582c8358e8f2787f7fce9aa73
2019-05-01 16:27:08 -07:00
0da0c4be48 Rotate circleci keys
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19981

Differential Revision: D15174219

Pulled By: pjh5

fbshipit-source-id: 205952aa90ed93f193f40d4293f5a8d82fa33ed6
2019-05-01 16:04:41 -07:00
e846ccd7bc Fix bug in dumpNet (#20001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20001

att

Reviewed By: zrphercule

Differential Revision: D15164116

fbshipit-source-id: dab19fb84fa0ab648103317af5509703db918682
2019-05-01 15:27:49 -07:00
c80ae6dc8e Change the quant-dequant node pattern for jit op substitution pass (#19910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19910

This change modifies the quant-dequant node pattern from
qparam->q->dq to qparam->q->int_repr->qparam->dq. The motivation for
this change is to make the qparams required for op substition one level
up at dequant node instead of multiple levels up.

Differential Revision: D15120146

fbshipit-source-id: 74b0fd5cb50a338f562740a9cc727a7791c718c3
2019-05-01 12:18:33 -07:00
3f7e3d5857 Add the ability to observe intermediate tensors in an onnxifi op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19966

Reviewed By: yinghai

Differential Revision: D15096086

fbshipit-source-id: 8e6a26c46898f99d411dd5841f086946884b2457
2019-05-01 11:51:02 -07:00
de582e2f89 Fix test_forked_cw (#19680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19680

This was broken for quite some time because of an operator schema
check that went into effect at some point in time.

Reviewed By: manojkris

Differential Revision: D15055082

fbshipit-source-id: 7f730f9b810bdaffd69bab7ac4d02c5b2e40645b
2019-05-01 11:11:45 -07:00
e04caa3f44 Pass Quantization parameters for quant nodes (#19402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19402

This pass propagate the qparams calculated after calibration to the
quant nodes which will be used later for quantization

Differential Revision: D14995230

fbshipit-source-id: 5709153ea1c039c4ab4470ddb689a303b0bcc6fd
2019-05-01 09:15:59 -07:00
f564226167 Avoid reusing TYPE parameter in AT_DISPATCH macros. (#19968)
Summary:
From the comment: "don't use TYPE again in case it is an expensive or side-effect op"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19968

Differential Revision: D15151567

Pulled By: gchanan

fbshipit-source-id: 4d42c081ac1472b71f1cea5172cb42a7c83a7043
2019-05-01 08:52:06 -07:00
1f9a0c5dd6 Add observer nodes for input data nodes (#19232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19232

Add observer nodes to collect stats for input data nodes excluding params
which are constant at inference and need not be observed. This information
is required to compute quantization params.

Differential Revision: D14885485

fbshipit-source-id: 8762cc2a4e510e1553b3dbd1d1aecd55b4bdb89f
2019-05-01 03:49:14 -07:00
8cd6d2f101 rename BUILD_ATEN_MOBILE to INTERN_BUILD_MOBILE and make it private (#19942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19942
ghimport-source-id: 6bacc8f5ad7911af8cf5fde9fcb604ade666b862

Reviewed By: dzhulgakov

Differential Revision: D15144325

Pulled By: ljk53

fbshipit-source-id: d63a70f007110d5d1055d6bec1ed09a1a6aafdae
2019-05-01 00:20:24 -07:00
5108e807e0 add new macro TORCH_MOBILE for libtorch mobile build (#19761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19761
ghimport-source-id: 359a44594d3d5afb8102435b4eac6ab920c24ef4

Differential Revision: D15087652

Pulled By: ljk53

fbshipit-source-id: f1a79c38c9415bb3786cc4d073370b1cb807e5ce
2019-04-30 21:25:15 -07:00
360640bc9c Extract Python-specific SugaredValues to a separate file from init.cpp. (#19986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19986
ghimport-source-id: 67f5fec4b5b2114f2922505a7743ed27e6d7e6cc

Differential Revision: D15160820

Pulled By: ZolotukhinM

fbshipit-source-id: e39238db8f30a8809891bff8a2fe39977124f6ca
2019-04-30 19:38:23 -07:00
ba84ad0d97 LeftRight works for classes without default constructors (#19775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19775

-

Reviewed By: dzhulgakov

Differential Revision: D15090319

fbshipit-source-id: e80865975970400c3db24bba4af4327105f3b9b2
2019-04-30 16:34:15 -07:00
e97da36cbb Explicitly disable copy&move on LeftRight (#19774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19774

see in-source comment

Reviewed By: dzhulgakov

Differential Revision: D15090320

fbshipit-source-id: ae9ba5b5df7115c2b1c275e384030063dbbf8f1a
2019-04-30 16:34:12 -07:00
a285cbcccf support different class modes for bbox in box_with_nms_limit_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19820

Reviewed By: newstzpz

Differential Revision: D15112955

fbshipit-source-id: a757622a32cff7159c39735607103138dbbafc24
2019-04-30 16:02:44 -07:00
74f527a8fa Adding job to upload binary sizes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19934

Differential Revision: D15155702

Pulled By: pjh5

fbshipit-source-id: cf841624145d14c7f40153c8fe7b442e633c0f92
2019-04-30 15:56:01 -07:00
48981a02e9 Fix typo in embedding_bag_cpu. (#19432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19432

Fixed typo.

Reviewed By: cpuhrsch

Differential Revision: D15003373

fbshipit-source-id: 8d0f9f70181f6e5041c1a09f5b8a7a5707a4ff2c
2019-04-30 15:51:22 -07:00
cb5442b31a Move IValues from stack into kernels (#19783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19783

Previously, the IValues were copied into the kernel arguments, which caused a refcount bump if Tensor was taken by value.
Now, a kernel can take Tensor by value without any refcount bump because it is moved in.

Reviewed By: dzhulgakov

Differential Revision: D15091973

fbshipit-source-id: 4c5ff2e3ee86f5934cc84191697f7dbc9c3ee345
2019-04-30 15:46:16 -07:00
cb8ff2a2b4 Add mkldnn support for adaptive_avg_pool2d (#19818)
Summary:
AdaptiveAvgPool2d is used in torchvision resnet models https://github.com/pytorch/vision/blob/9a481d0/torchvision/models/resnet.py#L145

Fixes #19797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19818

Differential Revision: D15112777

Pulled By: bddppq

fbshipit-source-id: 6c9b29c805d28356cda49c10c2cd3ce9d7a8b3f5
2019-04-30 15:00:34 -07:00
f5b1a41c58 specify data type in the doc (#19959)
Summary:
addresses comments in #19915
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19959

Differential Revision: D15149993

Pulled By: orionr

fbshipit-source-id: 0e438cfa1a311e89d4bed7ae9d7710a9f1b19a78
2019-04-30 14:15:56 -07:00
947fd9c3f5 More doc edits (#19929)
Summary:
* document `torch.jit.Attribute`
* add JIT one-liner to `README.md`
* misc clarity edits](https://our.intern.facebook.com/intern/diff/15152418/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19929

Pulled By: driazati

Differential Revision: D15152418

fbshipit-source-id: dfee03f0a17300aaf453fcf17f418463288f66c2
2019-04-30 13:52:07 -07:00
a9c189ca14 Macos upload fix (#19965)
Summary:
The second commit will be removed before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19965

Differential Revision: D15153243

Pulled By: pjh5

fbshipit-source-id: 70eae38d0cb07dc732c0cf044d36ec36d0a4472d
2019-04-30 13:46:24 -07:00
0ad1d5c317 fix THAllocator.cpp (#19759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19759
ghimport-source-id: 7ccd6e69dfd36ca9d59da59d5ec55f08e58b79f7

Reviewed By: soumith

Differential Revision: D15087651

Pulled By: ljk53

fbshipit-source-id: 536db274f7e725a0170b6f2c462146c08139f341
2019-04-30 13:29:47 -07:00
4e154c1585 remove C10_MOBILE from LegacyTypeDispatch.cpp (#19758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19758
ghimport-source-id: 59d0d337260e3baf57da87af3f97836b321a3db2

Reviewed By: pjh5

Differential Revision: D15087654

Pulled By: ljk53

fbshipit-source-id: 8f56808faba35bae68ee99c62c761f94807b6fd7
2019-04-30 13:29:43 -07:00
508ca44fcc add logging in sparse_to_dense_mask when skipping (#19945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19945

this logging was removed in D6888977, but i think it could be useful to debug upstream data issue to check the violating index

Reviewed By: xianjiec

Differential Revision: D15127887

fbshipit-source-id: 4ad7eceefcd063bf45bc190a4c0d458a089c918a
2019-04-30 12:57:24 -07:00
56977db4a7 Provide option to save quantized data for DNNLOWP without layout optimization (#19681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19681

For accelerator, we need to lower just the quantized weights data without layout transformation. This diff attempts to provide this option.

Reviewed By: jerryzh168, zrphercule

Differential Revision: D15066568

fbshipit-source-id: 133d749e087c2ad4a899bee5e96f597f70b2443c
2019-04-30 12:32:42 -07:00
ca57dd9332 Fixed log_normal and geometric for CPU (#19938)
Summary:
log_normal_ and geometric_ were disabled for CPU by mistake in [this PR](bc53805f2e), this PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19938

Differential Revision: D15143404

Pulled By: izdeby

fbshipit-source-id: 41c7bd29f046b5a3ac6d601de8c64ab553771d19
2019-04-30 12:18:10 -07:00
c6cb32f588 Forbid kernels from returning Scalar[] (#19811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19811

This does not immediately take effect for the custom op API but will break backwards compatibility once we switch to the new operator registration.

Reviewed By: dzhulgakov

Differential Revision: D15101924

fbshipit-source-id: 8890a5a3e163d3263dc1837be0b4851984771917
2019-04-30 12:13:09 -07:00
9a81d1e692 Automatic generation of unittest for Glow integration
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19936

Reviewed By: ipiszy

Differential Revision: D15138090

fbshipit-source-id: 29a812548bb5da00176b00c1e9a26a7c31cea9c0
2019-04-30 12:13:06 -07:00
3a0727e58b Fix flake8. (#19832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19832
ghimport-source-id: 7360a52dbcf83458797c27002afc1fd53ee5907f

Differential Revision: D15115620

Pulled By: ZolotukhinM

fbshipit-source-id: aa62b04facc1e1824a8889a32dace5804daa21df
2019-04-30 12:09:10 -07:00
e710f3b1e1 Fix C10_MOBILE macro for ios (#19779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19779

This macro wasn't set correctly because the target macros weren't included from Apple's header.

Reviewed By: dzhulgakov

Differential Revision: D15090427

fbshipit-source-id: 43ca44f0f409e11718b7f60c3fdcd2aa02d7018e
2019-04-30 12:03:24 -07:00
965c3b2761 Automatic update of fbcode/onnx to f1311e74ec8a91cbf86094cd6f10157cbf00c536 (#19949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19949

Previous import was 22662bfd4dcc6baebf29e3b823a051676f991001

Included changes:
- **[f1311e74](https://github.com/onnx/onnx/commit/f1311e74)**: Lint the docs name (#1982) <Lu Fang>
- **[c1c04af4](https://github.com/onnx/onnx/commit/c1c04af4)**:  Fix a shapeinference bug in upsample v9/10 (#1969) <Raymond Yang>
- **[2f7dc10f](https://github.com/onnx/onnx/commit/2f7dc10f)**: Create managingexperimentalops (#1974) <Joseph Spisak>
- **[419582b7](https://github.com/onnx/onnx/commit/419582b7)**: Create archivefileformat doc based on the wiki equivalent (#1973) <Joseph Spisak>
- **[4fcf3842](https://github.com/onnx/onnx/commit/4fcf3842)**: Create NLPinONNXproposal (#1975) <Joseph Spisak>
- **[153c4f9a](https://github.com/onnx/onnx/commit/153c4f9a)**: Create ONNXIFIproposal (#1976) <Joseph Spisak>
- **[c695016a](https://github.com/onnx/onnx/commit/c695016a)**: Create onnxreleases (#1977) <Joseph Spisak>
- **[ccf4b383](https://github.com/onnx/onnx/commit/ccf4b383)**: Create functionsproposal (#1978) <Joseph Spisak>
- **[c4d32371](https://github.com/onnx/onnx/commit/c4d32371)**: Create typeannotations.md (#1979) <Joseph Spisak>

Reviewed By: zrphercule

Differential Revision: D15145652

fbshipit-source-id: eca661fe5a735dce5af992e16b6a4013e896b8b4
2019-04-30 11:57:47 -07:00
aa6403bae6 Added .bool() method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19928

Differential Revision: D15131923

Pulled By: izdeby

fbshipit-source-id: 3909cf4623fe85e98ceaf57fbb57745919899445
2019-04-30 10:34:31 -07:00
d868c97580 Improve performance of Int8SpatialBN (needed for DF4 quantization) (#19702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19702

avx2 implementation of core compute for Int8SpatialBN

Reviewed By: jianyuh

Differential Revision: D15073973

fbshipit-source-id: c30b0c621348ba9331ba5e48b281c00cf6e479a1
2019-04-30 10:26:48 -07:00
95ce796663 enable CopyVector for type of int32_t (#19931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19931

as title

Reviewed By: Wakeupbuddy

Differential Revision: D15134369

fbshipit-source-id: a8afa166b5537bf815be875fa8afcb599897d5a7
2019-04-29 22:24:06 -07:00
4c678dbe87 Make moving FunctionSchema efficient (#19773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19773

const members can't be moved, so whenever somebody moved a function schema, it was copied instead.
This diff fixes this.

Reviewed By: dzhulgakov

Differential Revision: D15090323

fbshipit-source-id: 123a1d6b96ac46cb237966c0b072edebcdafe54c
2019-04-29 21:43:40 -07:00
8e77506799 Add onnx export for floor, ceil, log2 and prim::shape
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17895

Reviewed By: zrphercule

Differential Revision: D15103396

Pulled By: houseroad

fbshipit-source-id: 2ec80f11a19a8659aa496e68aed769a8dd1efb18
2019-04-29 21:23:14 -07:00
55c719b161 Remove operator.h's dependency on function_schema.h (#19817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19817

A lot of files were depending on the JIT's typesystem
because operator.h depends on function_schema.h. However,
this isn't fundamental to the design. This diff tries to
remove the direct depenency and only includes the c10
wrapper helpers in files where it is required.

Reviewed By: smessmer

Differential Revision: D15112247

fbshipit-source-id: 2c53d83e542c32d9a398c8b60dbf40ab7a1cb0f6
2019-04-29 19:50:43 -07:00
2a95cf6345 Add a pattern-based fusion pass. (#19596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19596
ghimport-source-id: 1d7af5877dbeffa826201812649a9009c06c6305

Differential Revision: D15042033

Pulled By: ZolotukhinM

fbshipit-source-id: e3178d9aec2ac63fc3779ddedbd967aae0401c76
2019-04-29 19:17:31 -07:00
abbfb7dd23 Fix devtoolset7 binary builds (#19919)
Summary:
cc kostmo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19919

Differential Revision: D15141049

Pulled By: pjh5

fbshipit-source-id: 9080924c5304d7db001d6bd910a8c3051a34eeae
2019-04-29 17:21:03 -07:00
f5c2b5a259 ONNX Export Min and Max ops with dim
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19689

Reviewed By: zrphercule

Differential Revision: D15103119

Pulled By: houseroad

fbshipit-source-id: f44555657f6965c41737e69485da119f37cf9d7c
2019-04-29 16:50:18 -07:00
a5b8a7d44b Don't include TensorMethods.h - it's already included by Tensor.h anyway. (#19831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19831
ghimport-source-id: 03ccc99f348021bd8a91c5006b5e4ebd3565290e

Differential Revision: D15115630

Pulled By: ZolotukhinM

fbshipit-source-id: a322cdca9c87429f092509d855d16551873bf17e
2019-04-29 16:11:25 -07:00
62a1640666 Improve torch.utils.tensorboard docs (#19915)
Summary:
This adds method details and corrects example on the page that didn't run properly. I've now confirmed that it runs in colab with nightly.

For those with internal access the rendered result can be seen at https://home.fburl.com/~orionr/pytorch-docs/tensorboard.html

cc lanpa, soumith, ezyang, brianjo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19915

Differential Revision: D15137430

Pulled By: orionr

fbshipit-source-id: 833368fb90f9d75231b8243b43de594b475b2cb1
2019-04-29 16:06:27 -07:00
ff33c8c24a Avoid printing debug information in the test
Summary: As title says.

Reviewed By: zafartahirov

Differential Revision: D15097015

fbshipit-source-id: b506130d8c8993672815221ba2c861ee1a95c64a
2019-04-29 15:48:38 -07:00
114644449e Break circular dependency between Type.h, Tensor.h and TensorMethods.h (#19830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19830
ghimport-source-id: 9b057c549652a39d6706679ee86494124bb40ef0

Differential Revision: D15115631

Pulled By: ZolotukhinM

fbshipit-source-id: 821c0aedacd9692d6076349ff8318c8f6234891c
2019-04-29 14:46:50 -07:00
ccf35f4be0 remove scalar to float matching (#19918)
Summary:
Trying to get this in before 1.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19918

Reviewed By: driazati

Differential Revision: D15124430

Pulled By: eellison

fbshipit-source-id: 549cdcbaff91218657e94ce08c0f4e69b576d809
2019-04-29 14:41:56 -07:00
4294dba981 Misc pickler improvements (#19638)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19638 [jit] Serialize attribute module as torch.jit._pickle**

* use `torch.jit._pickle` as the module for globals in the pickle program. Pickle will try to resolve these to the actual functions in `torch.jit._pickle.py` automatically (I believe this can also be overridden to point to whatever functions you want). This means that `pickle.load("my_model/attributes.pkl")` will work instead of having to use a custom `pickle.Unpickler`
* use `REDUCE` opcodes instead of `BUILD` to make use of the last bullet
* use a union in the unpickler to support globals better (+ any future metadata we might need that can't be stored in an `IValue`), this makes some of the code around `IntList`s clearer and lets us get rid of any lookbehind for opcodes
* pickle things as a tuple instead of a list (an immutable result is more semantically correct)](https://our.intern.facebook.com/intern/diff/15111203/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19638

Pulled By: driazati

Differential Revision: D15111203

fbshipit-source-id: 526c6c2b63a48eb1cba1c658045a7809730070dd
2019-04-29 13:45:10 -07:00
8933ff651c Remove self-include. (#19833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19833
ghimport-source-id: 396eb04317ad67980a624e71587bd57c4ac353f5

Differential Revision: D15115726

Pulled By: ZolotukhinM

fbshipit-source-id: 20677510a640b32defb63ae8cf3c22ffb1d1fdc4
2019-04-29 12:44:43 -07:00
de19eeee99 Enabled masked for a bool tensor (#19140)
Summary:
Added deprecation warnings for the masked methods and enabled them for a bool tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19140

Differential Revision: D14888021

Pulled By: izdeby

fbshipit-source-id: 0e42daf8f3732ca29f36d10485402bfc502716ad
2019-04-29 10:40:12 -07:00
39b885cbbf Add magma for CUDA 10.1 to Windows docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19914

Differential Revision: D15123029

Pulled By: soumith

fbshipit-source-id: a3d4b498a763e1a9829896d44211be00400ec39d
2019-04-29 10:13:21 -07:00
9d0b5a1ce9 Build caffe2/fb/operators (#19688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19688

Minor changes to hipify script to take extra folders.

Reviewed By: bddppq

Differential Revision: D15068427

fbshipit-source-id: e2e792c8227cbd0e15fd2564f87d740a62c477da
2019-04-29 09:01:10 -07:00
841360029a Finer grained consistency check in reducer (#19901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19901

The existing code used `expect_autograd_hooks_` as a proxy for the
situation where finalization of the previous iteration is needed. This
is not correct, however, since you may decide to completely ignore the
output of a DDP wrapped module. If this is the case, and no gradients
have been passed to the reducer, it is fine to keep going. This commit
adds a new variable `require_finalize_` that tracks whether the
finalization is really needed.

Reviewed By: mrshenli

Differential Revision: D15118871

fbshipit-source-id: 25938eaf1fe13e2940feae1312892b9d3da8a67d
2019-04-28 23:12:19 -07:00
5525c419fc Only call into reducer if torch.is_grad_enabled() (#19897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19897

During validation, gradient reduction is not needed, and autograd is
never called. The model output will always be a detached tensor. After
the new reducer was merged, this meant that it would find all model
parameters unused, and kick off reduction for them. When #19799 and
output where no parameters are used and it tries to kick off reduction
of zeroed gradients. Test for `torch.is_grad_enabled()` and
`self.training` before calling into the reducer.

Reviewed By: mrshenli

Differential Revision: D15118726

fbshipit-source-id: b0208f632a61cbe8110fa626fa427937b7f05924
2019-04-28 23:12:16 -07:00
a92e1ccd6c Remove trailing whitespace flake8 lint (#19828)
Summary:
Not useful and noisy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19828

Differential Revision: D15118966

Pulled By: suo

fbshipit-source-id: e31c05f4eac206a2a2d9508fcbef8a658d2a9baf
2019-04-28 22:19:01 -07:00
b695e562e5 Make find_unused_parameters in DDP default to False (#19895)
Summary:
As DDP in previous releases does not support unused params, turning off `find_unused_parameters` by default to derisk new reducer.

CC pietern soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19895

Reviewed By: pietern

Differential Revision: D15118563

Pulled By: mrshenli

fbshipit-source-id: 6215c486e1dae3387b36011d8e64a2721ac85f58
2019-04-28 21:22:26 -07:00
3f7a87788f Remove redundant include from jit/fuser/cpu/dynamic_library.h. (#19863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19863
ghimport-source-id: e13569d7d5abb243d4d5703a2a35f437f359efee

Differential Revision: D15118505

Pulled By: ZolotukhinM

fbshipit-source-id: 3dd1297fdabb102c5f0f68c9b1198195a851f8d0
2019-04-28 20:34:03 -07:00
3803d1c901 Fix conda build for Windows (#19824)
Summary:
Let's test it before merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19824

Differential Revision: D15116111

Pulled By: soumith

fbshipit-source-id: 0a73de3f045ee1349061674f5f8e2aaba382493c
2019-04-27 23:10:46 -07:00
9b69da2b55 Allow for iterations where no module parameter is used (#19821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19821

It is possible that not a single parameter is used during an
iteration. If this is the case, the `prepare_for_backward` function
marks all parameters as unused, kicks off reduction of all buckets,
*and* finalizes the reduction.

This is different from the prior implementation where we assumed that
autograd would produce a gradient for at least a single parameter.
We then used the autograd callback mechanism to queue a finalizer
callback. Now, this finalizer may be executed in line.

Reviewed By: mrshenli

Differential Revision: D15113272

fbshipit-source-id: dc91458b569cd8c106ddaeea558464b515683550
2019-04-27 22:57:59 -07:00
f0a007a26c Use QualifiedName for classes (#19575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19575
ghimport-source-id: eaed1d9d66672aadfe893e62a3a811da8ac7d966

Differential Revision: D15035281

Reviewed By: shannonzhu

Pulled By: suo

fbshipit-source-id: 7bac5e5f9223af77268bc03e08b37450a6840dbe
2019-04-27 16:13:27 -07:00
1d6e868c2f make QualifiedName a value type (#19626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19626
ghimport-source-id: 20f03b245245a7fab1c175cebc41e477b1eadc68

Reviewed By: shannonzhu

Differential Revision: D15049582

Pulled By: suo

fbshipit-source-id: 9e73a1d8ee8aa1849880815fa6fddebca408b8b4
2019-04-27 16:13:24 -07:00
096dd8a4f2 separate QualifiedName into its own file (#19566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19566
ghimport-source-id: c237f2a25d1aa9fc41f19fefe7a08a53a54279db

Differential Revision: D15032205

Reviewed By: shannonzhu

Pulled By: suo

fbshipit-source-id: 7527d97565559ebfb2556600eea5d93c1e141ac8
2019-04-27 16:13:20 -07:00
2226 changed files with 154569 additions and 88219 deletions


@@ -1,3 +1,23 @@
Structure of CI
===============
setup job:
1. Does a git checkout
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
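The three steps above can be sketched as a small shell function. This is a hypothetical sketch only — the real steps live in `.circleci/config.yml`, and the workspace path here is illustrative:

```shell
# Hypothetical sketch of the setup job; the real logic lives in
# .circleci/config.yml, and the workspace path is illustrative.
setup_job() {
    workspace="$1"    # stands in for ~/workspace
    mkdir -p "$workspace"
    # 2. Persist CircleCI scripts into the workspace so subjobs can
    #    call them without doing their own git checkout.
    cp -r .circleci "$workspace/"
    # 3. Write out the commit message so subjobs can decide whether
    #    they should actually run.
    git log -1 --format='%B' > "$workspace/.circleci/COMMIT_MSG"
}
```

Subjobs then attach the same workspace and read `.circleci/COMMIT_MSG` instead of re-checking-out the repo.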
CircleCI configuration generator
================================
@@ -35,4 +55,422 @@ Future direction
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
* releases are stable, nightlies are beta and built every night
* python version
* linux: 2.7m, 2.7mu, 3.5m, 3.6m, 3.7m (mu is wide unicode. It usually doesn't matter, but you should know that it exists)
* macos and windows: 2.7, 3.5, 3.6, 3.7
* cpu version
* cpu, cuda 9.0, cuda 10.0
* The supported cuda versions occasionally change
* operating system
* Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
* MacOS
* Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
* This only matters on Linux because only Linux uses gcc. The tl;dr is that gcc made a backwards-incompatible change from gcc 4.8 to gcc 5 when it changed how it implemented std::vector and std::string
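Taken together, a concrete binary configuration is one point in the cross-product of those dimensions. A minimal sketch of enumerating part of that matrix (the version lists mirror the ones above and will drift over time; the real matrix also folds in OS and devtoolset):

```shell
# Sketch only: enumerate channel x python x cpu/cuda combinations.
# Versions mirror the lists in this doc and change over time.
enumerate_configs() {
    for channel in release nightly; do
        for py in 2.7 3.5 3.6 3.7; do
            for cu in cpu cu90 cu100; do
                echo "${channel}_${py}_${cu}"
            done
        done
    done
}
```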
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to PyTorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f <an s3 url>). Releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies
* static with dependencies
* shared without dependencies
* static without dependencies
All binaries are built in CircleCI workflows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. Ctrl-F 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments; environment variables declared in one script DO NOT persist to following steps.*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binarybuilds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
4. For each binary configuration, e.g. linux_conda_3.7_cpu, there is a
1. binary_linux_conda_3.7_cpu_build
1. Builds the package. On linux jobs this uses the 'docker executor'.
2. Persists the package to the workspace
2. binary_linux_conda_3.7_cpu_test
1. Loads the package from the workspace
2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
3. Runs some smoke tests in the docker
4. (Actually, for macos this is a step rather than a separate job)
3. binary_linux_conda_3.7_cpu_upload
1. Logs in to aws/conda
2. Uploads the package
2. update_s3_htmls
1. every day 5am EST
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
3. See below for what these are for and why they're needed
4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
1. every day
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
3. For each binary configuration, e.g. linux_conda_3.7_cpu, there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
2. Runs the smoke tests
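The per-configuration fan-out described above — one build, test, and upload job per binary configuration — follows a simple naming convention. A purely illustrative sketch (the real DAG is generated into `.circleci/config.yml`):

```shell
# Sketch of the job-naming convention used by the binary workflows:
# each configuration (e.g. linux_conda_3.7_cpu) fans out into a
# build, test, and upload job. Illustrative only.
jobs_for_config() {
    config="$1"
    for phase in build test upload; do
        echo "binary_${config}_${phase}"
    done
}
```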
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
* binary_linux_test.sh
* binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* binary_macos_build.sh
* binary_macos_test.sh
* binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
* These delegate to scripts in the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* These delegate to scripts in the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/run_tests.sh
* https://github.com/pytorch/builder/blob/master/smoke_test.sh
* https://github.com/pytorch/builder/blob/master/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also contains a hack for the update_binary_sizes job, which doesn't have the right env variables
* binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
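The parsing step in binary_populate_env.sh can be sketched roughly like this. Note this is only an illustration of the pattern: the real BUILD_ENVIRONMENT format and the full variable set (dates, version strings, s3 paths) are more involved, and the string used here is hypothetical.

```shell
# Hypothetical sketch: split a BUILD_ENVIRONMENT string into the separate
# env variables that the later build/test/upload steps consume.
set -e
BUILD_ENVIRONMENT="conda 3.7 cpu"   # illustrative format, not the real one
read -r PACKAGE_TYPE DESIRED_PYTHON DESIRED_CUDA <<EOF
$BUILD_ENVIRONMENT
EOF
export PACKAGE_TYPE DESIRED_PYTHON DESIRED_CUDA
echo "PACKAGE_TYPE=$PACKAGE_TYPE DESIRED_PYTHON=$DESIRED_PYTHON DESIRED_CUDA=$DESIRED_CUDA"
```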
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's because we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime=nvidia' argument. CircleCI doesn't support this, so we have to do it ourselves.
* This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** There is nothing we can do about it but wait for a fix on CircleCI's side. Right now we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to move a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke because the pytorch code no longer existed where they expected it. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there are two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/ # Folder that holds all circleci related stuff
- config.yml # GENERATED file that actually controls all circleci behavior
- verbatim-sources # Used to generate job/workflow sections in ^
- scripts/ # Code needed to prepare circleci environments for binary build scripts
- setup.py # Builds pytorch. This is wrapped in pytorch/builder
- cmake files # used in normal building of pytorch
# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder
# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh
# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh
# Folders that govern how packages are built. See paragraphs below
- conda/
- build_pytorch.sh # Entrypoint. Delegates to proper conda build folder
- switch_cuda_version.sh # Switches the active CUDA installation in Docker
- pytorch-nightly/ # Build-folder
- manywheel/
- build_cpu.sh # Entrypoint for cpu builds
- build.sh # Entrypoint for CUDA builds
- build_common.sh # Actual build script that ^^ call into
- wheel/
- build_wheel.sh # Entrypoint for wheel builds
```
Every type of package has an entrypoint build script that handles all the important logic.
## Conda
Both Linux and MacOS use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in and what dependencies the resulting package should have, and the build script gets called in that env to build the thing.
tldr; on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around ```python setup.py build``` , but it also manually copies some of our dependent libraries into the resulting tarball and messes with some rpaths.
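A minimal sketch of that extra packaging work, using hypothetical library and path names and a temp directory so it is safe to run anywhere:

```shell
# Sketch: vendor a dependent shared library into the package layout, then
# (in the real script) rewrite rpaths so the extension finds it at load time.
set -e
pkg=$(mktemp -d)
mkdir -p "$pkg/torch/lib"
# Stand-in for a dependent library that would normally be copied in from the
# build environment (real names differ per platform)
touch "$pkg/torch/lib/libshm_stub.so"
# The real script would then run something like, on Linux:
#   patchelf --set-rpath '$ORIGIN/lib' "$pkg/torch/_C.so"
# and on macOS:
#   install_name_tool -add_rpath @loader_path/lib "$pkg/torch/_C.so"
ls "$pkg/torch/lib"
```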
The entrypoint file `builder/conda/build_conda.sh` is complicated because
* It works for both Linux and MacOS
* The mac builds used to create their own environments, since they all used to be on the same machine. There's now a lot of extra logic to handle conda envs. This extra machinery could be removed
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
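That delegation pattern can be sketched as follows. This is an illustration of the structure, not the actual script body; the exported variable names are the real ones used elsewhere in this doc.

```shell
# Sketch: the per-variant entrypoint only pins configuration variables and
# then hands everything off to the shared build logic.
set -e
export DESIRED_CUDA=cpu      # build_cpu.sh pins the cpu configuration
export DESIRED_PYTHON=3.6    # illustrative; in CI this comes from the job env
# The real script would now delegate, e.g.:
#   exec "$(dirname "$0")/build_common.sh" "$@"
echo "would delegate to build_common.sh with DESIRED_CUDA=$DESIRED_CUDA"
```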
The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there are still unnecessary folders and movements here and there.
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didn't have autoscaling mac machines until CircleCI). So this script handled siloing itself: setting up and tearing down its build env, and working in its own build directory.
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn't mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* It's confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* soumith/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* soumith/manylinux-cuda90
* soumith/manylinux-cuda92
* soumith/manylinux-cuda100
* Also used for cpu builds
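The symlink flip that switch_cuda_version.sh (mentioned above) performs can be sketched as follows, shown against a temp directory instead of /usr/local so it is safe to run anywhere; the version numbers are just examples.

```shell
# Sketch: select a CUDA installation by repointing a "cuda" symlink at one of
# several side-by-side cuda-X.Y directories.
set -e
root=$(mktemp -d)
mkdir -p "$root/cuda-9.0" "$root/cuda-10.0"
ln -sfn "$root/cuda-10.0" "$root/cuda"   # select CUDA 10.0
readlink "$root/cuda"
```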
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be built locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tldr; make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using ```.circleci/regenerate.sh``` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
# Regenerate the yaml, has to be in python 3.7
.circleci/regenerate.sh
# Make a commit
git add .circleci *
git commit -m "My real changes"
git push origin my_branch
# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
# Also eliminate ensure-consistency and should_run_job checks
# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
# Make a commit you won't keep
git add .circleci
git commit -m "[DO NOT LAND] testing binaries for above changes"
git push origin my_branch
# Now you need to make some changes to the first commit.
git rebase -i HEAD~2 # mark the first commit as 'edit'
# Make the changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
.circleci/regenerate.sh
# Amend the commit and continue the rebase
git add .circleci
git commit --amend
git rebase --continue
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can build Linux binaries locally easily using docker.
```
# Run the docker
# Use the correct docker image, soumith/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you're building a CUDA binary then use `nvidia-docker run` instead, see below.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it soumith/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
You can build CUDA binaries on CPU-only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary in a docker on your laptop if you so choose (though it's going to take a long time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
There's no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If you're trying to repro an error on a Mac build in .circleci and you can't seem to repro locally, then my best advice is actually to iterate on .circleci :/
But if you want to try, then I'd recommend:
```
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
new_conda="$HOME/my_new_conda"
conda_sh="/tmp/install_miniconda.sh"
curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$new_conda"
rm -f "$conda_sh"
export PATH="$new_conda/bin:$PATH"
# Create a clean python env
# All MacOS builds use conda to manage the python env and dependencies
# that are built with, even the pip packages
conda create -yn binary python=3.6
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but tldr; is that
1. You make the conda command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) call python executable `foo`
1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
2. But if you forgot to install `foo` in `new_env` but happened to previously install it in your root conda env (called base), then unix/linux will still find `path/to/conda_root/bin/foo` . This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
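The shadowing described in the steps above can be demonstrated with throwaway directories standing in for the conda roots:

```shell
# Demo: "envdir" is first on PATH but has no foo, so the copy in "base"
# (standing in for the root conda env) silently wins.
set -e
base=$(mktemp -d); envdir=$(mktemp -d)
printf '#!/bin/sh\necho from-base\n' > "$base/foo"
chmod +x "$base/foo"
out=$(PATH="$envdir:$base:$PATH" foo)
echo "$out"   # prints "from-base"
```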
### Windows
Maybe @peterjc123 can fill this section in.


```diff
@@ -42,7 +42,7 @@ LINUX_PACKAGE_VARIANTS = OrderedDict(
         "3.6m",
         "3.7m",
     ],
-    conda=dimensions.STANDARD_PYTHON_VERSIONS,
+    conda=dimensions.CONDA_PYTHON_VERSIONS,
     libtorch=[
         "2.7m",
     ],
@@ -52,7 +52,7 @@ CONFIG_TREE_DATA = OrderedDict(
     linux=(dimensions.CUDA_VERSIONS, LINUX_PACKAGE_VARIANTS),
     macos=([None], OrderedDict(
         wheel=dimensions.STANDARD_PYTHON_VERSIONS,
-        conda=dimensions.STANDARD_PYTHON_VERSIONS,
+        conda=dimensions.CONDA_PYTHON_VERSIONS,
         libtorch=[
             "2.7",
         ],
@@ -60,7 +60,19 @@ CONFIG_TREE_DATA = OrderedDict(
 )

-DEVTOOLSET_VERSIONS = [3, 7]
+# Why is this an option?
+# All the nightlies used to be devtoolset3 and built with the old gcc ABI. We
+# added a devtoolset7 option so that we could build nightlies with the new gcc
+# ABI. That didn't work since devtoolset7 can't build with the new gcc ABI. But
+# then we set devtoolset7 to be the default anyways, since devtoolset7
+# understands avx512, which is needed for good fbgemm performance.
+# This should be removed. The base dockers should just be upgraded to
+# devtoolset7 so we don't have to reinstall this in every build job.
+# The same machinery that this uses, though, should be retooled for a different
+# compiler toolchain that can build with the new gcc ABI.
+DEVTOOLSET_VERSIONS = [
+    7,
+]

 class TopLevelNode(ConfigNode):
@@ -83,7 +95,13 @@ class OSConfigNode(ConfigNode):
         self.props["cuda_versions"] = cuda_versions

     def get_children(self):
-        return [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]
+        packaging_variants = [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]
+        if self.find_prop("smoke"):
+            filtered_packaging_variants = list(filter(lambda x: x.get_label() != "libtorch", packaging_variants))
+            return filtered_packaging_variants
+        else:
+            return packaging_variants

 class PackageFormatConfigNode(ConfigNode):
@@ -94,7 +112,7 @@ class PackageFormatConfigNode(ConfigNode):
         self.props["package_format"] = package_format

     def get_children(self):
-        if self.find_prop("os_name") == "linux" and self.find_prop("package_format") != "conda":
+        if self.find_prop("os_name") == "linux":
             return [LinuxGccConfigNode(self, v) for v in DEVTOOLSET_VERSIONS]
         else:
             return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
@@ -107,7 +125,14 @@ class LinuxGccConfigNode(ConfigNode):
         self.props["devtoolset_version"] = devtoolset_version

     def get_children(self):
-        return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
+        cuda_versions = self.find_prop("cuda_versions")
+
+        # XXX devtoolset7 on CUDA 9.0 is temporarily disabled
+        # see https://github.com/pytorch/pytorch/issues/20066
+        if self.find_prop("devtoolset_version") == 7:
+            cuda_versions = filter(lambda x: x != "90", cuda_versions)
+
+        return [ArchConfigNode(self, v) for v in cuda_versions]

 class ArchConfigNode(ConfigNode):
@@ -132,7 +157,7 @@ class PyVersionConfigNode(ConfigNode):
         package_format = self.find_prop("package_format")
         os_name = self.find_prop("os_name")

-        has_libtorch_variants = smoke and package_format == "libtorch" and os_name == "linux"
+        has_libtorch_variants = package_format == "libtorch" and os_name == "linux"
         linking_variants = LINKING_DIMENSIONS if has_libtorch_variants else []

         return [LinkingVariantConfigNode(self, v) for v in linking_variants]
```


```diff
@@ -34,7 +34,8 @@ class Conf(object):
         docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)

-        alt_docker_suffix = self.cuda_version or "80"
+        # The cpu nightlies are built on the soumith/manylinux-cuda100 docker image
+        alt_docker_suffix = self.cuda_version or "100"
         docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
         return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
@@ -45,10 +46,10 @@ class Conf(object):
         parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()

-        if self.smoke:
-            if self.libtorch_variant:
-                parts.append(self.libtorch_variant)
-        else:
+        if self.libtorch_variant:
+            parts.append(self.libtorch_variant)
+        if not self.smoke:
             parts.append(build_or_test)

         return "_".join(parts)
@@ -146,13 +147,12 @@ def get_nightly_tests():
     configs = gen_build_env_list(False)
     filtered_configs = filter(predicate_exclude_nonlinux_and_libtorch, configs)

-    mylist = []
+    tests = []
     for conf_options in filtered_configs:
-        d = {conf_options.gen_build_name("test"): {"requires": [conf_options.gen_build_name("build")]}}
-        mylist.append(d)
-    return mylist
+        params = {"requires": ["setup", conf_options.gen_build_name("build")]}
+        tests.append({conf_options.gen_build_name("test"): params})
+    return tests
@@ -162,7 +162,7 @@ def get_nightly_uploads():
     return {
         conf.gen_build_name("upload"): OrderedDict([
             ("context", "org-member"),
-            ("requires", [conf.gen_build_name(phase_dependency)]),
+            ("requires", ["setup", conf.gen_build_name(phase_dependency)]),
         ]),
     }
@@ -189,12 +189,12 @@ def gen_schedule_tree(cron_timing):
 def add_jobs_and_render(jobs_dict, toplevel_key, smoke, cron_schedule):

-    jobs_list = []
+    jobs_list = ["setup"]

     configs = gen_build_env_list(smoke)
     for build_config in configs:
         build_name = build_config.gen_build_name("build")
-        jobs_list.append(build_name)
+        jobs_list.append({build_name: {"requires": ["setup"]}})

     jobs_dict[toplevel_key] = OrderedDict(
         triggers=gen_schedule_tree(cron_schedule),
```


```diff
@@ -1,8 +1,7 @@
 #!/usr/bin/env python3

-from cimodel.lib.conf_tree import ConfigNode, X
+from cimodel.lib.conf_tree import ConfigNode, X, XImportant
 from cimodel.lib.conf_tree import Ver
-import cimodel.data.dimensions as dimensions

 CONFIG_TREE_DATA = [
@@ -11,20 +10,20 @@ CONFIG_TREE_DATA = [
         (Ver("gcc", "4.9"), [X("py2")]),
     ]),
     (Ver("ubuntu", "16.04"), [
-        (Ver("cuda", "8.0"), [X("py2")]),
         (Ver("cuda", "9.0"), [
             # TODO make explicit that this is a "secret TensorRT build"
             # (see https://github.com/pytorch/pytorch/pull/17323#discussion_r259446749)
+            # TODO Uh oh, were we supposed to make this one important?!
             X("py2"),
-            X("cmake"),
+            XImportant("cmake"),
         ]),
-        (Ver("cuda", "9.1"), [X("py2")]),
-        (Ver("mkl"), [X("py2")]),
-        (Ver("gcc", "5"), [X("onnx_py2")]),
+        (Ver("cuda", "9.1"), [XImportant("py2")]),
+        (Ver("mkl"), [XImportant("py2")]),
+        (Ver("gcc", "5"), [XImportant("onnx_py2")]),
         (Ver("clang", "3.8"), [X("py2")]),
         (Ver("clang", "3.9"), [X("py2")]),
-        (Ver("clang", "7"), [X("py2")]),
-        (Ver("android"), [X("py2")]),
+        (Ver("clang", "7"), [XImportant("py2"), XImportant("onnx_py3.6")]),
+        (Ver("android"), [XImportant("py2")]),
     ]),
     (Ver("centos", "7"), [
         (Ver("cuda", "9.0"), [X("py2")]),
@@ -33,7 +32,7 @@ CONFIG_TREE_DATA = [
     # TODO ios and system aren't related. system qualifies where the python comes
     # from (use the system python instead of homebrew or anaconda)
     (Ver("ios"), [X("py2")]),
-    (Ver("system"), [X("py2")]),
+    (Ver("system"), [XImportant("py2")]),
     ]),
 ]
@@ -55,6 +54,8 @@ class TreeConfigNode(ConfigNode):
         return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]

     def is_build_only(self):
+        if str(self.find_prop("language_version")) == "onnx_py3.6":
+            return False
         return str(self.find_prop("compiler_version")) in [
             "gcc4.9",
             "clang3.8",
@@ -96,16 +97,13 @@ class LanguageConfigNode(TreeConfigNode):
         self.props["language_version"] = node_name
         self.props["build_only"] = self.is_build_only()

-    def get_children(self):
-        children = []
-        for phase in dimensions.PHASES:
-            if phase == "build" or not self.props["build_only"]:
-                children.append(PhaseConfigNode(self, phase, []))
-        return children
+    def child_constructor(self):
+        return ImportantConfigNode

-class PhaseConfigNode(TreeConfigNode):
+class ImportantConfigNode(TreeConfigNode):
     def init2(self, node_name):
-        self.props["phase_name"] = node_name
+        self.props["important"] = True

     def get_children(self):
         return []
```


```diff
@@ -2,30 +2,35 @@
 from collections import OrderedDict

 import cimodel.data.dimensions as dimensions
 import cimodel.lib.conf_tree as conf_tree
+from cimodel.lib.conf_tree import Ver
 import cimodel.lib.miniutils as miniutils
 import cimodel.lib.visualization as visualization

 from cimodel.data.caffe2_build_data import CONFIG_TREE_DATA, TopLevelNode
+from dataclasses import dataclass

 DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"

-DOCKER_IMAGE_VERSION = 266
+DOCKER_IMAGE_VERSION = 287

-class Conf(object):
-    def __init__(self, language, distro, compiler, phase, build_only):
-        self.language = language
-        self.distro = distro
-        self.compiler = compiler
-        self.phase = phase
-        self.build_only = build_only
+@dataclass
+class Conf:
+    language: str
+    distro: Ver
+    compiler: Ver
+    build_only: bool
+    is_important: bool

     # TODO: Eventually we can probably just remove the cudnn7 everywhere.
     def get_cudnn_insertion(self):
         omit = self.language == "onnx_py2" \
+            or self.language == "onnx_py3.6" \
             or self.compiler.name in ["android", "mkl", "clang"] \
             or str(self.distro) in ["ubuntu14.04", "macos10.13"]
@@ -44,9 +49,6 @@ class Conf(object):
         root_parts = self.get_build_name_root_parts()
         return "_".join(root_parts + [phase]).replace(".", "_")

-    def get_name(self):
-        return self.construct_phase_name(self.phase)
     def get_platform(self):
         platform = self.distro.name
         if self.distro.name != "macos":
@@ -57,6 +59,7 @@ class Conf(object):
         lang_substitutions = {
             "onnx_py2": "py2",
+            "onnx_py3.6": "py3.6",
             "cmake": "py2",
         }
@@ -64,12 +67,13 @@ class Conf(object):
         parts = [lang] + self.get_build_name_middle_parts()
         return miniutils.quote(DOCKER_IMAGE_PATH_BASE + "-".join(parts) + ":" + str(DOCKER_IMAGE_VERSION))

-    def gen_yaml_tree(self):
+    def gen_yaml_tree(self, phase):
         tuples = []

         lang_substitutions = {
             "onnx_py2": "onnx-py2",
+            "onnx_py3.6": "onnx-py3.6",
         }

         lang = miniutils.override(self.language, lang_substitutions)
@@ -77,7 +81,7 @@ class Conf(object):
         parts = [
             "caffe2",
             lang,
-        ] + self.get_build_name_middle_parts() + [self.phase]
+        ] + self.get_build_name_middle_parts() + [phase]
         build_env = "-".join(parts)

         if not self.distro.name == "macos":
@@ -88,7 +92,7 @@ class Conf(object):
         if self.compiler.name == "ios":
             tuples.append(("BUILD_IOS", miniutils.quote("1")))

-        if self.phase == "test":
+        if phase == "test":
             # TODO cuda should not be considered a compiler
             if self.compiler.name == "cuda":
                 tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))
@@ -103,11 +107,11 @@ class Conf(object):
         d = OrderedDict({"environment": OrderedDict(tuples)})

-        if self.phase == "test":
+        if phase == "test":
             resource_class = "large" if self.compiler.name != "cuda" else "gpu.medium"
             d["resource_class"] = resource_class

-        d["<<"] = "*" + "_".join(["caffe2", self.get_platform(), self.phase, "defaults"])
+        d["<<"] = "*" + "_".join(["caffe2", self.get_platform(), phase, "defaults"])
         return d
@@ -125,11 +129,11 @@ def instantiate_configs():
     for fc in found_configs:
         c = Conf(
-            fc.find_prop("language_version"),
-            fc.find_prop("distro_version"),
-            fc.find_prop("compiler_version"),
-            fc.find_prop("phase_name"),
-            fc.find_prop("build_only"),
+            language=fc.find_prop("language_version"),
+            distro=fc.find_prop("distro_version"),
+            compiler=fc.find_prop("compiler_version"),
+            build_only=fc.find_prop("build_only"),
+            is_important=fc.find_prop("important"),
         )
         config_list.append(c)
@@ -138,10 +142,13 @@ def instantiate_configs():
 def add_caffe2_builds(jobs_dict):
     configs = instantiate_configs()
     for conf_options in configs:
-        jobs_dict[conf_options.get_name()] = conf_options.gen_yaml_tree()
+        phases = ["build"]
+        if not conf_options.build_only:
+            phases = dimensions.PHASES
+        for phase in phases:
+            jobs_dict[conf_options.construct_phase_name(phase)] = conf_options.gen_yaml_tree(phase)

     graph = visualization.generate_graph(get_root())
     graph.draw("caffe2-config-dimensions.png", prog="twopi")
@@ -157,10 +164,24 @@ def get_caffe2_workflows():
     x = []
     for conf_options in filtered_configs:
-        item = conf_options.get_name()
-        if conf_options.phase == "test":
-            item = {conf_options.get_name(): {"requires": [conf_options.construct_phase_name("build")]}}
-        x.append(item)
+        phases = ["build"]
+        if not conf_options.build_only:
+            phases = dimensions.PHASES
+        for phase in phases:
+            requires = ["setup"]
+            sub_d = {"requires": requires}
+            if phase == "test":
+                requires.append(conf_options.construct_phase_name("build"))
+            if not conf_options.is_important:
+                # If you update this, update
+                # pytorch_build_definitions.py too
+                sub_d["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
+            x.append({conf_options.construct_phase_name(phase): sub_d})

     return x
```


@ -5,8 +5,7 @@ PHASES = ["build", "test"]
CUDA_VERSIONS = [
None, # cpu build
"80",
"90",
"92",
"100",
]
@ -16,3 +15,9 @@ STANDARD_PYTHON_VERSIONS = [
"3.6",
"3.7",
]
CONDA_PYTHON_VERSIONS = [
"2.7",
"3.6",
"3.7",
]


@ -1,28 +1,38 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
CONFIG_TREE_DATA = [
("trusty", [
(None, [
X("2.7.9"),
XImportant("2.7.9"),
X("2.7"),
X("3.5"),
X("nightly"),
]),
("gcc", [
("4.8", [X("3.6")]),
("5.4", [("3.6", [X(False), X(True)])]),
("5.4", [
XImportant("3.6"),
("3.6", [
("xla", [XImportant(True)]),
("namedtensor", [XImportant(True)]),
]),
]),
("7", [X("3.6")]),
]),
]),
("xenial", [
("clang", [
("5", [X("3.6")]),
("5", [
XImportant("3.6"), # This is actually the ASAN build
("3.6", [
("namedtensor", [XImportant(True)]), # ASAN
]),
]),
]),
("cuda", [
("8", [X("3.6")]),
("9", [
# Note there are magic strings here
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L21
@ -32,13 +42,16 @@ CONFIG_TREE_DATA = [
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
X("2.7"),
X("3.6"),
XImportant("3.6"),
("2.7", [
("namedtensor", [XImportant(True)]),
]),
]),
("9.2", [X("3.6")]),
("10", [X("3.6")]),
]),
("android", [
("r19c", [X("3.6")]),
("r19c", [XImportant("3.6")]),
]),
]),
]
@ -117,7 +130,22 @@ class PyVerConfigNode(TreeConfigNode):
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return XlaConfigNode
return ExperimentalFeatureConfigNode
class ExperimentalFeatureConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["experimental_feature"] = node_name
def child_constructor(self):
experimental_feature = self.find_prop("experimental_feature")
next_nodes = {
"xla": XlaConfigNode,
"namedtensor": NamedTensorConfigNode,
"important": ImportantConfigNode,
}
return next_nodes[experimental_feature]
class XlaConfigNode(TreeConfigNode):
@ -127,6 +155,31 @@ class XlaConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_xla"] = node_name
def child_constructor(self):
return ImportantConfigNode
class NamedTensorConfigNode(TreeConfigNode):
def modify_label(self, label):
return "NAMEDTENSOR=" + str(label)
def init2(self, node_name):
self.props["is_namedtensor"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ImportantConfigNode(TreeConfigNode):
def modify_label(self, label):
return "IMPORTANT=" + str(label)
def init2(self, node_name):
self.props["is_important"] = node_name
def get_children(self):
return []
class XenialCompilerConfigNode(TreeConfigNode):
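The node chain added above (PyVer → ExperimentalFeature → Xla/NamedTensor → Important) is driven by each node's `child_constructor`. A stripped-down sketch of that dispatch pattern — class names are stand-ins for the real `TreeConfigNode` subclasses, which live in the actual config-tree module:

```python
# Minimal sketch of the child_constructor dispatch used by
# ExperimentalFeatureConfigNode: the node's own property selects
# the class used to construct its children.
class Node:
    def __init__(self, name, props=None):
        self.name = name
        self.props = props or {}

def child_constructor(node):
    # Maps the "experimental_feature" property to the next node type.
    next_nodes = {
        "xla": "XlaConfigNode",
        "namedtensor": "NamedTensorConfigNode",
        "important": "ImportantConfigNode",
    }
    return next_nodes[node.props["experimental_feature"]]

assert child_constructor(Node("n", {"experimental_feature": "xla"})) == "XlaConfigNode"
```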


@ -8,45 +8,45 @@ import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
from dataclasses import dataclass, field
from typing import List, Optional
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
DOCKER_IMAGE_VERSION = 300
DOCKER_IMAGE_VERSION = 323
class Conf(object):
def __init__(self,
distro,
parms,
pyver=None,
cuda_version=None,
is_xla=False,
restrict_phases=None,
gpu_resource=None,
dependent_tests=None,
parent_build=None):
self.distro = distro
self.pyver = pyver
self.parms = parms
self.cuda_version = cuda_version
# TODO expand this to cover all the USE_* that we want to test for
# tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
self.is_xla = is_xla
self.restrict_phases = restrict_phases
self.gpu_resource = gpu_resource
self.dependent_tests = dependent_tests or []
self.parent_build = parent_build
@dataclass
class Conf:
distro: str
parms: List[str]
pyver: Optional[str] = None
cuda_version: Optional[str] = None
# TODO expand this to cover all the USE_* that we want to test for
# tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
is_xla: bool = False
restrict_phases: Optional[List[str]] = None
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional['Conf'] = None
is_namedtensor: bool = False
is_important: bool = False
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
def get_parms(self, for_docker):
leading = ["pytorch"]
leading = []
# We just don't run non-important jobs on pull requests;
# previously we also named them in a way to make it obvious
# if self.is_important and not for_docker:
# leading.append("AAA")
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
if self.is_namedtensor and not for_docker:
leading.append("namedtensor")
cuda_parms = []
if self.cuda_version:
@ -105,8 +105,10 @@ class Conf(object):
def gen_workflow_yaml_item(self, phase):
# All jobs require the setup job
parameters = OrderedDict({"requires": ["setup"]})
if phase == "test":
val = OrderedDict()
# TODO When merging the caffe2 and pytorch jobs, it might be convenient for a while to make a
# caffe2 test job dependent on a pytorch build job. This way we could quickly dedup the repeated
@ -114,11 +116,14 @@ class Conf(object):
# pytorch build job (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259452641)
dependency_build = self.parent_build or self
val["requires"] = [dependency_build.gen_build_name("build")]
parameters["requires"].append(dependency_build.gen_build_name("build"))
return {self.gen_build_name(phase): val}
else:
return self.gen_build_name(phase)
if not self.is_important:
# If you update this, update
# caffe2_build_definitions.py too
parameters["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
return {self.gen_build_name(phase): parameters}
# TODO This is a hack to special case some configs just for the workflow list
@ -157,11 +162,12 @@ def gen_dependent_configs(xenial_parent_config):
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=xenial_parent_config.is_important,
)
configs.append(c)
for x in ["pytorch_short_perf_test_gpu", "pytorch_doc_push"]:
for x in ["pytorch_short_perf_test_gpu", "pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
return configs
@ -212,6 +218,7 @@ def instantiate_configs():
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
parms_list.append(gcc_version)
# TODO: This is a nasty special case
if compiler_name == "clang":
parms_list.append("asan")
@ -220,6 +227,8 @@ def instantiate_configs():
parms_list.append("gcc7")
is_xla = fc.find_prop("is_xla") or False
is_namedtensor = fc.find_prop("is_namedtensor") or False
is_important = fc.find_prop("is_important") or False
gpu_resource = None
if cuda_version and cuda_version != "10":
@ -233,9 +242,11 @@ def instantiate_configs():
is_xla,
restrict_phases,
gpu_resource,
is_namedtensor=is_namedtensor,
is_important=is_important,
)
if cuda_version == "8":
if cuda_version == "9" and python_version == "3.6":
c.dependent_tests = gen_dependent_configs(c)
config_list.append(c)
@ -279,7 +290,7 @@ def get_workflow_list():
config_list = instantiate_configs()
x = []
x = ["setup"]
for conf_options in config_list:
phases = conf_options.restrict_phases or dimensions.PHASES
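The diff above converts the hand-written `Conf.__init__` into a `@dataclass`; note `field(default_factory=list)` for the mutable `dependent_tests` default, which a plain `= []` default would share across instances. A minimal illustration of that pattern (a trimmed-down stand-in for the real class):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Conf:
    distro: str
    parms: List[str]
    pyver: Optional[str] = None
    # default_factory gives each instance its own fresh list
    dependent_tests: List = field(default_factory=list)

a = Conf("xenial", ["py3"])
b = Conf("xenial", ["py3"])
a.dependent_tests.append("pytorch_doc_push")
assert b.dependent_tests == []  # mutation of one instance doesn't leak to another
```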


@ -1,6 +1,10 @@
#!/usr/bin/env python3
from dataclasses import dataclass, field
from typing import Optional, Dict
def X(val):
"""
Compact way to write a leaf node
@ -8,23 +12,28 @@ def X(val):
return val, []
class Ver(object):
def XImportant(name):
"""Compact way to write an important (run on PRs) leaf node"""
return (name, [("important", [X(True)])])
@dataclass
class Ver:
"""
Represents a product with a version number
"""
def __init__(self, name, version=""):
self.name = name
self.version = version
name: str
version: str = ""
def __str__(self):
return self.name + self.version
class ConfigNode(object):
def __init__(self, parent, node_name):
self.parent = parent
self.node_name = node_name
self.props = {}
@dataclass
class ConfigNode:
parent: Optional['ConfigNode']
node_name: str
props: Dict[str, str] = field(default_factory=dict)
def get_label(self):
return self.node_name
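The `X`/`XImportant` helpers above encode config-tree leaves as `(name, children)` tuples, with `XImportant` nesting an `important` child so the job also runs on PRs. A self-contained sketch of what they produce:

```python
def X(val):
    """Compact way to write a leaf node: (value, no children)."""
    return val, []

def XImportant(name):
    """Leaf node marked important (run on PRs) via a nested child."""
    return (name, [("important", [X(True)])])

assert X("3.6") == ("3.6", [])
assert XImportant("3.6") == ("3.6", [("important", [(True, [])])])
```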

File diff suppressed because it is too large


@ -82,6 +82,7 @@ YAML_SOURCES = [
File("nightly-build-smoke-tests-defaults.yml"),
Header("Job specifications job specs"),
Treegen(pytorch_build_definitions.add_build_env_defs, 0),
File("job-specs-setup.yml"),
File("job-specs-custom.yml"),
Treegen(caffe2_build_definitions.add_caffe2_builds, 1),
File("binary_update_htmls.yml"),


@ -0,0 +1,4 @@
All the scripts in this directory are meant to be called as `~/workspace/.circleci/scripts/foo.sh`.
Don't try to call them as `.circleci/scripts/foo.sh`; that won't
(necessarily) work. See Note [Workspace for CircleCI scripts] in
job-specs-setup.yml for more details.


@ -1,6 +1,5 @@
#!/bin/bash
set -ex
set -eux -o pipefail
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
@ -20,20 +19,19 @@ export BUILDER_ROOT="$workdir/builder"
# Clone the Pytorch branch
git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"
if [[ -n "$CIRCLE_PR_NUMBER" ]]; then
if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then
# "smoke" binary build on PRs
git fetch --force origin "pull/${CIRCLE_PR_NUMBER}/head:remotes/origin/pull/${CIRCLE_PR_NUMBER}"
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B "$CIRCLE_BRANCH"
git reset --hard "$CIRCLE_SHA1"
elif [[ -n "$CIRCLE_SHA1" ]]; then
# "smoke" binary build on master on PR merges
elif [[ -n "${CIRCLE_SHA1:-}" ]]; then
# Scheduled workflows & "smoke" binary build on master on PR merges
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B master
else
# nightly binary builds. These run at 05:05 UTC every day.
last_commit="$(git rev-list --before "$(date -u +%Y-%m-%d) 05:00" --max-count 1 HEAD)"
git checkout "$last_commit"
echo "Can't tell what to checkout"
exit 1
fi
git submodule update --init --recursive --quiet
echo "Using Pytorch from "


@ -1,15 +1,32 @@
#!/bin/bash
set -ex
set -eux -o pipefail
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
source "/Users/distiller/project/env"
envfile="/Users/distiller/project/env"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
source "/home/circleci/project/env"
envfile="/home/circleci/project/env"
else
# docker executor (binary builds)
source "/env"
envfile="/env"
fi
# TODO this is super hacky and ugly. Basically, the binary_update_html job does
# not have an env file, since it does not call binary_populate_env.sh, since it
# does not have a BUILD_ENVIRONMENT. So for this one case, which we detect by a
# lack of an env file, we manually export the environment variables that we
# need to install miniconda
if [[ ! -f "$envfile" ]]; then
MINICONDA_ROOT="/home/circleci/project/miniconda"
workdir="/home/circleci/project"
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
export -f retry
else
source "$envfile"
fi
conda_sh="$workdir/install_miniconda.sh"
@ -21,8 +38,7 @@ fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"
export PATH="$MINICONDA_ROOT/bin:$PATH"
source "$MINICONDA_ROOT/bin/activate"
# We can't actually add miniconda to the PATH in the envfile, because that
# breaks 'unbuffer' in Mac jobs. This is probably because conda comes with
# a tclsh, which then gets inserted before the tclsh needed in /usr/bin
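The `retry` helper defined above sleeps 1, 2, 4, then 8 seconds between up to five attempts. The same exponential-backoff logic as a Python sketch (function and variable names here are illustrative, not part of the CI scripts):

```python
import time

def retry(fn, attempts=5, base_delay=1):
    """Call fn(); on failure sleep 1, 2, 4, 8s between retries, like the bash helper."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: propagate the last failure
            time.sleep(base_delay * 2 ** i)

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"

assert retry(flaky, base_delay=0) == "ok"  # succeeds on the third attempt
assert len(calls) == 3
```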


@ -1,7 +1,7 @@
#!/bin/bash
echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
set -ex
set -eux -o pipefail
source /env
# Defaults here so they can be changed in one place
@ -19,7 +19,7 @@ fi
# We want to call unbuffer, which calls tclsh which finds the expect
# package. The expect was installed by yum into /usr/bin so we want to
# find /usr/bin/tclsh, but this is shadowed by /opt/conda/bin/tclsh in
# the conda docker images.
# the conda docker images, so we prepend it to the path here.
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
mkdir /just_tclsh_bin
ln -s /usr/bin/tclsh /just_tclsh_bin/tclsh


@ -3,7 +3,7 @@
source /home/circleci/project/env
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
set -eux -o pipefail
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
@ -18,18 +18,33 @@ fi
# Install the package
# These network calls should not have 'retry's because they are installing
# locally
# locally and aren't actually network calls
# TODO there is duplicated and inconsistent test-python-env setup across this
# file, builder/smoke_test.sh, and builder/run_tests.sh, and also in the
# conda build scripts themselves. These should really be consolidated
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "\$pkg" --offline
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu100
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
retry conda install -yq -c pytorch "cudatoolkit=\${cu_ver}"
fi
else
pip install "\$pkg"
retry pip install -q future numpy protobuf six
fi
# Test the package
pushd /pytorch
/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"
/builder/check_binary.sh
# =================== The above code will be executed inside Docker container ===================
EOL
echo "Prepared script to run in next step"
echo
echo
echo "The script that will run in the next step is:"
cat /home/circleci/project/ci_test_script.sh
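The `DESIRED_CUDA` parsing added in the hunk above turns strings like `cu90` or `cu100` into `9.0` / `10.0` for the `cudatoolkit` pin, keying off the string length. The same substring logic as a Python sketch (for illustration only; the CI uses the bash version):

```python
def cuda_toolkit_version(desired_cuda):
    """Map 'cu90' -> '9.0' and 'cu100' -> '10.0', mirroring the bash substring logic."""
    digits = desired_cuda[2:]  # strip the 'cu' prefix
    if len(desired_cuda) == 4:
        # e.g. cu90: one major digit, rest is minor
        return f"{digits[0]}.{digits[1:]}"
    # e.g. cu100: two major digits, rest is minor
    return f"{digits[:2]}.{digits[2:]}"

assert cuda_toolkit_version("cu90") == "9.0"
assert cuda_toolkit_version("cu100") == "10.0"
```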


@ -1,22 +1,24 @@
#!/bin/bash
# Do NOT set -e
# Do NOT set -x
source /home/circleci/project/env
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/home/circleci/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_SOUMITH_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_SOUMITH_CONDA_PASSWORD"
--username "$PYTORCH_BINARY_PJH5_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_PJH5_CONDA_PASSWORD"
set -x
EOL
chmod +x /home/circleci/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -e ON BEFORE THIS LINE
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -ex
set -eux -o pipefail
export PATH="$MINICONDA_ROOT/bin:$PATH"
# Upload the package to the final location
@ -24,7 +26,7 @@ pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry timeout 30 /home/circleci/project/login_to_anaconda.sh
anaconda upload "$(ls)" -u pytorch-testing --label main --no-progress --force
anaconda upload "$(ls)" -u pytorch --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"


@ -1,5 +1,5 @@
#!/bin/bash
set -ex
set -eux -o pipefail
source "/Users/distiller/project/env"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"


@ -1,5 +1,5 @@
#!/bin/bash
set -ex
set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"


@ -1,21 +1,23 @@
#!/bin/bash
# Do NOT set -e
# Do NOT set -x
set -eu -o pipefail
set +x
export AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/Users/distiller/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_SOUMITH_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_SOUMITH_CONDA_PASSWORD"
--username "$PYTORCH_BINARY_PJH5_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_PJH5_CONDA_PASSWORD"
set -x
EOL
chmod +x /Users/distiller/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -e ON BEFORE THIS LINE
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -ex
set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
@ -24,7 +26,7 @@ pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry /Users/distiller/project/login_to_anaconda.sh
retry anaconda upload "$(ls)" -u pytorch-testing --label main --no-progress --force
retry anaconda upload "$(ls)" -u pytorch-nightly --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"


@ -1,6 +1,5 @@
#!/bin/bash
set -ex
set -eux -o pipefail
export TZ=UTC
# We need to write an envfile to persist these variables to following
@ -24,7 +23,7 @@ configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
export DESIRED_DEVTOOLSET="${configs[3]}"
export DESIRED_DEVTOOLSET="${configs[3]:-}"
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export BUILD_PYTHONLESS=1
fi
@ -33,26 +32,31 @@ fi
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda80"
export DOCKER_IMAGE="soumith/manylinux-cuda100"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
fi
# Upload to parallel folder for gcc abis
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
export PIP_UPLOAD_FOLDER='devtoolset7/'
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
echo "We don't handle conda builds with gcc ABI of 1, since we don't"
echo "want to add a new package name to the conda builds"
exit 1
fi
# All nightlies used to be devtoolset3, then devtoolset7 was added as a build
# option, so the upload was redirected to nightly/devtoolset7 to avoid
# conflicts with other binaries (there shouldn't be any conflicts). Now we are
# making devtoolset7 the default.
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' || "$(uname)" == 'Darwin' ]]; then
export PIP_UPLOAD_FOLDER='nightly/'
else
export PIP_UPLOAD_FOLDER=''
# On linux machines, this shouldn't actually be called anymore. This is just
# here for extra safety.
export PIP_UPLOAD_FOLDER='nightly/devtoolset3/'
fi
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
export PYTORCH_BUILD_VERSION="1.1.0"
if [[ "$(uname)" == 'Darwin' ]]; then
export PYTORCH_BUILD_VERSION="1.2.0.dev$DATE"
else
export PYTORCH_BUILD_VERSION="1.2.0.dev$DATE+$DESIRED_CUDA"
fi
export PYTORCH_BUILD_NUMBER=1
cat >>"$envfile" <<EOL
@ -63,20 +67,20 @@ echo "Running on $(uname -a) at $(date)"
export PACKAGE_TYPE="$PACKAGE_TYPE"
export DESIRED_PYTHON="$DESIRED_PYTHON"
export DESIRED_CUDA="$DESIRED_CUDA"
export LIBTORCH_VARIANT="$LIBTORCH_VARIANT"
export BUILD_PYTHONLESS="$BUILD_PYTHONLESS"
export LIBTORCH_VARIANT="${LIBTORCH_VARIANT:-}"
export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.1.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.2.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-1.1.0'
export TORCH_PACKAGE_NAME='torch-nightly'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export NO_FBGEMM=1
export USE_FBGEMM=1
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
@ -87,9 +91,9 @@ export BUILDER_ROOT="$workdir/builder"
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export CIRCLE_TAG="$CIRCLE_TAG"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="$CIRCLE_PR_NUMBER"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
# =================== The above code will be executed inside Docker container ===================
EOL


@ -9,13 +9,15 @@
source /home/circleci/project/env
echo "Running the following code in Docker"
cat /home/circleci/project/ci_test_script.sh
set -ex
echo
echo
set -eux -o pipefail
# Expect actual code to be written to this file
chmod +x /home/circleci/project/ci_test_script.sh
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --runtime=nvidia -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run -t -d "${DOCKER_IMAGE}")
@ -38,7 +40,7 @@ if [[ -d "$PYTORCH_ROOT" ]]; then
docker cp "$PYTORCH_ROOT" "$id:/pytorch"
fi
if [[ -d "$BUILDER_ROOT" ]]; then
docker cp "$BUILDER ROOT" "$id:/builder"
docker cp "$BUILDER_ROOT" "$id:/builder"
fi
# Execute the test script that was populated by an earlier section


@ -0,0 +1,127 @@
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$3" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
# ======================== Building PyTorch C++ API Docs ========================
echo "Building PyTorch C++ API docs..."
# Clone the cppdocs repo
rm -rf cppdocs
git clone https://github.com/pytorch/cppdocs
set -ex
sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time GEN_TO_SOURCE=1 python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
# Copy some required files
cp aten/src/ATen/common_with_cwrap.py tools/shared/cwrap_common.py
cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--nn-path aten/src/
# Build the docs
pushd docs/cpp
pip install breathe==4.11.1 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install exhale>=0.2.1
pip install sphinx==1.8.5
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j
popd
popd
pushd cppdocs
# Purge everything with some exceptions
mkdir /tmp/cppdocs-sync
mv _config.yml README.md /tmp/cppdocs-sync/
rm -rf *
# Copy over all the newly generated HTML
cp -r "${pt_checkout}"/docs/cpp/build/html/* .
# Copy back _config.yml
rm -rf _config.yml
mv /tmp/cppdocs-sync/* .
# Make a new commit
git add . || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Automatic sync on $(date)" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to https://github.com/pytorch/cppdocs"
set +x
/usr/bin/expect <<DONE
spawn git push -u origin master
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@ -0,0 +1,118 @@
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: The branch to push to. Usually is "site"
branch="$3"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
fi
# Argument 4: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$4" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
git clone https://github.com/pytorch/pytorch.github.io -b $branch
pushd pytorch.github.io
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
if [ "$is_master_doc" = true ]; then
make html
else
make html-stable
fi
# Move them into the docs repo
popd
popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "$is_master_doc" = true ]; then
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
fi
git add "$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:$branch"
set +x
/usr/bin/expect <<DONE
spawn git push origin $branch
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@ -0,0 +1,87 @@
#!/usr/bin/env bash
set -ex -o pipefail
# Set up NVIDIA docker repo
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
# Remove unnecessary sources
sudo rm -f /etc/apt/sources.list.d/google-chrome.list
sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
# version number below for docker-ce and nvidia-docker2 to get newer
# versions of Docker. We hardcode these numbers because we kept
# getting broken CI when Docker would update their docker version,
# and nvidia-docker2 would be out of date for a day until they
# released a newer version of their package.
#
# How to figure out what the correct versions of these packages are?
# My preferred method is to start a Docker instance of the correct
# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
# apt which versions of the packages you need are available. Note that the CircleCI image
# comes with Docker.
sudo apt-get -y install \
linux-headers-$(uname -r) \
linux-image-generic \
moreutils \
docker-ce=5:18.09.4~3-0~ubuntu-xenial \
nvidia-container-runtime=2.0.0+docker18.09.4-1 \
nvidia-docker2=2.0.3+docker18.09.4-1 \
expect-dev
sudo pkill -SIGHUP dockerd
retry () {
$* || $* || $* || $* || $*
}
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-410.104.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
fi
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH:-}" >> /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION:-}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
echo "declare -x MAX_JOBS=${MAX_JOBS}" >> /home/circleci/project/env
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
set -x
else
# This IAM user allows write access to S3 bucket for sccache
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
set -x
fi
fi
# This IAM user only allows read-write access to ECR
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4:-}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
set -x
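The MAX_JOBS arithmetic above takes the smaller of `nproc - 1` and a memory-based cap of 8. A minimal Python sketch of the same clamp (function name is ours, not from the CI scripts):

```python
import os

def compute_max_jobs(memory_limit_max_jobs=8):
    # Mirror the shell arithmetic: SCCACHE_MAX_JOBS = nproc - 1, then
    # clamp to the memory-based cap so a 32-core "large" box doesn't OOM.
    sccache_max_jobs = max((os.cpu_count() or 2) - 1, 1)
    return min(sccache_max_jobs, memory_limit_max_jobs)
```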


@ -0,0 +1,50 @@
#!/usr/bin/env bash
set -eux -o pipefail
# Set up CircleCI GPG keys for apt, if needed
curl -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
# Stop background apt updates. Hypothetically, the kill should not
# be necessary, because stop is supposed to send a kill signal to
# the process, but we've added it for good luck. Also
# hypothetically, it's supposed to be unnecessary to wait for
# the process to block. We also have that line for good luck.
# If you like, try deleting them and seeing if it works.
sudo systemctl stop apt-daily.service || true
sudo systemctl kill --kill-who=all apt-daily.service || true
sudo systemctl stop unattended-upgrades.service || true
sudo systemctl kill --kill-who=all unattended-upgrades.service || true
# wait until `apt-get update` has been killed
while systemctl is-active --quiet apt-daily.service
do
sleep 1;
done
while systemctl is-active --quiet unattended-upgrades.service
do
sleep 1;
done
# See if we actually were successful
systemctl list-units --all | cat
# For good luck, try even harder to kill apt-get
sudo pkill apt-get || true
# For even better luck, purge unattended-upgrades
sudo apt-get purge -y unattended-upgrades
cat /etc/apt/sources.list
# For the bestest luck, kill again now
sudo pkill apt || true
sudo pkill dpkg || true
# Try to detect if apt/dpkg is stuck
if ps auxfww | grep '[a]pt'; then
echo "WARNING: There are leftover apt processes; subsequent apt update will likely fail"
fi
if ps auxfww | grep '[d]pkg'; then
echo "WARNING: There are leftover dpkg processes; subsequent apt update will likely fail"
fi


@ -0,0 +1,96 @@
import argparse
import re
import sys
# Modify this variable if you want to change the set of default jobs
# which are run on all pull requests.
#
# WARNING: Actually, this is a lie; we're currently also controlling
# the set of jobs to run via the Workflows filters in CircleCI config.
default_set = [
# PyTorch CPU
# Selected oldest Python 2 version to ensure Python 2 coverage
'pytorch-linux-trusty-py2.7.9',
# PyTorch CUDA
'pytorch-linux-xenial-cuda9-cudnn7-py3',
# PyTorch ASAN
'pytorch-linux-xenial-py3-clang5-asan',
# PyTorch DEBUG
'pytorch-linux-trusty-py3.6-gcc5.4',
# Caffe2 CPU
'caffe2-py2-mkl-ubuntu16.04',
# Caffe2 CUDA
'caffe2-py2-cuda9.1-cudnn7-ubuntu16.04',
# Caffe2 ONNX
'caffe2-onnx-py2-gcc5-ubuntu16.04',
'caffe2-onnx-py3.6-clang7-ubuntu16.04',
# Caffe2 Clang
'caffe2-py2-clang7-ubuntu16.04',
# Caffe2 CMake
'caffe2-cmake-cuda9.0-cudnn7-ubuntu16.04',
# Binaries
'manywheel 2.7mu cpu devtoolset7',
'libtorch 2.7m cpu devtoolset7',
# Caffe2 Android
'caffe2-py2-android-ubuntu16.04',
# Caffe2 OSX
'caffe2-py2-system-macos10.13',
# PyTorch OSX
'pytorch-macos-10.13-cuda9.2-cudnn7-py3',
# PyTorch Android
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c',
# XLA
'pytorch-xla-linux-trusty-py3.6-gcc5.4',
# Other checks
'pytorch-short-perf-test-gpu',
'pytorch-python-doc-push',
'pytorch-cpp-doc-push',
]
# Takes in commit message to analyze via stdin
#
# This script will query Git and attempt to determine if we should
# run the current CI job under question
#
# NB: Try to avoid hard-coding names here, so there are fewer places to update
# when jobs are updated/renamed
#
# Semantics in the presence of multiple tags:
# - Let D be the set of default builds
# - Let S be the set of explicitly specified builds
# - Run S \/ D
parser = argparse.ArgumentParser()
parser.add_argument('build_environment')
args = parser.parse_args()
commit_msg = sys.stdin.read()
# Matches anything that looks like [foo ci] or [ci foo] or [foo test]
# or [test foo]
RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')
markers = RE_MARKER.finditer(commit_msg)
for m in markers:
if m.group(1) and m.group(2):
print("Unrecognized marker: {}".format(m.group(0)))
continue
spec = m.group(1) or m.group(2)
if spec in args.build_environment or spec == 'all':
print("Accepting {} due to commit marker {}".format(args.build_environment, m.group(0)))
sys.exit(0)
for spec in default_set:
if spec in args.build_environment:
print("Accepting {} as part of default set".format(args.build_environment))
sys.exit(0)
print("Rejecting {}".format(args.build_environment))
sys.exit(1)
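The marker grammar above (`[foo ci]`, `[ci foo]`, `[foo test]`, `[test foo]`, with double-sided forms like `[foo ci bar]` rejected) can be exercised with a small sketch reusing the same regex; the helper function and sample messages are ours:

```python
import re

# Same pattern as should_run_job.py above.
RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')

def requested_specs(commit_msg):
    """Collect the build specs a commit message explicitly requests."""
    specs = []
    for m in RE_MARKER.finditer(commit_msg):
        if m.group(1) and m.group(2):
            continue  # e.g. [foo ci bar] is ambiguous and ignored
        spec = m.group(1) or m.group(2)
        if spec:  # bare [ci] carries no spec
            specs.append(spec)
    return specs
```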


@ -0,0 +1,29 @@
#!/usr/bin/env bash
set -exu -o pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT:-}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${BUILD_ENVIRONMENT:-}" ]; then
echo "Cannot run should_run_job.sh if BUILD_ENVIRONMENT is not defined!"
echo "CircleCI scripts are probably misconfigured."
exit 1
fi
if ! [ -e "$SCRIPT_DIR/COMMIT_MSG" ]; then
echo "Cannot run should_run_job.sh if you don't have COMMIT_MSG"
echo "written out. Are you perhaps running the wrong copy of this script?"
echo "You should be running the copy in ~/workspace; SCRIPT_DIR=$SCRIPT_DIR"
exit 1
fi
if [ -n "${CIRCLE_PULL_REQUEST:-}" ]; then
if [[ $CIRCLE_BRANCH != "ci-all/"* ]]; then
# Don't swallow "script doesn't exist" errors
[ -e "$SCRIPT_DIR/should_run_job.py" ]
if ! python "$SCRIPT_DIR/should_run_job.py" "${BUILD_ENVIRONMENT:-}" < "$SCRIPT_DIR/COMMIT_MSG" ; then
circleci step halt
exit
fi
fi
fi


@ -1,3 +1,4 @@
# There is currently no testing for libtorch TODO
# binary_linux_libtorch_2.7m_cpu_test:
# environment:
@ -5,12 +6,6 @@
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu80_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu80"
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu90_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu90"


@ -1,8 +1,17 @@
# update_s3_htmls job
# These jobs create html files for every cpu/cu## folder in s3. The html
# files just list the names of all the files (.whl binaries) in that folder.
# This allows pip installs of the latest version in a folder without having
# to know the latest date. Pip has a flag -f that accepts an html file
# listing a bunch of packages, and pip will then install the one with the
# most recent version.
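The index files described above are flat HTML link lists that pip's `-f`/`--find-links` flag can consume. A minimal sketch of rendering one (the helper and the sample wheel name are ours, not the actual builder script):

```python
import html

def make_index(wheel_names):
    # Render a minimal pip-compatible index: one <a> per wheel file.
    links = "\n".join(
        '<a href="{0}">{0}</a><br>'.format(html.escape(name))
        for name in wheel_names
    )
    return "<html><body>\n{}\n</body></html>".format(links)
```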
update_s3_htmls: &update_s3_htmls
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
@ -24,10 +33,11 @@
name: Update s3 htmls
no_output_timeout: "1h"
command: |
set +x
echo "declare -x \"AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}\"" >> /home/circleci/project/env
echo "declare -x \"AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}\"" >> /home/circleci/project/env
source /home/circleci/project/env
set -ex
set -eux -o pipefail
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
@ -45,3 +55,44 @@
environment:
PIP_UPLOAD_FOLDER: "nightly/devtoolset7/"
<<: *update_s3_htmls
# upload_binary_sizes job
# The builder hud at pytorch.org/builder shows the sizes of all the binaries
# over time. It gets this info from html files stored in S3, which this job
# populates every day.
upload_binary_sizes: &upload_binary_sizes
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_install_miniconda
- run:
name: Upload binary sizes
no_output_timeout: "1h"
command: |
set +x
echo "declare -x \"AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}\"" > /home/circleci/project/env
echo "declare -x \"AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}\"" >> /home/circleci/project/env
export DATE="$(date -u +%Y_%m_%d)"
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
source /home/circleci/project/env
set -eux -o pipefail
# This is hardcoded to match binary_install_miniconda.sh
export PATH="/home/circleci/project/miniconda/bin:$PATH"
# Not any awscli will work. Most won't. This one will work
retry conda create -qyn aws36 python=3.6
source activate aws36
pip install awscli==1.16.46
"/home/circleci/project/builder/cron/upload_binary_sizes.sh"
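The shell `retry` helper defined above sleeps 1, 2, 4, then 8 seconds between attempts before giving up. A Python sketch of the same backoff (the function name and flaky-call example are ours):

```python
import time

def retry(fn, delays=(1, 2, 4, 8)):
    # Call fn; on failure, sleep for the next delay and try again,
    # mirroring the shell helper: $* || (sleep 1 && $*) || ...
    for delay in delays:
        try:
            return fn()
        except Exception:
            time.sleep(delay)
    return fn()  # final attempt; lets the error propagate
```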


@ -7,13 +7,16 @@
# and then update DOCKER_IMAGE_VERSION at the top of the following files:
# * cimodel/data/pytorch_build_definitions.py
# * cimodel/data/caffe2_build_definitions.py
# And the inline copies of the variable in
# * verbatim-sources/job-specs-custom.yml
# (grep for DOCKER_IMAGE)
docker_config_defaults: &docker_config_defaults
user: jenkins
aws_auth:
# This IAM user only allows read-write access to ECR
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V3}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V3}
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4}
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
@ -21,262 +24,33 @@ docker_config_defaults: &docker_config_defaults
setup_linux_system_environment: &setup_linux_system_environment
name: Set Up System Environment
no_output_timeout: "1h"
command: |
set -ex
command: ~/workspace/.circleci/scripts/setup_linux_system_environment.sh
# Set up CircleCI GPG keys for apt, if needed
curl -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
# Stop background apt updates. Hypothetically, the kill should not
# be necessary, because stop is supposed to send a kill signal to
# the process, but we've added it for good luck. Also
# hypothetically, it's supposed to be unnecessary to wait for
# the process to block. We also have that line for good luck.
# If you like, try deleting them and seeing if it works.
sudo systemctl stop apt-daily.service || true
sudo systemctl kill --kill-who=all apt-daily.service || true
sudo systemctl stop unattended-upgrades.service || true
sudo systemctl kill --kill-who=all unattended-upgrades.service || true
# wait until `apt-get update` has been killed
while systemctl is-active --quiet apt-daily.service
do
sleep 1;
done
while systemctl is-active --quiet unattended-upgrades.service
do
sleep 1;
done
# See if we actually were successful
systemctl list-units --all | cat
sudo apt-get purge -y unattended-upgrades
cat /etc/apt/sources.list
ps auxfww | grep [a]pt
ps auxfww | grep dpkg
install_doc_push_script: &install_doc_push_script
name: Install the doc push script
# NB: This (and the command below) must be run after attaching
# ~/workspace. This is NOT the default working directory (that's
# ~/project); this workspace is generated by the setup job.
should_run_job: &should_run_job
name: Should Run Job After attach_workspace
no_output_timeout: "2m"
command: |
cat >/home/circleci/project/doc_push_script.sh <<EOL
# =================== The following code **should** be executed inside Docker container ===================
command: ~/workspace/.circleci/scripts/should_run_job.sh
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "doc_push_script.sh: Invoked with \$*"
git clone https://yf225:${GITHUB_PYTORCHBOT_TOKEN}@github.com/pytorch/pytorch.github.io -b site
pushd pytorch.github.io
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="\$1"
if [ -z "\$install_path" ]; then
echo "error: doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="\$2"
if [ -z "\$version" ]; then
echo "error: doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "\$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "\$3" != "" ]; then
dry_run=true
fi
echo "install_path: \$install_path version: \$version dry_run: \$dry_run"
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "\$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
if [ "\$is_master_doc" = true ]; then
make html
else
make html-stable
fi
# Move them into the docs repo
popd
popd
git rm -rf "\$install_path" || true
mv "\$pt_checkout/docs/build/html" "\$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "\$is_master_doc" = true ]; then
find "\$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "\$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.\S+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\$version \&#x25BC</a>@g"
fi
git add "\$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git status
if [ "\$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:site"
git push origin site
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/doc_push_script.sh
# `setup_ci_environment` has to be run **after** the ``checkout`` step because
# it writes into the checkout directory and otherwise CircleCI will complain
# that
# Directory (/home/circleci/project) you are trying to checkout to is not empty and not git repository
setup_ci_environment: &setup_ci_environment
name: Set Up CI Environment After Checkout
name: Set Up CI Environment After attach_workspace
no_output_timeout: "1h"
command: |
set -ex
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST}"
if [[ "${BUILD_ENVIRONMENT}" == *-slow-* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST}" ]; then
# It's a PR; test for [slow ci] tag on the TOPMOST commit
if !(git log --format='%B' -n 1 HEAD | grep -q -e '\[slow ci\]' -e '\[ci slow\]' -e '\[test slow\]' -e '\[slow test\]'); then
circleci step halt
exit
fi
fi
fi
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST}" ]; then
# It's a PR; test for [xla ci] tag on the TOPMOST commit
if !(git log --format='%B' -n 1 HEAD | grep -q -e '\[xla ci\]' -e '\[ci xla\]' -e '\[test xla\]' -e '\[xla test\]'); then
# NB: This doesn't halt everything, just this job. So
# the rest of the workflow will keep going and you need
# to make sure you halt there too. Blegh.
circleci step halt
exit
fi
fi
fi
# Set up NVIDIA docker repo
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
# version number below for docker-ce and nvidia-docker2 to get newer
# versions of Docker. We hardcode these numbers because we kept
# getting broken CI when Docker would update their docker version,
# and nvidia-docker2 would be out of date for a day until they
# released a newer version of their package.
#
# How to figure out what the correct versions of these packages are?
# My preferred method is to start a Docker instance of the correct
# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
# apt what the packages you need are. Note that the CircleCI image
# comes with Docker.
sudo apt-get -y install \
linux-headers-$(uname -r) \
linux-image-generic \
moreutils \
docker-ce=5:18.09.4~3-0~ubuntu-xenial \
nvidia-container-runtime=2.0.0+docker18.09.4-1 \
nvidia-docker2=2.0.3+docker18.09.4-1 \
expect-dev
sudo pkill -SIGHUP dockerd
sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-410.104.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
fi
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH}" >> /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
echo "declare -x MAX_JOBS=${MAX_JOBS}" >> /home/circleci/project/env
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V1}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V1}" >> /home/circleci/project/env
else
# This IAM user allows write access to S3 bucket for sccache
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}" >> /home/circleci/project/env
fi
fi
# This IAM user only allows read-write access to ECR
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V3}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
command: ~/workspace/.circleci/scripts/setup_ci_environment.sh
# Installs expect and moreutils so that we can call `unbuffer` and `ts`.
# Also installs OpenMP
# !!!!NOTE!!!! this is copied into a binary_macos_brew_update job which is the
# same but does not install libomp. If you are changing this, consider if you
# need to change that step as well.
macos_brew_update: &macos_brew_update
name: Brew update and install moreutils, expect and libomp
no_output_timeout: "1h"
command: |
set -ex
pwd
ls -lah
# See https://discourse.brew.sh/t/fetching-homebrew-repos-is-slow/5374/3
brew untap caskroom/homebrew-cask
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
@ -286,3 +60,4 @@ macos_brew_update: &macos_brew_update
brew link parallel --overwrite
brew install expect
brew install libomp


@ -1,13 +1,18 @@
pytorch_short_perf_test_gpu:
environment:
BUILD_ENVIRONMENT: pytorch-short-perf-test-gpu
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn7-py3:300"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:323"
PYTHON_VERSION: "3.6"
USE_CUDA_DOCKER_RUNTIME: "1"
resource_class: gpu.medium
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
@ -24,53 +29,107 @@
docker cp $id:/var/lib/jenkins/workspace/env /home/circleci/project/env
# This IAM user allows write access to S3 bucket for perf test numbers
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_PERF_TEST_S3_BUCKET_V3}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_PERF_TEST_S3_BUCKET_V3}" >> /home/circleci/project/env
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
set -x
docker cp /home/circleci/project/env $id:/var/lib/jenkins/workspace/env
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/short-perf-test-gpu.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
pytorch_doc_push:
pytorch_python_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn7-py3:300"
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:323"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *install_doc_push_script
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -e
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp /home/circleci/project/doc_push_script.sh $id:/var/lib/jenkins/workspace/doc_push_script.sh
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open for merging v1.1.0 -> master for this job.
# XXX: The following code is only run on the v1.1.0 branch, which might
# an eternal PR open for merging v1.2.0 -> master for this job.
# XXX: The following code is only run on the v1.2.0 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.1.0" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/stable 1.1.0") | docker exec -u jenkins -i "$id" bash) 2>&1'
elif [[ "${CIRCLE_BRANCH}" == "v1.2.0" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/stable 1.2.0 site-v1.2.0") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_cpp_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:323"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open (#16502) for merging v1.0.1 -> master for this job.
# XXX: The following code is only run on the v1.0.1 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.0.1" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/stable 1.0.1") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -81,16 +140,21 @@
docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_macos_10_13_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
no_output_timeout: "1h"
command: |
set -e
@ -103,8 +167,10 @@
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
@ -119,44 +185,51 @@
- "*"
pytorch_macos_10_13_py3_test:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
macos:
xcode: "9.0"
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /Users/distiller/pytorch-ci-env
sudo chmod -R 777 /Users/distiller/pytorch-ci-env
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
# This workspace also carries binaries from the build job
- attach_workspace:
at: /Users/distiller/pytorch-ci-env
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *macos_brew_update
- run:
name: Test
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
cp -a /Users/distiller/pytorch-ci-env/workspace/. /Users/distiller/project
# TODO: I'm not sure why we can't just run our job in
# ~/workspace and call it a day
# NB: Yes, you need workspace twice
cp -a ~/workspace/workspace/. /Users/distiller/project
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
no_output_timeout: "1h"
command: |
set -e
@ -184,10 +257,11 @@
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
git submodule sync && git submodule update -q --init
git submodule sync && git submodule update -q --init --recursive
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts


@ -0,0 +1,34 @@
setup:
docker:
- image: circleci/python:3.7.3
steps:
- checkout
- run:
name: Ensure config is up to date
command: ./ensure-consistency.py
working_directory: .circleci
- run:
name: Save commit message
command: git log --format='%B' -n 1 HEAD > .circleci/scripts/COMMIT_MSG
# Note [Workspace for CircleCI scripts]
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# In the beginning, you wrote your CI scripts in a
# .circleci/config.yml file, and life was good. Your CI
# configurations flourished and multiplied.
#
# Then one day, CircleCI cometh down high and say, "Your YAML file
# is too biggeth, it stresses our servers so." And thus they
# asketh us to smite the scripts in the yml file.
#
# But you can't just put the scripts in the .circleci folder,
# because in some jobs, you don't ever actually checkout the
# source repository. Where you gonna get the scripts from?
#
# Here's how you do it: you persist .circleci/scripts into a
# workspace, attach the workspace in your subjobs, and run all
# your scripts from there.
- persist_to_workspace:
root: .
paths: .circleci/scripts

@ -1,8 +1,14 @@
# binary linux build defaults
##############################################################################
binary_linux_build: &binary_linux_build
resource_class: 2xlarge+
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *binary_checkout
- run:
@ -10,23 +16,23 @@ binary_linux_build: &binary_linux_build
- run:
name: Install unbuffer and ts
command: |
set -ex
set -eux -o pipefail
source /env
retry yum -q -y install epel-release
retry yum -q -y install expect moreutils
- run:
name: Upgrade gcc version (based on env var)
name: Update compiler to devtoolset7
command: |
set -ex
set -eux -o pipefail
source /env
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
source "/builder/upgrade_gcc_abi.sh"
source "/builder/update_compiler.sh"
# Env variables are not persisted into the next step
echo "export PATH=$PATH" > /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" > /env
echo "export PATH=$PATH" >> /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> /env
else
echo "Not upgrading gcc version"
echo "Not updating compiler"
fi
- run:
name: Build
@ -37,7 +43,6 @@ binary_linux_build: &binary_linux_build
root: /
paths: final_pkgs
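The `>` to `>>` change in the devtoolset step above matters: `>` truncates `/env` on every write, so persisting two variables with `>` leaves only the last one. A minimal sketch of the difference, using a temp file in place of `/env` (the devtoolset paths are illustrative, not taken from the config):

```shell
# Why the diff switches '>' to '>>' when persisting env vars:
# '>' truncates the file on each write, so only the last echo survives;
# '>>' appends, so both exported variables are kept.
envfile="$(mktemp)"

echo "export PATH=/opt/rh/devtoolset-7/root/usr/bin:\$PATH" > "$envfile"
echo "export LD_LIBRARY_PATH=/opt/rh/devtoolset-7/root/usr/lib64" > "$envfile"
grep -c '^export' "$envfile"   # truncating: prints 1

echo "export PATH=/opt/rh/devtoolset-7/root/usr/bin:\$PATH" > "$envfile"
echo "export LD_LIBRARY_PATH=/opt/rh/devtoolset-7/root/usr/lib64" >> "$envfile"
grep -c '^export' "$envfile"   # appending: prints 2
```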
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
@ -47,15 +52,18 @@ binary_linux_test: &binary_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
# TODO: We shouldn't attach the workspace multiple times
- attach_workspace:
at: /home/circleci/project
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- attach_workspace:
at: /home/circleci/project
# This checkout is only needed to access
# .circleci/scripts/binary_linux_test.sh, which can't be inlined because it
# blows up the yaml size.
- run:
<<: *binary_checkout
- run:
@ -63,8 +71,7 @@ binary_linux_test: &binary_linux_test
- run:
name: Prepare test code
no_output_timeout: "1h"
command: |
source "/home/circleci/project/pytorch/.circleci/scripts/binary_linux_test.sh"
command: ~/workspace/.circleci/scripts/binary_linux_test.sh
- run:
<<: *binary_run_in_docker
@ -72,17 +79,17 @@ binary_linux_upload: &binary_linux_upload
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- attach_workspace:
at: /home/circleci/project
# This checkout is only needed to access
# .circleci/scripts/binary_linux_upload.sh, which can't be inlined because it
# blows up the yaml size.
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
@ -90,5 +97,5 @@ binary_linux_upload: &binary_linux_upload
- run:
name: Upload
no_output_timeout: "1h"
command: |
source "/home/circleci/project/pytorch/.circleci/scripts/binary_linux_upload.sh"
command: ~/workspace/.circleci/scripts/binary_linux_upload.sh

@ -9,6 +9,11 @@ pytorch_linux_build_defaults: &pytorch_linux_build_defaults
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- checkout
@ -24,16 +29,31 @@ pytorch_linux_build_defaults: &pytorch_linux_build_defaults
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
git submodule sync && git submodule update -q --init
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
fi
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
# Note [Special build images]
# The namedtensor and xla builds use the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
@ -42,6 +62,11 @@ pytorch_linux_test_defaults: &pytorch_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
@ -51,7 +76,16 @@ pytorch_linux_test_defaults: &pytorch_linux_test_defaults
no_output_timeout: "90m"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
# See Note [Special build images]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
@ -60,9 +94,9 @@ pytorch_linux_test_defaults: &pytorch_linux_test_defaults
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
if [ -n "${MULTI_GPU}" ]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"'&& echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
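The namedtensor/xla branching above is duplicated between the build and test jobs. A hypothetical helper capturing the same tag-selection logic (the function name and sample values are ours, not from the config); per Note [Special build images], namedtensor and xla share a base image, so the suffix is what lets the test phase pull the right intermediate image:

```shell
# Hypothetical helper mirroring the branching above: derive the committed
# Docker image tag from the build environment name.
commit_docker_image () {
  # $1 = base output image (DOCKER_IMAGE-CIRCLE_SHA1), $2 = BUILD_ENVIRONMENT
  case "$2" in
    *namedtensor*) echo "$1-namedtensor" ;;
    *xla*)         echo "$1-xla" ;;
    *)             echo "$1" ;;
  esac
}

commit_docker_image "img:abc123" "pytorch-linux-trusty-py3.6-gcc5.4-xla-build"
# -> img:abc123-xla
```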
@ -71,6 +105,11 @@ caffe2_linux_build_defaults: &caffe2_linux_build_defaults
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- checkout
@ -130,8 +169,13 @@ caffe2_linux_test_defaults: &caffe2_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *should_run_job
- run:
<<: *setup_ci_environment
- run:

@ -7,12 +7,17 @@ binary_mac_build: &binary_mac_build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *macos_brew_update
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
@ -20,7 +25,7 @@ binary_mac_build: &binary_mac_build
name: Build
no_output_timeout: "1h"
command: |
set -ex
set -eux -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
@ -29,7 +34,7 @@ binary_mac_build: &binary_mac_build
name: Test
no_output_timeout: "1h"
command: |
set -ex
set -eux -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
@ -42,20 +47,26 @@ binary_mac_upload: &binary_mac_upload
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *macos_brew_update
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- attach_workspace:
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
name: Upload
no_output_timeout: "10m"
command: |
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_upload.sh"
cat "$script"
source "$script"

@ -7,6 +7,11 @@ caffe2_macos_build_defaults: &caffe2_macos_build_defaults
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- checkout
- run:
<<: *macos_brew_update
@ -50,8 +55,10 @@ caffe2_macos_build_defaults: &caffe2_macos_build_defaults
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}

@ -25,61 +25,19 @@
# do not need both the pytorch and builder repos, so this is a little wasteful
# (smoke tests and upload jobs do not need the pytorch repo).
binary_checkout: &binary_checkout
name: Checkout
command: |
if [[ -n "$CIRCLE_SHA1" ]]; then
# we are on a PR or on a master-merge, but not on a timed build
git_url="https://raw.githubusercontent.com/pytorch/pytorch/$CIRCLE_SHA1"
else
# scheduled nightly binary builds. These run at 05:05 UTC every day.
last_commit="$(git rev-list --before "$(date -u +%Y-%m-%d) 05:00" --max-count 1 HEAD)"
git_url="https://raw.githubusercontent.com/pytorch/pytorch/$last_commit"
fi
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry curl -s "$git_url/.circleci/scripts/binary_checkout.sh" -o "binary_checkout.sh"
cat "binary_checkout.sh"
source "binary_checkout.sh"
name: Checkout pytorch/builder repo
command: ~/workspace/.circleci/scripts/binary_checkout.sh
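The inlined checkout logic removed above included a small retry helper with exponential backoff (sleeping 1, 2, 4, then 8 seconds between attempts). A self-contained sketch of that pattern; the `flaky` function and counter file are ours, contrived to succeed on the third attempt:

```shell
# Retry a command up to five times with exponential backoff, as in the
# removed inline checkout step ("$@" is a quote-safe variant of the
# original's $*).
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") || (sleep 4 && "$@") || (sleep 8 && "$@")
}

# Demo: a command that fails twice, then succeeds, tracked via a counter file.
tmpfile="$(mktemp)"
echo 0 > "$tmpfile"
flaky () {
  n=$(( $(cat "$tmpfile") + 1 ))
  echo "$n" > "$tmpfile"
  [ "$n" -ge 3 ]
}
retry flaky && echo "succeeded after $(cat "$tmpfile") attempts"
# -> succeeded after 3 attempts
```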
# Parses circleci arguments in a consistent way, essentially routing to the
# correct pythonXgccXcudaXos build we want
binary_populate_env: &binary_populate_env
name: Set up env
command: |
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
script="$workdir/pytorch/.circleci/scripts/binary_populate_env.sh"
cat "$script"
source "$script"
name: Set up binary env variables
command: ~/workspace/.circleci/scripts/binary_populate_env.sh
binary_install_miniconda: &binary_install_miniconda
name: Install miniconda
no_output_timeout: "1h"
command: |
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
script="$workdir/pytorch/.circleci/scripts/binary_install_miniconda.sh"
cat "$script"
source "$script"
command: ~/workspace/.circleci/scripts/binary_install_miniconda.sh
# This section is used in the binary_test and smoke_test jobs. It expects
# 'binary_populate_env' to have populated /home/circleci/project/env and it
@ -87,9 +45,26 @@ binary_install_miniconda: &binary_install_miniconda
# with the code to run in the docker
binary_run_in_docker: &binary_run_in_docker
name: Run in docker
# This step only runs on circleci linux machine executors that themselves
# need to start docker images
command: ~/workspace/.circleci/scripts/binary_run_in_docker.sh
# This is copied almost verbatim from the macos_brew_update job
# In version 2.1 and above we could make this a command and pass a parameter to
# it, but in this version there is no way to pass a parameter to a step
binary_macos_brew_update: &binary_macos_brew_update
name: Brew update and install moreutils and expect
no_output_timeout: "1h"
command: |
# This step only runs on circleci linux machine executors that themselves
# need to start docker images
script="/home/circleci/project/pytorch/.circleci/scripts/binary_install_miniconda.sh"
cat "$script"
source "$script"
set -eux -o pipefail
# See https://discourse.brew.sh/t/fetching-homebrew-repos-is-slow/5374/3
brew untap caskroom/homebrew-cask
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
brew update
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect

@ -1,3 +1,4 @@
# Nightly build smoke tests defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they exist from the cloud are
@ -7,10 +8,16 @@ smoke_linux_test: &smoke_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace:
at: /home/circleci/project
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
@ -20,8 +27,7 @@ smoke_linux_test: &smoke_linux_test
set -ex
cat >/home/circleci/project/ci_test_script.sh <<EOL
# The following code will be executed inside Docker container
set -ex
git clone https://github.com/pytorch/builder.git /builder
set -eux -o pipefail
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
@ -32,18 +38,27 @@ smoke_mac_test: &smoke_mac_test
macos:
xcode: "9.0"
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *macos_brew_update
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
git clone https://github.com/pytorch/builder.git
unbuffer ./builder/smoke_test.sh | ts
export "PATH=$workdir/miniconda/bin:$PATH"
# TODO unbuffer and ts this, but it breaks cause miniconda overwrites
# tclsh. But unbuffer and ts aren't that important so they're just
# disabled for now
./builder/smoke_test.sh

@ -1,26 +1,51 @@
# Binary builds (subset, to smoke test that they'll work)
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_build
- binary_linux_manywheel_3.7m_cu100_devtoolset3_build
- binary_linux_conda_2.7_cpu_build
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_build
- binary_linux_libtorch_2.7m_cu80_devtoolset3_build
- binary_macos_wheel_3.6_cpu_build
- binary_macos_conda_2.7_cpu_build
- binary_macos_libtorch_2.7_cpu_build
#
# NB: If you modify this file, you need to also modify
# the binary_and_smoke_tests_on_pr variable in
# pytorch-ci-hud to adjust the list of whitelisted builds
# at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_test:
- binary_linux_manywheel_2.7mu_cpu_devtoolset7_build:
requires:
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_build
- binary_linux_manywheel_3.7m_cu100_devtoolset3_test:
- setup
- binary_linux_manywheel_3.7m_cu100_devtoolset7_build:
requires:
- binary_linux_manywheel_3.7m_cu100_devtoolset3_build
- binary_linux_conda_2.7_cpu_test:
- setup
- binary_linux_conda_2.7_cpu_devtoolset7_build:
requires:
- binary_linux_conda_2.7_cpu_build
- setup
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_test:
# - binary_linux_conda_3.6_cu90_devtoolset7_build
- binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_build:
requires:
- setup
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2.7m_cu90_devtoolset7_static-without-deps_build
- binary_macos_wheel_3.6_cpu_build:
requires:
- setup
- binary_macos_conda_2.7_cpu_build:
requires:
- setup
- binary_macos_libtorch_2.7_cpu_build:
requires:
- setup
- binary_linux_manywheel_2.7mu_cpu_devtoolset7_test:
requires:
- setup
- binary_linux_manywheel_2.7mu_cpu_devtoolset7_build
- binary_linux_manywheel_3.7m_cu100_devtoolset7_test:
requires:
- setup
- binary_linux_manywheel_3.7m_cu100_devtoolset7_build
- binary_linux_conda_2.7_cpu_devtoolset7_test:
requires:
- setup
- binary_linux_conda_2.7_cpu_devtoolset7_build
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_devtoolset7_test:
# requires:
# - binary_linux_conda_3.6_cu90_build
# - setup
# - binary_linux_conda_3.6_cu90_devtoolset7_build

@ -1,9 +1,6 @@
#- binary_linux_libtorch_2.7m_cpu_test:
# requires:
# - binary_linux_libtorch_2.7m_cpu_build
#- binary_linux_libtorch_2.7m_cu80_test:
# requires:
# - binary_linux_libtorch_2.7m_cu80_build
#- binary_linux_libtorch_2.7m_cu90_test:
# requires:
# - binary_linux_libtorch_2.7m_cu90_build

@ -1,8 +1,19 @@
# Warning: indentation here matters!
# Pytorch MacOS builds
- pytorch_macos_10_13_py3_build
- pytorch_macos_10_13_py3_build:
requires:
- setup
filters:
branches:
only: master
- pytorch_macos_10_13_py3_test:
requires:
- setup
- pytorch_macos_10_13_py3_build
- pytorch_macos_10_13_cuda9_2_cudnn7_py3_build
filters:
branches:
only: master
- pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
requires:
- setup

@ -1,5 +1,10 @@
# Scheduled to run 4 hours after the binary jobs start
# These jobs need to run after all the binary jobs run, regardless of if the
# jobs failed or not. There's no way to do this in CircleCI right now, so we
# just schedule this to run after all the binary jobs should've finished.
# These jobs are all idempotent and very lightweight; they just upload html
# files that track what binaries are available and what their sizes are.
update_s3_htmls:
triggers:
- schedule:
@ -9,7 +14,17 @@
only:
- master
jobs:
- setup
- update_s3_htmls_for_nightlies:
context: org-member
requires:
- setup
- update_s3_htmls_for_nightlies_devtoolset7:
context: org-member
requires:
- setup
- upload_binary_sizes:
context: org-member
requires:
- setup

@ -4,7 +4,7 @@ max-line-length = 120
# C408 ignored because we like the dict keyword argument syntax
# E501 is not flexible enough, we're using B950 instead
ignore =
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!

.gitignore

@ -31,10 +31,8 @@ test/.coverage
test/.hypothesis/
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
test/data/*.pt
dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
@ -43,6 +41,8 @@ third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
torch/__init__.pyi
torch/nn/functional.pyi
torch/nn/modules/*.pyi
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
@ -85,6 +85,7 @@ torch/version.py
# Root level file used in CI to specify certain env configs.
# E.g., see .circleci/config.yaml
env
.circleci/scripts/COMMIT_MSG
# IPython notebook checkpoints
.ipynb_checkpoints

.gitmodules

@ -1,84 +1,116 @@
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
ignore = dirty
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
[submodule "third_party/cub"]
path = third_party/cub
url = https://github.com/NVlabs/cub.git
ignore = dirty
path = third_party/cub
url = https://github.com/NVlabs/cub.git
[submodule "third_party/eigen"]
path = third_party/eigen
url = https://github.com/eigenteam/eigen-git-mirror.git
ignore = dirty
path = third_party/eigen
url = https://github.com/eigenteam/eigen-git-mirror.git
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
ignore = dirty
path = third_party/googletest
url = https://github.com/google/googletest.git
[submodule "third_party/benchmark"]
path = third_party/benchmark
url = https://github.com/google/benchmark.git
ignore = dirty
path = third_party/benchmark
url = https://github.com/google/benchmark.git
[submodule "third_party/protobuf"]
path = third_party/protobuf
url = https://github.com/google/protobuf.git
ignore = dirty
path = third_party/protobuf
url = https://github.com/protocolbuffers/protobuf.git
[submodule "third_party/ios-cmake"]
path = third_party/ios-cmake
url = https://github.com/Yangqing/ios-cmake.git
ignore = dirty
path = third_party/ios-cmake
url = https://github.com/Yangqing/ios-cmake.git
[submodule "third_party/NNPACK"]
path = third_party/NNPACK
url = https://github.com/Maratyszcza/NNPACK.git
ignore = dirty
path = third_party/NNPACK
url = https://github.com/Maratyszcza/NNPACK.git
[submodule "third_party/gloo"]
path = third_party/gloo
url = https://github.com/facebookincubator/gloo
ignore = dirty
path = third_party/gloo
url = https://github.com/facebookincubator/gloo
[submodule "third_party/NNPACK_deps/pthreadpool"]
path = third_party/pthreadpool
url = https://github.com/Maratyszcza/pthreadpool.git
ignore = dirty
path = third_party/pthreadpool
url = https://github.com/Maratyszcza/pthreadpool.git
[submodule "third_party/NNPACK_deps/FXdiv"]
path = third_party/FXdiv
url = https://github.com/Maratyszcza/FXdiv.git
ignore = dirty
path = third_party/FXdiv
url = https://github.com/Maratyszcza/FXdiv.git
[submodule "third_party/NNPACK_deps/FP16"]
path = third_party/FP16
url = https://github.com/Maratyszcza/FP16.git
ignore = dirty
path = third_party/FP16
url = https://github.com/Maratyszcza/FP16.git
[submodule "third_party/NNPACK_deps/psimd"]
path = third_party/psimd
url = https://github.com/Maratyszcza/psimd.git
ignore = dirty
path = third_party/psimd
url = https://github.com/Maratyszcza/psimd.git
[submodule "third_party/zstd"]
path = third_party/zstd
url = https://github.com/facebook/zstd.git
ignore = dirty
path = third_party/zstd
url = https://github.com/facebook/zstd.git
[submodule "third-party/cpuinfo"]
path = third_party/cpuinfo
url = https://github.com/Maratyszcza/cpuinfo.git
ignore = dirty
path = third_party/cpuinfo
url = https://github.com/pytorch/cpuinfo.git
[submodule "third_party/python-enum"]
path = third_party/python-enum
url = https://github.com/PeachPy/enum34.git
ignore = dirty
path = third_party/python-enum
url = https://github.com/PeachPy/enum34.git
[submodule "third_party/python-peachpy"]
path = third_party/python-peachpy
url = https://github.com/Maratyszcza/PeachPy.git
ignore = dirty
path = third_party/python-peachpy
url = https://github.com/Maratyszcza/PeachPy.git
[submodule "third_party/python-six"]
path = third_party/python-six
url = https://github.com/benjaminp/six.git
ignore = dirty
path = third_party/python-six
url = https://github.com/benjaminp/six.git
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
ignore = dirty
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
ignore = dirty
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
[submodule "third_party/sleef"]
path = third_party/sleef
url = https://github.com/zdevito/sleef
ignore = dirty
path = third_party/sleef
url = https://github.com/zdevito/sleef
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
ignore = dirty
path = third_party/ideep
url = https://github.com/intel/ideep
[submodule "third_party/nccl/nccl"]
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
ignore = dirty
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
[submodule "third_party/gemmlowp/gemmlowp"]
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
ignore = dirty
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
[submodule "third_party/QNNPACK"]
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
ignore = dirty
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
[submodule "third_party/neon2sse"]
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
ignore = dirty
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
[submodule "third_party/fbgemm"]
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm
ignore = dirty
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm
[submodule "third_party/foxi"]
path = third_party/foxi
url = https://github.com/houseroad/foxi.git
ignore = dirty
path = third_party/foxi
url = https://github.com/houseroad/foxi.git
[submodule "third_party/tbb"]
path = third_party/tbb
url = https://github.com/01org/tbb
branch = tbb_2018

@ -17,30 +17,44 @@ fi
caffe2_pypath="$(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.dirname(os.path.realpath(caffe2.__file__)))')"
# Resnet50
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 2
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4
fi
# ResNext
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4
fi
# Shufflenet
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu --model shufflenet
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --model shufflenet
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2 --model shufflenet
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4 --model shufflenet
fi

View File

@ -255,7 +255,7 @@ else
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
if [[ -n "${SCCACHE}" && $BUILD_ENVIRONMENT != *rocm* ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
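The MAX_JOBS computation above can be sketched as a small helper — a minimal sketch, assuming GNU coreutils `nproc` is available (the function name `reserve_one_core` is hypothetical, not part of the CI scripts):

```shell
# Reserve one core for sccache so compilation does not starve the
# cache server; never drop below one job on a single-core machine.
reserve_one_core() {
  local cores
  cores=$(nproc)
  if [ "$cores" -gt 1 ]; then
    echo $((cores - 1))
  else
    echo 1
  fi
}

MAX_JOBS=$(reserve_one_core)
echo "MAX_JOBS=$MAX_JOBS"
```

The bare `expr $(nproc) - 1` in the diff has no single-core guard; the sketch adds one only to illustrate the edge case.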
@ -271,4 +271,14 @@ fi
# Install ONNX into a local directory
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
ORIG_COMP=/opt/rocm/hcc/bin/clang-*_original
if [ -e $ORIG_COMP ]; then
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
# note that the wrapping always names the compiler "clang-7.0_original"
WRAPPED=/opt/rocm/hcc/bin/clang-[0-99]
sudo mv $ORIG_COMP $WRAPPED
fi
fi
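The `[ -e $ORIG_COMP ]` check above relies on an unquoted glob, which is worth spelling out — a minimal sketch (temporary paths are illustrative only):

```shell
# An unquoted glob in [ -e ... ] only behaves as intended when it
# expands to exactly one path: with zero matches the literal pattern
# is tested (and fails), and with multiple matches `[` receives too
# many arguments and errors out.
tmp=$(mktemp -d)
touch "$tmp/clang-7.0_original"
if [ -e "$tmp"/clang-*_original ]; then
  echo "found wrapped compiler"
fi
rm -rf "$tmp"
```

This works in the CI script because the wrapping always produces a single `clang-*_original` file, as the comment notes.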
report_compile_cache_stats

View File

@ -92,7 +92,6 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Unknown reasons, need to debug
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/softmax_ops_test.py")
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
@ -102,6 +101,11 @@ fi
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
# locale setting is required by click package with py3
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
fi
pip install --user pytest-sugar
"$PYTHON" \
-m pytest \
@ -121,6 +125,12 @@ pip install --user pytest-sugar
# torchvision tests #
#####################
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install --user torchvision
pip install -q --user git+https://github.com/pytorch/vision.git
pip install -q --user ninja
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
pip install -q --user onnxruntime==0.4.0
fi
"$ROOT_DIR/scripts/onnx/test.sh"
fi

View File

@ -29,11 +29,11 @@ export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# [2] https://wiki.gentoo.org/wiki/AddressSanitizer/Problems
# [3] https://github.com/Kitware/CMake/commit/e9a1ddc594de6e6251bf06d732775dae2cabe4c8
#
# TODO: Make the ASAN flags a more unified env var
# TODO: Make the ASAN flags a centralized env var and unify with USE_ASAN option
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
NO_CUDA=1 USE_MKLDNN=0 \
USE_ASAN=1 USE_CUDA=0 USE_MKLDNN=0 \
python setup.py install
assert_git_not_dirty

View File

@ -20,7 +20,7 @@ if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
@ -32,7 +32,7 @@ if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT"
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-py3-clang5-asan* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
@ -46,15 +46,15 @@ echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -q -r requirements.txt || true
pip_install -r requirements.txt || true
# TODO: Don't install this here
if ! which conda; then
# In ROCm CIs, we are doing cross compilation on build machines with
# intel cpu and later run tests on machines with amd cpu.
# Also leave out two builds to make sure non-mkldnn builds still work.
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda8-cudnn7-py3-* ]]; then
pip install -q mkl mkl-devel
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda9-cudnn7-py3-* ]]; then
pip_install mkl mkl-devel
export USE_MKLDNN=1
else
export USE_MKLDNN=0
@ -65,10 +65,9 @@ fi
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
build_args=()
build_args+=("-DBUILD_BINARY=ON")
build_args+=("-DBUILD_TEST=ON")
build_args+=("-DUSE_OBSERVERS=ON")
build_args+=("-DUSE_ZSTD=ON")
build_args+=("-DBUILD_CAFFE2_MOBILE=OFF")
build_args+=("-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')")
build_args+=("-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')")
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
@ -109,6 +108,15 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# OPENCV is needed to enable ImageInput operator in caffe2 resnet5_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
ORIG_COMP=/opt/rocm/hcc/bin/clang-*_original
if [ -e $ORIG_COMP ]; then
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
# note that the wrapping always names the compiler "clang-7.0_original"
WRAPPED=/opt/rocm/hcc/bin/clang-[0-99]
sudo mv $ORIG_COMP $WRAPPED
fi
exit 0
fi
@ -156,17 +164,17 @@ fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn7-py3* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -q -r requirements.txt || true
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn7-py3* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
@ -202,7 +210,7 @@ fi
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.
pip install -q lark-parser
pip_install lark-parser
# Bazel doesn't work with sccache gcc. https://github.com/bazelbuild/bazel/issues/3642
sudo add-apt-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main"

View File

@ -17,9 +17,21 @@ function cleanup {
set -ex
# Save the SCRIPT_DIR absolute path in case later we chdir (as occurs in the gpu perf test)
SCRIPT_DIR="$( cd "$(dirname "${BASH_SOURCE[0]}")" ; pwd -P )"
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
# Figure out which Python to use for ROCm
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
# non-interactive bash shells do not expand aliases by default
shopt -s expand_aliases
export PYTORCH_TEST_WITH_ROCM=1
alias python="$PYTHON"
fi
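The `BASH_REMATCH` capture used above can be isolated into a small function — a minimal sketch (the function name `extract_py_version` and the sample job names are illustrative, though the regex is taken verbatim from the diff):

```shell
# Extract the Python version embedded in a BUILD_ENVIRONMENT-style job
# name; the first capture group holds e.g. "2", "3.5", or "3.6.7".
extract_py_version() {
  local env_name=$1
  if [[ "$env_name" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

extract_py_version "py2-clang7-rocmdeb-ubuntu16.04"   # prints 2
extract_py_version "rocm-py3.6-build"                 # prints 3.6
```

Note `[[ =~ ]]` and `BASH_REMATCH` are bash-only, which is why the script can rely on them alongside `shopt -s expand_aliases`.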
# This token is used by a parser on Jenkins logs for determining
# if a failure is a legitimate problem, or a problem with the build
# system; to find out more, grep for this string in ossci-job-dsl.
@ -89,7 +101,7 @@ if which sccache > /dev/null; then
sccache --zero-stats
function sccache_epilogue() {
echo '=================== sccache compilation log ==================='
python "$(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py" ~/sccache_error.log
python "$SCRIPT_DIR/print_sccache_log.py" ~/sccache_error.log 2>/dev/null
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
@ -116,21 +128,7 @@ if [ -z "$COMPACT_JOB_NAME" ]; then
exit 1
fi
if grep --line-regexp -q "$COMPACT_JOB_NAME" "$(dirname "${BASH_SOURCE[0]}")/disabled-configs.txt"; then
echo "Job is explicitly disabled, SKIPPING"
exit 0
else
echo "Job is not disabled, proceeding"
fi
if grep --line-regexp -q "$COMPACT_JOB_NAME" "$(dirname "${BASH_SOURCE[0]}")/enabled-configs.txt"; then
echo "Job is enabled, proceeding"
else
echo "Job is not enabled, FAILING now (revert changes to enabled-configs.txt to fix this)"
exit 1
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3 ]] || \
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch_macos* ]]; then
BUILD_TEST_LIBTORCH=1
@ -141,7 +139,7 @@ fi
# Use conda cmake in some CI build. Conda cmake will be newer than our supported
# min version 3.5, so we only do it in two builds that we know should use conda.
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cuda8-cudnn7-py2* ]] || \
if [[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py3* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
@ -158,6 +156,11 @@ if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
fi
fi
function pip_install() {
# retry 3 times
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@"
}
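The `pip_install` wrapper above hard-codes three attempts; the same idea generalizes to a parameterized retry helper — a minimal sketch (the `retry` function is hypothetical, not part of the CI scripts):

```shell
# Run a command up to N times, returning success on the first attempt
# that succeeds and failure only if all attempts fail.
retry() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
  done
  return 1
}

retry 3 true                              # succeeds on the first attempt
retry 3 false || echo "all 3 attempts failed"
```

The CI version simply chains `pip install || pip install || pip install`, which avoids any function-call overhead at the cost of fixing the attempt count.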
function get_exit_code() {
set +e
"$@"

View File

@ -1,5 +0,0 @@
# This file contains a list of disabled configurations. Disabled
# configurations are skipped and not considered a failure if they
# fail. You can use this to temporarily reserve a test name on the
# CI side before the PyTorch repository supports it. This file has
# the same format as .jenkins/enabled-configs.txt

View File

@ -1,57 +0,0 @@
# This file contains a list of enabled configurations
# to run tests on. If you want CI to run only a limited set
# of tests before enabling the full test suite, you can delete
# lines from this file. Any test that is not in this file will
# report a failure (so you don't forget to re-enable the tests
# on merge ;)
pytorch-linux-xenial-cuda8-cudnn7-py3-build
pytorch-linux-xenial-cuda8-cudnn7-py3-test
pytorch-linux-xenial-cuda8-cudnn7-py3-multigpu-test
pytorch-linux-xenial-cuda8-cudnn7-py3-nogpu-test
pytorch-linux-xenial-cuda9-cudnn7-py2-build
pytorch-linux-xenial-cuda9-cudnn7-py2-test
pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build
pytorch-linux-trusty-py2.7.9-build
pytorch-linux-trusty-py2.7.9-test
pytorch-linux-trusty-py2.7-build
pytorch-linux-trusty-py2.7-test
pytorch-linux-trusty-py3.5-build
pytorch-linux-trusty-py3.5-test
pytorch-linux-trusty-py3.6-gcc4.8-build
pytorch-linux-trusty-py3.6-gcc4.8-test
pytorch-linux-trusty-py3.6-gcc5.4-build
pytorch-linux-trusty-py3.6-gcc5.4-test
pytorch-linux-trusty-py3.6-gcc7.2-build
pytorch-linux-trusty-py3.6-gcc7.2-test
pytorch-linux-trusty-py3.6-gcc7-build
pytorch-linux-trusty-py3.6-gcc7-test
pytorch-linux-trusty-pynightly-build
pytorch-linux-trusty-pynightly-test
pytorch-win-ws2016-cuda9-cudnn7-py3-build
pytorch-win-ws2016-cuda9-cudnn7-py3-test
pytorch-macos-10.13-py3-build
pytorch-macos-10.13-py3-test
pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang7-rocmdeb-ubuntu16.04
py2-devtoolset7-rocmrpm-centos7.5
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
pytorch-ppc64le-cuda9.1-cudnn7-py3-test
pytorch-linux-xenial-cuda8-cudnn7-py3-NO_AVX2-test
pytorch-linux-xenial-cuda8-cudnn7-py3-NO_AVX-NO_AVX2-test
pytorch-linux-xenial-cuda8-cudnn7-py3-slow-test
pytorch-xla-linux-trusty-py3.6-gcc5.4-build
pytorch-xla-linux-trusty-py3.6-gcc5.4-test

View File

@ -30,7 +30,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
export USE_CUDA=1
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error

View File

@ -8,7 +8,7 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
export PYTORCH_ENV_DIR="${HOME}/workspace"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}

View File

@ -27,5 +27,9 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$PWD/../cpp-build"/caffe2/build/bin/test_api
time python test/run_test.py --verbose -i distributed
time python test/run_test.py --verbose -i c10d
time python test/run_test.py --verbose -i c10d_spawn
assert_git_not_dirty

View File

@ -12,7 +12,7 @@ test_cpu_speed_mnist () {
cd examples/mnist
pip install -q -r requirements.txt
conda install -c pytorch torchvision-cpu
# Download data
python main.py --epochs 0

View File

@ -12,7 +12,7 @@ test_gpu_speed_mnist () {
cd examples/mnist
pip install -q -r requirements.txt
conda install -c pytorch torchvision
# Download data
python main.py --epochs 0

View File

@ -35,30 +35,30 @@ fi
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
pip install -q ninja --user
pip_install --user ninja
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip install -q hypothesis --user
pip_install --user hypothesis
# TODO: move this to Docker
PYTHON_VERSION=$(python -c 'import platform; print(platform.python_version())'|cut -c1)
echo $PYTHON_VERSION
if [[ $PYTHON_VERSION == "2" ]]; then
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py2-none-any.whl --user
else
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl --user
fi
# if [[ $PYTHON_VERSION == "2" ]]; then
# pip_install --user https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py2-none-any.whl
# else
# pip_install --user https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# fi
pip_install --user tb-nightly
# mypy will fail to install on Python <3.4. In that case,
# we just won't run these tests.
pip install mypy --user || true
pip_install --user mypy || true
fi
# faulthandler has been built-in since Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler --user
pip_install --user faulthandler
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
@ -92,13 +92,6 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
# ROCm CI is using Caffe2 docker images, which doesn't have several packages
# needed in testing. We install them here.
pip install -q psutil "librosa>=0.6.2" --user
fi
if [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
@ -141,22 +134,7 @@ test_aten() {
}
test_torchvision() {
rm -rf ninja
echo "Installing torchvision at branch master"
rm -rf vision
# TODO: This git clone is bad, it means pushes to torchvision can break
# PyTorch CI
git clone https://github.com/pytorch/vision --quiet
pushd vision
# python setup.py install with a tqdm dependency is broken in the
# Travis Python nightly (but not in latest Python nightlies, so
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install -q --user .
popd
rm -rf vision
pip_install --user git+https://github.com/pytorch/vision.git@2b73a4846773a670632b29fb2fc2ac57df7bce5d
}
test_libtorch() {
@ -165,13 +143,13 @@ test_libtorch() {
python test/cpp/jit/tests_setup.py setup
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/bin/test_jit
"$CPP_BUILD"/caffe2/build/bin/test_jit
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
"$CPP_BUILD"/caffe2/build/bin/test_jit "[cpu]"
fi
python test/cpp/jit/tests_setup.py shutdown
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/build/bin/test_api
assert_git_not_dirty
fi
}
@ -203,6 +181,7 @@ test_xla() {
}
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
test_torchvision

View File

@ -15,6 +15,7 @@ COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-build
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_ID=`git rev-parse HEAD`
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}

View File

@ -21,13 +21,42 @@ if "%REBUILD%"=="" ( pip install -q ninja )
git submodule sync --recursive
git submodule update --init --recursive
set PATH=%TMP_DIR_WIN%\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;%PATH%
if "%CUDA_VERSION%" == "9" goto cuda_build_9
if "%CUDA_VERSION%" == "10" goto cuda_build_10
goto cuda_build_end
:cuda_build_9
:: Override VS env here
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
@echo on
popd
set DISTUTILS_USE_SDK=1
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=%CUDA_PATH%
goto cuda_build_common
:cuda_build_10
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
goto cuda_build_common
:cuda_build_common
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set CUDNN_LIB_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDNN_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
:cuda_build_end
set PATH=%TMP_DIR_WIN%\bin;%PATH%
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
@ -40,16 +69,26 @@ set CXX=sccache cl
set CMAKE_GENERATOR=Ninja
:: The following code will try to build PyTorch twice if USE_CUDA is neither 0
:: nor 1. It is intended so that both builds can be folded into 1 CI run.
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
:: Must save and restore the original value of USE_CUDA; otherwise the
:: `if not "%USE_CUDA%"=="0"` check below would see the modified value.
set OLD_USE_CUDA=%USE_CUDA%
set USE_CUDA=0
python setup.py install
set USE_CUDA=%OLD_USE_CUDA%
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
:: sccache will fail for CUDA builds if all cores are used for compiling
if not defined MAX_JOBS set /A MAX_JOBS=%NUMBER_OF_PROCESSORS%-1
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
@ -64,13 +103,12 @@ if not "%USE_CUDA%"=="0" (
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc
if "%REBUILD%"=="" set NO_CUDA=0
if "%REBUILD%"=="" set USE_CUDA=1
python setup.py install --cmake && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
) else (
mv %CD%\build\bin\test_api.exe %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch\lib
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\caffe2 && python %SCRIPT_HELPERS_DIR%\upload_image.py %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z
)
)

View File

@ -1,3 +1,5 @@
#!/usr/bin/env python
import os
import sys
import boto3

View File

@ -1,9 +1,17 @@
if "%CUDA_VERSION%" == "9" set CUDA_SUFFIX=cuda90
if "%CUDA_VERSION%" == "10" set CUDA_SUFFIX=cuda101
if "%CUDA_SUFFIX%" == "" (
echo unknown CUDA version, please set `CUDA_VERSION` to 9 or 10.
exit /b 1
)
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.5.0_cuda90_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.0_cuda90_%BUILD_TYPE%.7z
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.5.0_cuda90_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.0_cuda90_%BUILD_TYPE%.7z --quiet
aws s3 cp s3://ossci-windows/magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\magma_2.5.0_cuda90_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
7z x -aoa %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
)
set MAGMA_HOME=%TMP_DIR_WIN%\magma

View File

@ -19,27 +19,55 @@ if NOT "%BUILD_ENVIRONMENT%"=="" (
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if NOT "%BUILD_ENVIRONMENT%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba
:: Numba is pinned to 0.44.0 to avoid https://github.com/numba/numba/issues/4352
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba==0.44.0
)
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil pillow
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3
if "%CUDA_VERSION%" == "9" goto cuda_build_9
if "%CUDA_VERSION%" == "10" goto cuda_build_10
goto cuda_build_end
:cuda_build_9
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
@echo on
popd
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;%PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=%CUDA_PATH%
goto cuda_build_common
:cuda_build_10
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
@echo on
popd
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
goto cuda_build_common
:cuda_build_common
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set CUDNN_LIB_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDNN_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set NUMBAPRO_CUDALIB=%CUDA_PATH%\bin
set NUMBAPRO_LIBDEVICE=%CUDA_PATH%\nvvm\libdevice
set NUMBAPRO_NVVM=%CUDA_PATH%\nvvm\bin\nvvm64_32_0.dll
:cuda_build_end
set PYTHONPATH=%TMP_DIR_WIN%\build;%PYTHONPATH%
set NUMBAPRO_CUDALIB=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
set NUMBAPRO_LIBDEVICE=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\nvvm\libdevice
set NUMBAPRO_NVVM=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\nvvm\bin\nvvm64_32_0.dll
if NOT "%BUILD_ENVIRONMENT%"=="" (
pushd %TMP_DIR_WIN%\build
@ -51,4 +79,7 @@ if NOT "%BUILD_ENVIRONMENT%"=="" (
xcopy /s %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %TMP_DIR_WIN%\build\torch\
)
@echo off
echo @echo off >> %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
for /f "usebackq tokens=*" %%i in (`set`) do echo set "%%i" >> %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
@echo on

View File

@ -4,11 +4,22 @@ cd test\custom_operator
:: Build the custom operator library.
mkdir build
cd build
pushd build
echo "Executing CMake for custom_operator test..."
:: Note: Caffe2 does not support MSVC + CUDA + Debug mode (has to be Release mode)
cmake -DCMAKE_PREFIX_PATH=%TMP_DIR_WIN%\build\torch -DCMAKE_BUILD_TYPE=Release -GNinja ..
if ERRORLEVEL 1 exit /b 1
echo "Executing Ninja for custom_operator test..."
ninja -v
cd ..
if ERRORLEVEL 1 exit /b 1
echo "Ninja succeeded for custom_operator test."
popd
:: Run tests Python-side and export a script module.
python test_custom_ops.py -v

View File

@ -1,9 +1,7 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
dir
dir %TMP_DIR_WIN%\build
dir %TMP_DIR_WIN%\build\torch
dir %TMP_DIR_WIN%\build\torch\lib
cd %TMP_DIR_WIN%\build\torch\lib
cd %TMP_DIR_WIN%\build\torch\bin
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*"
if errorlevel 1 exit /b 1

View File

@ -1,3 +1,5 @@
#!/usr/bin/env python
import os
import sys
import boto3

View File

@ -6,6 +6,7 @@ COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-test
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_ID=`git rev-parse HEAD`
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}

View File

@ -13,7 +13,7 @@ matrix:
fast_finish: true
include:
- name: "Ensure consistent CircleCI YAML"
python: "3.6"
python: "3.7"
dist: xenial
script: cd .circleci && ./ensure-consistency.py
- name: "Shellcheck Jenkins scripts"
@ -27,16 +27,16 @@ matrix:
- name: "Python 2.7 Lint"
python: "2.7"
install: pip install flake8
script: flake8
script:
# NB: Exclude .circleci as it is Python 3 only
- rm -rf .circleci
- flake8
- name: "Python 3.7 Lint"
python: "3.7"
dist: xenial # required for Python 3.7 (travis-ci/travis-ci#9069)
sudo: required # required for Python 3.7 (travis-ci/travis-ci#9069)
install:
- pip install flake8 flake8-mypy flake8-comprehensions flake8-pyi mccabe pycodestyle pyflakes
# Apparently Facebook runs master of this one
# https://github.com/PyCQA/flake8-bugbear/issues/53
- pip install git+https://github.com/PyCQA/flake8-bugbear.git@d9444713a51a9fb6ee8cd2d88fca85e9ff0c2d58
- pip install flake8 flake8-bugbear flake8-mypy flake8-comprehensions flake8-pyi mccabe pycodestyle pyflakes
script: flake8
- name: "MyPy typecheck"
python: "3.6"

View File

@ -1,6 +1,6 @@
@inproceedings{paszke2017automatic,
title={Automatic differentiation in PyTorch},
title={Automatic Differentiation in {PyTorch}},
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
booktitle={NIPS-W},
booktitle={NIPS Autodiff Workshop},
year={2017}
}

View File

@ -16,6 +16,12 @@ set(CMAKE_CXX_STANDARD 11)
if (NOT MSVC)
set(CMAKE_C_STANDARD 11)
endif()
if (DEFINED GLIBCXX_USE_CXX11_ABI)
if (${GLIBCXX_USE_CXX11_ABI} EQUAL 1)
set(CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=1")
endif()
endif()
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
@ -57,13 +63,17 @@ if(APPLE)
set(CMAKE_MACOSX_RPATH ON)
endif()
if (${CMAKE_HOST_SYSTEM_PROCESSOR} MATCHES "(x86_64|i[3-6]+86)")
set(CPU_INTEL ON)
else ()
set(CPU_INTEL OFF)
endif ()
# ---[ Options.
# Note to developers: if you add an option below, make sure you also add it to
# cmake/Summary.cmake so that the summary prints out the option values.
include(CMakeDependentOption)
option(BUILD_TORCH "Build Torch" OFF)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
option(BUILD_ATEN_MOBILE "Build ATen for Android and iOS" OFF)
option(BUILD_ATEN_ONLY "Build only a subset focused on ATen only" OFF)
option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
@ -71,6 +81,8 @@ option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_pa
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
option(BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" ON)
option(BUILD_NAMEDTENSOR "Experimental: compile with namedtensor support" OFF)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
"BUILD_SHARED_LIBS AND BUILD_CUSTOM_PROTOBUF" OFF)
@ -81,6 +93,7 @@ option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" ON
"BUILD_TEST" OFF)
option(COLORIZE_OUTPUT "Colorize output during compilation" ON)
option(USE_ASAN "Use Address Sanitizer" OFF)
option(USE_CUDA "Use CUDA" ON)
option(USE_ROCM "Use ROCm" ON)
@ -88,7 +101,7 @@ option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" OFF)
option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" ON)
option(USE_FFMPEG "Use ffmpeg" OFF)
option(USE_GFLAGS "Use GFLAGS" OFF)
option(USE_GLOG "Use GLOG" OFF)
@ -97,8 +110,15 @@ option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(USE_LMDB "Use LMDB" OFF)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
cmake_dependent_option(
USE_NCCL "Use NCCL" ON
"USE_CUDA;UNIX;NOT APPLE" OFF)
cmake_dependent_option(
USE_STATIC_NCCL "Use static NCCL" OFF
"USE_NCCL" OFF)
cmake_dependent_option(
USE_SYSTEM_NCCL "Use system-wide NCCL" OFF
"USE_NCCL" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
@ -120,7 +140,13 @@ option(USE_SYSTEM_EIGEN_INSTALL
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
option(USE_MKLDNN "Use MKLDNN" OFF)
cmake_dependent_option(
USE_MKLDNN "Use MKLDNN. Only available on x86 and x86_64." ON
"CPU_INTEL" OFF)
set(MKLDNN_ENABLE_CONCURRENT_EXEC ${USE_MKLDNN})
cmake_dependent_option(
USE_MKLDNN_CBLAS "Use CBLAS in MKLDNN" OFF
"USE_MKLDNN" OFF)
option(USE_DISTRIBUTED "Use distributed" ON)
cmake_dependent_option(
USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
@ -131,22 +157,27 @@ cmake_dependent_option(
cmake_dependent_option(
USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed. Only available if USE_GLOO is on." OFF
"USE_GLOO" OFF)
option(USE_TBB "Use TBB" OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" OFF)
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" ON)
# /Z7 override option
# When generating debug symbols, CMake defaults to using the flag /Zi.
# However, it is not compatible with sccache, so we rewrite it to /Z7.
# But some users don't use sccache; this override is for them.
option(MSVC_Z7_OVERRIDE "Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)" ON)
cmake_dependent_option(
MSVC_Z7_OVERRIDE "Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)" ON
"MSVC" OFF)
SET(ONNX_NAMESPACE "onnx_c2" CACHE STRING "onnx namespace")
set(ONNX_NAMESPACE "onnx_torch" CACHE STRING "A namespace for ONNX; needed to build with other frameworks that share ONNX.")
# For MSVC,
# 1. Replace /Zi and /ZI with /Z7
# 2. Switch off incremental linking in debug builds
if (MSVC)
string(APPEND CMAKE_C_FLAGS " /EHa")
string(APPEND CMAKE_CXX_FLAGS " /EHa")
if(MSVC_Z7_OVERRIDE)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
@ -163,10 +194,33 @@ if (MSVC)
string(REGEX REPLACE "/INCREMENTAL" "/INCREMENTAL:NO" ${flag_var} "${${flag_var}}")
endif()
endforeach(flag_var)
# Turn off USE_DISTRIBUTED by default
set(USE_DISTRIBUTED OFF)
endif(MSVC)
# Set INTERN_BUILD_MOBILE for all mobile builds. Components that are not
# applicable to mobile are disabled by this variable.
if (ANDROID OR IOS)
set(BUILD_ATEN_MOBILE ON)
set(INTERN_BUILD_MOBILE ON)
endif()
# INTERN_BUILD_ATEN_OPS is used to control whether to build ATen/TH operators.
# It's disabled for caffe2 mobile library.
if (INTERN_BUILD_MOBILE AND BUILD_CAFFE2_MOBILE)
set(INTERN_BUILD_ATEN_OPS OFF)
else()
set(INTERN_BUILD_ATEN_OPS ON)
endif()
# BUILD_CAFFE2_MOBILE is the master switch to choose between the libcaffe2 vs. libtorch mobile build.
# When it's enabled it builds the original libcaffe2 mobile library without ATen/TH ops or TorchScript support;
# when it's disabled it builds the libtorch mobile library, which contains ATen/TH ops and native support for
# TorchScript models, but doesn't contain the not-yet-unified caffe2 ops.
if (INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE)
set(BUILD_PYTHON OFF)
set(BUILD_CAFFE2_OPS OFF)
set(USE_DISTRIBUTED OFF)
set(FEATURE_TORCH_MOBILE ON)
endif()
if (BUILD_ATEN_ONLY)
@ -191,8 +245,12 @@ include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Version numbers for generated libraries
set(TORCH_DEFAULT_VERSION "1.0.0")
set(TORCH_DEFAULT_VERSION "1.1.0")
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}" CACHE STRING "Torch build version")
if (DEFINED ENV{PYTORCH_BUILD_VERSION})
set(TORCH_BUILD_VERSION "$ENV{PYTORCH_BUILD_VERSION}"
CACHE STRING "Torch build version" FORCE)
endif()
if (NOT TORCH_BUILD_VERSION)
# An empty string was specified so force version to the default
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}"
@ -239,6 +297,14 @@ if(USE_FBGEMM)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_FBGEMM")
endif()
if(BUILD_NAMEDTENSOR)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBUILD_NAMEDTENSOR")
endif()
if(USE_QNNPACK)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_QNNPACK")
endif()
# ---[ Whitelist file if whitelist is specified
include(cmake/Whitelist.cmake)
@ -289,6 +355,14 @@ if(NOT MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Qunused-arguments")
if (${COLORIZE_OUTPUT})
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fcolor-diagnostics")
endif()
endif()
if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 4.9)
if (${COLORIZE_OUTPUT})
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fdiagnostics-color=always")
endif()
endif()
if ((APPLE AND (NOT ("${CLANG_VERSION_STRING}" VERSION_LESS "9.0")))
OR (CMAKE_COMPILER_IS_GNUCXX
@ -314,7 +388,9 @@ if(NOT MSVC)
else()
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO)
if (${CAFFE2_USE_MSVC_STATIC_RUNTIME})
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
@ -479,6 +555,7 @@ if (BUILD_SHARED_LIBS)
${PROJECT_SOURCE_DIR}/cmake/Modules_CUDA_fix
DESTINATION share/cmake/Caffe2/
COMPONENT dev)
install(EXPORT Caffe2Targets DESTINATION share/cmake/Caffe2
FILE Caffe2Targets.cmake
COMPONENT dev)

View File

@ -243,18 +243,27 @@ only interested in a specific component.
Caffe2 operators.
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `BUILD_TEST`, `USE_FBGEMM`, `USE_NNPACK` and `USE_QNNPACK`.
- `DEBUG=1` will enable debug builds (-g -O0)
- `REL_WITH_DEB_INFO=1` will enable debug symbols with optimizations (-g -O3)
- `NO_CUDA=1` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- `USE_DISTRIBUTED=0` will disable distributed (c10d, gloo, mpi, etc.) build.
- `USE_MKLDNN=0` will disable using MKL-DNN.
- `USE_CUDA=0` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- `BUILD_TEST=0` will disable building C++ test binaries.
- `USE_FBGEMM=0` will disable using FBGEMM (quantized 8-bit server operators).
- `USE_NNPACK=0` will disable compiling with NNPACK.
- `USE_QNNPACK=0` will disable QNNPACK build (quantized 8-bit operators).
For example:
```bash
NO_CUDA=1 DEBUG=1 python setup.py develop
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 python setup.py develop
```
Make sure you continue to pass these flags on subsequent builds.
For subsequent builds (i.e., when `build/CMakeCache.txt` exists), the build
options passed the first time will persist; to change them, run `ccmake build/`
or `cmake-gui build/`, or directly edit `build/CMakeCache.txt`.
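As an illustration of how those cached options live in `build/CMakeCache.txt`, here is a minimal sketch of flipping one cached entry programmatically (the `set_cmake_option` helper is hypothetical, not part of the repo; `ccmake`/`cmake-gui` remain the supported route):

```python
import re

def set_cmake_option(cache_text, name, value):
    # Hypothetical helper: rewrite the `NAME:TYPE=...` line for one cached
    # option, leaving the rest of the CMakeCache.txt content untouched.
    pattern = re.compile(r"^(%s:[A-Z]+=).*$" % re.escape(name), re.MULTILINE)
    return pattern.sub(r"\g<1>" + value, cache_text)

cache = "USE_CUDA:BOOL=ON\nBUILD_TEST:BOOL=ON\n"
print(set_cmake_option(cache, "USE_CUDA", "OFF"), end="")
# USE_CUDA:BOOL=OFF
# BUILD_TEST:BOOL=ON
```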
### Code completion and IDE support
@ -349,6 +358,16 @@ ccache -F 0
# deploy (and add to ~/.bashrc for later)
export PATH="/usr/lib/ccache:$PATH"
```
#### Use a faster linker
If you are editing a single file and rebuilding in a tight loop, the time spent
linking will dominate. The system linker available in most Linux distributions
(GNU `ld`) is quite slow. Use a faster linker, like [lld](https://lld.llvm.org/).
The easiest way to use `lld` is to download the
[latest LLVM binaries](http://releases.llvm.org/download.html#8.0.0) and run:
```
ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld
```
## CUDA Development tips
@ -359,6 +378,39 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
slow down the build process by about 50% (compared to only `DEBUG=1`), so use it wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike `gdb`,
`cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros).
3. CUDA supports a lot of C++11 features, such as `std::numeric_limits`, `std::nextafter`,
`std::tuple`, etc., in device code. Many of these features are possible because of the
[--expt-relaxed-constexpr](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-functions)
nvcc flag. There is a known [issue](https://github.com/ROCm-Developer-Tools/HIP/issues/374)
where ROCm errors out on device code that uses such STL functions.
4. A good performance metric for a CUDA kernel is the
[Effective Memory Bandwidth](https://devblogs.nvidia.com/how-implement-performance-metrics-cuda-cc/).
It is useful to measure this metric whenever you are writing/optimizing a CUDA
kernel. The following script shows how we can measure the effective bandwidth of the CUDA
`uniform_` kernel.
```python
import torch
import time

size = 128 * 512
nrep = 100
nbytes_read_write = 4  # number of bytes read + written by the kernel; change this to fit your kernel

for i in range(10):
    a = torch.Tensor(size).cuda().uniform_()
    torch.cuda.synchronize()
    # dry run to alloc
    out = a.uniform_()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(nrep):
        out = a.uniform_()
    torch.cuda.synchronize()
    end = time.time()
    timec = (end - start) / nrep
    print("uniform, size, elements", size, "forward", timec,
          "bandwidth (GB/s)", size * nbytes_read_write * 1e-9 / timec)
    size *= 2
```
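The bandwidth formula in the script's `print` line can be factored out into a small helper (the function name here is ours for illustration, not a PyTorch API):

```python
def effective_bandwidth_gbs(num_elements, bytes_read_write_per_element, seconds):
    # Effective bandwidth = total bytes moved / elapsed time, reported in GB/s.
    return num_elements * bytes_read_write_per_element * 1e-9 / seconds

# e.g. 128*512 floats, 4 bytes read+written each, measured at 1 microsecond:
print(effective_bandwidth_gbs(128 * 512, 4, 1e-6))  # ~262.1 GB/s
```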
Hope this helps, and thanks for considering contributing.
@ -505,7 +557,8 @@ which is in PyTorch's `requirements.txt`.
### Pre-commit Tidy/Linting Hook
We use clang-tidy and flake8 (installed with flake-mypy) to perform additional
We use clang-tidy and flake8 (installed with flake8-bugbear,
flake8-comprehensions, flake8-mypy, and flake8-pyi) to perform additional
formatting and semantic checking of code. We provide a pre-commit git hook for
performing these checks, before a commit is created:

View File

@ -55,7 +55,7 @@ Elaborating further:
If you use NumPy, then you have used Tensors (a.k.a. ndarray).
![Tensor illustration](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/tensor_illustration.png)
![Tensor illustration](./docs/source/_static/img/tensor_illustration.png)
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerates the
computation by a huge amount.
@ -151,15 +151,15 @@ They require JetPack 4.2 and above and are maintained by @dusty-nv
### From Source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.anaconda.com/distribution/#download-section) environment.
You will get a high-quality BLAS library (MKL) and you get a controlled compiler version regardless of your Linux distro.
You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.
Once you have [Anaconda](https://www.anaconda.com/distribution/#download-section) installed, here are the instructions.
If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 7.5 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v6.x or above
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 9 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v7 or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
If you want to disable CUDA support, export environment variable `USE_CUDA=0`.
Other potentially useful environment variables may be found in `setup.py`.
If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions [are available here](https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/)
@ -169,19 +169,22 @@ If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xa
Common
```
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
```
On Linux
```bash
# Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda90 # or [magma-cuda80 | magma-cuda92 | magma-cuda100 ] depending on your cuda version
conda install -c pytorch magma-cuda90 # or [magma-cuda92 | magma-cuda100 ] depending on your cuda version
```
#### Get the PyTorch Source
```bash
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
```
#### Install PyTorch
@ -206,31 +209,67 @@ If the version of Visual Studio 2017 is higher than 15.4.5, installing of "VC++
<br/> There is no guarantee of a correct build with VC++ 2017 toolsets other than version 15.4 v14.11.
<br/> "VC++ 2017 version 15.4 v14.11 toolset" can be installed onto an already installed Visual Studio 2017 by running its installation once again and checking the corresponding checkbox under "Individual components"/"Compilers, build tools, and runtimes".
For building against CUDA 8.0, Visual Studio 2015 Update 3 (version 14.0) and the [patch](https://download.microsoft.com/download/8/1/d/81dbe6bb-ed92-411a-bef5-3a75ff972c6a/vc14-kb4020481.exe) need to be installed too.
The details of the patch can be found [here](https://support.microsoft.com/en-gb/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch).
NVTX is a part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox.
Be sure that CUDA with Nsight Compute is installed after Visual Studio 2017.
Currently, VS 2017, VS 2019, and Ninja are supported as CMake generators. If `ninja.exe` is detected in `PATH`, then Ninja will be used as the default generator; otherwise VS 2017 will be used.
<br/> If Ninja is selected as the generator, the latest MSVC newer than VS 2015 (14.0) will get selected as the underlying toolchain if you have Python > 3.5; otherwise VS 2015 will be selected, so you'll have to activate the environment. If you use CMake <= 3.14.2 and have VS 2019 installed, then even if you specify VS 2017 as the generator, VS 2019 will get selected as the generator.
CUDA and MSVC have strong version dependencies, so even if you use VS 2017 / 2019 you will get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain in the table below, and then either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the CUDA host compiler (not recommended if there are big version differences).
| CUDA version | Newest supported VS version |
| ------------ | ------------------------------------------------------- |
| 9.0 / 9.1 | Visual Studio 2017 Update 4 (15.4) (`_MSC_VER` <= 1911) |
| 9.2 | Visual Studio 2017 Update 5 (15.5) (`_MSC_VER` <= 1912) |
| 10.0 | Visual Studio 2017 (15.X) (`_MSC_VER` < 1920) |
| 10.1 | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930) |
```cmd
cmd
REM [Optional] The following two lines are needed for Python 2.7, but the support for it is very experimental.
:: [Optional] Only add the next two lines if you need Python 2.7. If you use Python 3, ignore these two lines.
set MSSdk=1
set FORCE_PY27_BUILD=1
REM [Optional] As for CUDA 8, VS2015 Update 3 is required; use the following line.
set "CUDAHOSTCXX=%VS140COMNTOOLS%..\..\VC\bin\amd64\cl.exe"
:: [Optional] If you want to build with VS 2019 generator, please change the value in the next line to `Visual Studio 16 2019`.
:: Note: This value is useless if Ninja is detected. However, you can force that by using `set USE_NINJA=OFF`.
set CMAKE_GENERATOR=Visual Studio 15 2017
set CMAKE_GENERATOR=Visual Studio 15 2017 Win64
:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2017 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
:: It's an essential step if you use Python 3.5.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.11
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,16^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%
REM Run "Visual Studio 2017 Developer Command Prompt"
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,16^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=14.11
:: [Optional] If you want to override the cuda host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Tools\MSVC\14.11.25503\bin\HostX64\x64\cl.exe
python setup.py install
```
##### Adjust Build Options (Optional)
You can optionally adjust the configuration of CMake variables (without building first) by doing
the following. For example, adjusting the pre-detected directories for CuDNN or BLAS can be done
in this step.
On Linux
```bash
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build # or cmake-gui build
```
On macOS
```bash
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build # or cmake-gui build
```
### Docker Image
A Dockerfile is supplied to build images with CUDA support and cuDNN v7. You can pass the `-e PYTHON_VERSION=x.y` flag to specify which Python version is to be used by Miniconda, or leave it unset to use the default. Build from the pytorch repo directory, as Docker needs to copy the git repo into the Docker filesystem while building the image.

View File

@ -1,14 +1,17 @@
if (BUILD_ATEN_MOBILE)
if (NOT INTERN_BUILD_ATEN_OPS)
return()
endif()
# Find modules
if (NOT INTERN_BUILD_MOBILE)
list(APPEND CMAKE_MODULE_PATH /usr/lib/x86_64-linux-gnu/)
list(APPEND CMAKE_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/ /usr/lib/aarch64-linux-gnu/)
endif()
list(APPEND CMAKE_MODULE_PATH
/usr/lib/x86_64-linux-gnu/
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/public
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules_CUDA_fix)
list(APPEND CMAKE_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/ /usr/lib/aarch64-linux-gnu/)
cmake_policy(SET CMP0012 NEW)
@ -21,6 +24,7 @@ set(ATen_THIRD_PARTY_INCLUDE)
set(ATen_CUDA_SRCS)
set(ATen_CUDA_TEST_SRCS)
set(ATen_CUDA_INCLUDE)
set(ATen_NVRTC_STUB_SRCS)
set(ATen_HIP_SRCS)
set(ATen_HIP_TEST_SRCS)
set(ATen_HIP_INCLUDE)
@ -98,6 +102,7 @@ add_subdirectory(src/ATen)
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} PARENT_SCOPE)
set(ATen_NVRTC_STUB_SRCS ${ATen_NVRTC_STUB_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_HIP_TEST_SRCS ${ATen_HIP_TEST_SRCS} PARENT_SCOPE)

View File

@ -7,19 +7,24 @@
#include <ATen/DeviceGuard.h>
#include <ATen/DimVector.h>
#include <ATen/Dispatch.h>
#include <ATen/DynamicLibrary.h>
#include <ATen/Formatting.h>
#include <ATen/Functions.h>
#ifdef BUILD_NAMEDTENSOR
#include <ATen/NamedTensor.h>
#endif
#include <ATen/ScalarOps.h>
#include <ATen/Tensor.h>
#include <ATen/TensorGeometry.h>
#include <ATen/TensorOperators.h>
#include <ATen/Type.h>
#include <ATen/Version.h>
#include <ATen/core/ATenGeneral.h>
#include <ATen/core/Generator.h>
#include <c10/core/Layout.h>
#include <ATen/core/Scalar.h>
#include <c10/core/Storage.h>
#include <ATen/core/TensorMethods.h>
#include <c10/core/TensorOptions.h>
#include <ATen/core/Reduction.h>
#include <c10/util/Exception.h>
#include <ATen/core/ATenDispatch.h>
#include <ATen/core/UnsafeFromTH.h>

View File

@ -39,6 +39,8 @@ FILE(GLOB base_cpp "*.cpp" "detail/*.cpp" "cpu/*.cpp")
add_subdirectory(core)
FILE(GLOB cuda_h "cuda/*.h" "cuda/detail/*.h" "cuda/*.cuh" "cuda/detail/*.cuh")
FILE(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp")
FILE(GLOB cuda_nvrtc_stub_h "cuda/nvrtc_stub/*.h")
FILE(GLOB cuda_nvrtc_stub_cpp "cuda/nvrtc_stub/*.cpp")
FILE(GLOB cuda_cu "cuda/*.cu" "cuda/detail/*.cu")
FILE(GLOB cudnn_h "cudnn/*.h" "cudnn/*.cuh")
FILE(GLOB cudnn_cpp "cudnn/*.cpp")
@ -46,6 +48,8 @@ FILE(GLOB cudnn_cpp "cudnn/*.cpp")
FILE(GLOB hip_h "hip/*.h" "hip/detail/*.h" "hip/*.cuh" "hip/detail/*.cuh")
FILE(GLOB hip_cpp "hip/*.cpp" "hip/detail/*.cpp" "hip/impl/*.cpp")
FILE(GLOB hip_hip "hip/*.hip" "hip/detail/*.hip" "hip/impl/*.hip")
FILE(GLOB hip_nvrtc_stub_h "hip/nvrtc_stub/*.h")
FILE(GLOB hip_nvrtc_stub_cpp "hip/nvrtc_stub/*.cpp")
FILE(GLOB miopen_h "miopen/*.h")
FILE(GLOB miopen_cpp "miopen/*.cpp")
@ -65,6 +69,8 @@ FILE(GLOB native_cuda_cpp "native/cuda/*.cpp")
FILE(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
FILE(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
FILE(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
FILE(GLOB native_quantized_cuda_cu "native/quantized/cuda/*.cu")
FILE(GLOB native_quantized_cuda_cpp "native/quantized/cuda/*.cpp")
FILE(GLOB native_hip_hip "native/hip/*.hip")
FILE(GLOB native_hip_cpp "native/hip/*.cpp")
@ -72,6 +78,8 @@ FILE(GLOB native_miopen_cpp "native/miopen/*.cpp")
FILE(GLOB native_cudnn_hip_cpp "native/cudnn/hip/*.cpp")
FILE(GLOB native_sparse_hip_hip "native/sparse/hip/*.hip")
FILE(GLOB native_sparse_hip_cpp "native/sparse/hip/*.cpp")
FILE(GLOB native_quantized_hip_hip "native/quantized/hip/*.hip")
FILE(GLOB native_quantized_hip_cpp "native/quantized/hip/*.cpp")
add_subdirectory(quantized)
set(all_cpu_cpp ${base_cpp} ${ATen_CORE_SRCS} ${native_cpp} ${native_sparse_cpp} ${native_quantized_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${ATen_QUANTIZED_SRCS} ${cpu_kernel_cpp})
@ -88,8 +96,8 @@ endif()
IF(USE_CUDA)
list(APPEND ATen_CUDA_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/cuda)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${cuda_cu} ${native_cuda_cu} ${native_sparse_cuda_cu})
set(all_cuda_cpp ${native_sparse_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${cuda_cu} ${native_cuda_cu} ${native_sparse_cuda_cu} ${native_quantized_cuda_cu})
set(all_cuda_cpp ${native_sparse_cuda_cpp} ${native_quantized_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${all_cuda_cpp})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
@ -98,9 +106,9 @@ endif()
IF(USE_ROCM)
list(APPEND ATen_HIP_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/hip)
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} ${hip_hip} ${native_hip_hip} ${native_sparse_hip_hip})
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} ${hip_hip} ${native_hip_hip} ${native_sparse_hip_hip} ${native_quantized_hip_hip})
# TODO: Codegen separate files for HIP and use those (s/cuda_generated_cpp/hip_generated_cpp)
set(all_hip_cpp ${native_sparse_hip_cpp} ${hip_cpp} ${native_hip_cpp} ${cuda_generated_cpp} ${ATen_HIP_SRCS})
set(all_hip_cpp ${native_sparse_hip_cpp} ${native_quantized_hip_cpp} ${hip_cpp} ${native_hip_cpp} ${cuda_generated_cpp} ${ATen_HIP_SRCS})
set(all_hip_cpp ${native_miopen_cpp} ${native_cudnn_hip_cpp} ${miopen_cpp} ${all_hip_cpp})
endif()
@ -116,6 +124,12 @@ IF(NOT AT_LINK_STYLE)
SET(AT_LINK_STYLE SHARED)
ENDIF()
IF (USE_TBB)
message("ATen is compiled with TBB (${TBB_ROOT_DIR})")
list(APPEND ATen_CPU_INCLUDE ${TBB_ROOT_DIR}/include)
list(APPEND ATen_CPU_DEPENDENCY_LIBS tbb)
ENDIF()
IF(BLAS_FOUND)
IF ($ENV{TH_BINARY_BUILD})
MESSAGE(STATUS "TH_BINARY_BUILD detected. Enabling special linkage.")
@ -205,7 +219,7 @@ endif(MKLDNN_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS cpuinfo)
if(NOT MSVC AND NOT EMSCRIPTEN)
if(NOT MSVC AND NOT EMSCRIPTEN AND NOT INTERN_BUILD_MOBILE)
# Preserve values for the main build
set(__aten_sleef_build_shared_libs ${BUILD_SHARED_LIBS})
set(__aten_sleef_build_tests ${BUILD_TESTS})
@ -252,12 +266,7 @@ IF(USE_CUDA AND NOT USE_ROCM)
# build fake CuFFT lib in build dir
EXECUTE_PROCESS(COMMAND touch ${CMAKE_CURRENT_BINARY_DIR}/empty_file.cc)
if(${CUDA_VERSION_MAJOR} EQUAL "8")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60)
elseif(${CUDA_VERSION_MAJOR} EQUAL "9")
if(${CUDA_VERSION_MAJOR} EQUAL "9")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
@ -355,6 +364,7 @@ endif()
if(USE_CUDA)
set(ATen_CUDA_SRCS ${all_cuda_cpp})
set(ATen_NVRTC_STUB_SRCS ${cuda_nvrtc_stub_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
@ -367,6 +377,9 @@ endif()
if(USE_ROCM)
set(ATen_HIP_SRCS ${all_hip_cpp})
# caffe2_nvrtc's stubs to driver APIs are useful for HIP.
# See NOTE [ ATen NVRTC Stub and HIP ]
set(ATen_NVRTC_STUB_SRCS ${hip_nvrtc_stub_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
@ -438,6 +451,7 @@ endif()
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_NVRTC_STUB_SRCS ${ATen_NVRTC_STUB_SRCS} PARENT_SCOPE)
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} PARENT_SCOPE)
set(ATen_QUANTIZED_SRCS ${ATen_QUANTIZED_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)

View File

@ -30,7 +30,7 @@ inline std::pair<int64_t, int64_t> collapse_dims(
T* strides,
int64_t dims,
const int excludeDim = -1) {
AT_CHECK(
TORCH_CHECK(
excludeDim >= -1 && excludeDim < dims,
"expected excluded dim between -1 and dims - 1");
@ -331,69 +331,6 @@ apply_op(int64_t numel, int64_t offset, const Op& op, Args... iters) {
}
}
inline void apply_kernel(){};
// TODO: Deal elegantly with 0-dim tensors. iters.strides_ of 0-dim
// strided_tensor_iter will be of size 0 for dim 0 and iters.strides_[iters.dim_
// - 1] will index at -1. C++14 integer_sequence could be of use here.
template <typename Op, typename... Args>
inline void
apply_kernel(int64_t numel, int64_t offset, const Op& op, Args... iters) {
if (offset > 0)
forward(offset, iters...);
int64_t size = std::min(numel, max_iterate_size(iters...));
op(size, iters.data_..., iters.strides_[iters.dim_ - 1]...);
iterate(size, iters...);
iterate_overflow(iters...);
int64_t i = size;
size = std::min(numel, max_iterate_size(iters...));
for (; i < numel;) {
op(size, iters.data_..., iters.strides_[iters.dim_ - 1]...);
iterate(size, iters...);
i += size;
iterate_overflow(iters...);
}
}
template <typename scalar1, typename scalar2, typename Op>
inline void
CPU_tensor_parallel_kernel_apply2(Tensor tensor1, Tensor tensor2, const Op op) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (tensor1.numel() == 1) {
op(1, tensor1.data<scalar1>(), tensor2.data<scalar2>(), 0, 0);
return;
}
if (tensor1.ndimension() < 8 && tensor2.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
1,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_kernel(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
});
} else {
parallel_for(
0,
tensor1.numel(),
1,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_kernel(
end - begin,
begin,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
});
}
}
/*
Apply a pointwise operator to sequence of tensors

View File

@ -1,49 +1,184 @@
#include <ATen/CPUGenerator.h>
#define const_generator_cast(generator) \
dynamic_cast<const CPUGenerator&>(generator)
#include <c10/util/C++17.h>
#include <algorithm>
namespace at {
CPUGenerator::CPUGenerator(Context * context_)
: context(context_), generator(THGenerator_new())
{}
namespace detail {
CPUGenerator::~CPUGenerator() {
if (generator)
THGenerator_free(generator);
// Ensures default_gen_cpu is initialized once.
static std::once_flag cpu_gen_init_flag;
// Default, global CPU generator.
static std::shared_ptr<CPUGenerator> default_gen_cpu;
/**
* PyTorch maintains a collection of default generators that get
* initialized once. The purpose of these default generators is to
* maintain a global running state of the pseudo random number generation,
* when a user does not explicitly mention any generator.
* getDefaultCPUGenerator gets the default generator for a particular
* device.
*/
CPUGenerator* getDefaultCPUGenerator() {
std::call_once(cpu_gen_init_flag, [&] {
default_gen_cpu = std::make_shared<CPUGenerator>(getNonDeterministicRandom());
});
return default_gen_cpu.get();
}
CPUGenerator& CPUGenerator::copy(const Generator& from) {
THGenerator_copy(generator, const_generator_cast(from).generator);
return *this;
/**
* Utility to create a CPUGenerator. Returns a shared_ptr
*/
std::shared_ptr<CPUGenerator> createCPUGenerator(uint64_t seed_val) {
return std::make_shared<CPUGenerator>(seed_val);
}
CPUGenerator& CPUGenerator::free() {
THGenerator_free(generator);
return *this;
/**
* Helper function to concatenate two 32-bit unsigned ints
* and return them as a 64-bit unsigned int
*/
inline uint64_t make64BitsFrom32Bits(uint32_t hi, uint32_t lo) {
return (static_cast<uint64_t>(hi) << 32) | lo;
}
} // namespace detail
/**
* CPUGenerator class implementation
*/
CPUGenerator::CPUGenerator(uint64_t seed_in)
: Generator{Device(DeviceType::CPU)},
engine_{seed_in},
next_float_normal_sample_{c10::optional<float>()},
next_double_normal_sample_{c10::optional<double>()} { }
/**
* Manually seeds the engine with the seed input
* See Note [Acquire lock when using random generators]
*/
void CPUGenerator::set_current_seed(uint64_t seed) {
next_float_normal_sample_.reset();
next_double_normal_sample_.reset();
engine_ = mt19937(seed);
}
/**
* Gets the current seed of CPUGenerator.
*/
uint64_t CPUGenerator::current_seed() const {
return engine_.seed();
}
/**
* Gets a nondeterministic random number from /dev/urandom or time,
* seeds the CPUGenerator with it and then returns that number.
*
* FIXME: You can move this function to Generator.cpp if the algorithm
* in getNonDeterministicRandom is unified for both CPU and CUDA
*/
uint64_t CPUGenerator::seed() {
return THRandom_seed(generator);
auto random = detail::getNonDeterministicRandom();
this->set_current_seed(random);
return random;
}
uint64_t CPUGenerator::initialSeed() {
return THRandom_initialSeed(generator);
/**
* Gets the DeviceType of CPUGenerator.
* Used for type checking during run time.
*/
DeviceType CPUGenerator::device_type() {
return DeviceType::CPU;
}
CPUGenerator& CPUGenerator::manualSeed(uint64_t seed) {
THRandom_manualSeed(generator, seed);
return *this;
/**
* Gets a random 32 bit unsigned integer from the engine
*
* See Note [Acquire lock when using random generators]
*/
uint32_t CPUGenerator::random() {
return engine_();
}
CPUGenerator& CPUGenerator::manualSeedAll(uint64_t seed) {
// There's only one CPU generator
return manualSeed(seed);
/**
* Gets a random 64 bit unsigned integer from the engine
*
* See Note [Acquire lock when using random generators]
*/
uint64_t CPUGenerator::random64() {
uint32_t random1 = engine_();
uint32_t random2 = engine_();
return detail::make64BitsFrom32Bits(random1, random2);
}
void * CPUGenerator::unsafeGetTH() {
return generator;
/**
* Get the cached normal random in float
*/
c10::optional<float> CPUGenerator::next_float_normal_sample() {
return next_float_normal_sample_;
}
/**
* Get the cached normal random in double
*/
c10::optional<double> CPUGenerator::next_double_normal_sample() {
return next_double_normal_sample_;
}
/**
* Cache normal random in float
*
* See Note [Acquire lock when using random generators]
*/
void CPUGenerator::set_next_float_normal_sample(c10::optional<float> randn) {
next_float_normal_sample_ = randn;
}
/**
* Cache normal random in double
*
* See Note [Acquire lock when using random generators]
*/
void CPUGenerator::set_next_double_normal_sample(c10::optional<double> randn) {
next_double_normal_sample_ = randn;
}
/**
* Get the engine of the CPUGenerator
*/
at::mt19937 CPUGenerator::engine() {
return engine_;
}
/**
* Set the engine of the CPUGenerator
*
* See Note [Acquire lock when using random generators]
*/
void CPUGenerator::set_engine(at::mt19937 engine) {
engine_ = engine;
}
/**
* Public clone method implementation
*
* See Note [Acquire lock when using random generators]
*/
std::shared_ptr<CPUGenerator> CPUGenerator::clone() const {
return std::shared_ptr<CPUGenerator>(this->clone_impl());
}
/**
* Private clone method implementation
*
* See Note [Acquire lock when using random generators]
*/
CPUGenerator* CPUGenerator::clone_impl() const {
auto gen = new CPUGenerator();
gen->set_engine(engine_);
gen->set_next_float_normal_sample(next_float_normal_sample_);
gen->set_next_double_normal_sample(next_double_normal_sample_);
return gen;
}
} // namespace at


@ -0,0 +1,44 @@
#pragma once
#include <ATen/core/Generator.h>
#include <ATen/core/MT19937RNGEngine.h>
#include <ATen/core/PhiloxRNGEngine.h>
#include <c10/util/Optional.h>
namespace at {
struct CAFFE2_API CPUGenerator : public Generator {
// Constructors
CPUGenerator(uint64_t seed_in = default_rng_seed_val);
~CPUGenerator() = default;
// CPUGenerator methods
std::shared_ptr<CPUGenerator> clone() const;
void set_current_seed(uint64_t seed) override;
uint64_t current_seed() const override;
uint64_t seed() override;
static DeviceType device_type();
uint32_t random();
uint64_t random64();
c10::optional<float> next_float_normal_sample();
c10::optional<double> next_double_normal_sample();
void set_next_float_normal_sample(c10::optional<float> randn);
void set_next_double_normal_sample(c10::optional<double> randn);
at::mt19937 engine();
void set_engine(at::mt19937 engine);
private:
CPUGenerator* clone_impl() const override;
at::mt19937 engine_;
c10::optional<float> next_float_normal_sample_;
c10::optional<double> next_double_normal_sample_;
};
namespace detail {
CAFFE2_API CPUGenerator* getDefaultCPUGenerator();
CAFFE2_API std::shared_ptr<CPUGenerator> createCPUGenerator(uint64_t seed_val = default_rng_seed_val);
} // namespace detail
}
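The header above defines the seeding contract that the rest of the diff relies on: re-seeding replaces the engine outright, so two generators given the same seed replay the same stream. A minimal standalone sketch of that pattern, using `std::mt19937` in place of `at::mt19937` (an assumption — ATen ships its own engine) and an illustrative class name:

```cpp
#include <cstdint>
#include <random>

// Toy sketch of the CPUGenerator seeding pattern: set_current_seed()
// resets the whole engine, so equal seeds yield equal streams.
class ToyCPUGenerator {
 public:
  explicit ToyCPUGenerator(uint64_t seed) { set_current_seed(seed); }
  void set_current_seed(uint64_t seed) {
    engine_.seed(static_cast<uint32_t>(seed));  // engine is fully replaced
  }
  uint32_t random() { return static_cast<uint32_t>(engine_()); }
 private:
  std::mt19937 engine_;
};
```

(The real class additionally clears its cached normal samples on re-seed, as `CPUGenerator::set_current_seed` shows above.)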


@ -1,20 +0,0 @@
#include <ATen/CPUTypeDefault.h>
#include <ATen/Context.h>
#include <ATen/CPUGenerator.h>
namespace at {
Allocator* CPUTypeDefault::allocator() const {
return getCPUAllocator();
}
Device CPUTypeDefault::getDeviceFromPtr(void * data) const {
return DeviceType::CPU;
}
std::unique_ptr<Generator> CPUTypeDefault::generator() const {
return std::unique_ptr<Generator>(new CPUGenerator(&at::globalContext()));
}
} // namespace at


@ -1,14 +0,0 @@
#pragma once
#include <ATen/TypeDefault.h>
namespace at {
struct CAFFE2_API CPUTypeDefault : public TypeDefault {
CPUTypeDefault(TensorTypeId type_id, bool is_variable, bool is_undefined)
: TypeDefault(type_id, is_variable, is_undefined) {}
Allocator* allocator() const override;
Device getDeviceFromPtr(void * data) const override;
std::unique_ptr<Generator> generator() const override;
};
} // namespace at


@ -0,0 +1,37 @@
#pragma once
#include <ATen/core/Generator.h>
namespace at {
struct CAFFE2_API CUDAGenerator : public Generator {
// Constructors
CUDAGenerator(DeviceIndex device_index = -1);
~CUDAGenerator() = default;
// CUDAGenerator methods
std::shared_ptr<CUDAGenerator> clone() const;
void set_current_seed(uint64_t seed) override;
uint64_t current_seed() const override;
uint64_t seed() override;
void set_philox_offset_per_thread(uint64_t offset);
uint64_t philox_offset_per_thread();
std::pair<uint64_t, uint64_t> philox_engine_inputs(uint64_t increment);
static DeviceType device_type();
private:
CUDAGenerator* clone_impl() const override;
uint64_t seed_ = default_rng_seed_val;
uint64_t philox_offset_per_thread_ = 0;
};
namespace cuda {
namespace detail {
CAFFE2_API CUDAGenerator* getDefaultCUDAGenerator(DeviceIndex device_index = -1);
CAFFE2_API std::shared_ptr<CUDAGenerator> createCUDAGenerator(DeviceIndex device_index = -1);
} // namespace detail
} // namespace cuda
} // namespace at


@ -1,18 +0,0 @@
#pragma once
#include <ATen/Utils.h>
#include <ATen/core/Generator.h>
#include <c10/util/Exception.h>
namespace at {
template <typename T>
static inline T * check_generator(Generator * expr, Generator * defaultValue) {
if (!expr)
expr = defaultValue;
if(auto result = dynamic_cast<T*>(expr))
return result;
AT_ERROR("Expected a '", typeid(T).name(), "' but found '", typeid(expr).name(), "'");
}
} // namespace at


@ -10,8 +10,6 @@
#include <string>
#include <stdexcept>
#include <ATen/CPUGenerator.h>
#include <ATen/RegisterCPU.h>
#include <ATen/Tensor.h>
#include <ATen/cpu/FlushDenormal.h>
@ -19,28 +17,9 @@
namespace at {
static inline void errorHandler(const char * msg, void * data) {
throw std::runtime_error(msg);
}
static inline void argErrorHandler(int arg, const char * msg, void * data) {
std::stringstream new_error;
new_error << "invalid argument " << arg << ": " << msg;
throw std::runtime_error(new_error.str());
}
Context::Context()
: next_id(static_cast<size_t>(TypeID::NumOptions))
, thc_state(nullptr, [](THCState* p){ /* no-op */ } )
, thh_state(nullptr, [](THHState* p){ /* no-op */ } )
{
THSetDefaultErrorHandler(errorHandler,nullptr);
THSetDefaultArgErrorHandler(argErrorHandler,nullptr);
generator_registry[static_cast<int>(DeviceType::CPU)]
.reset(new CPUGenerator(this));
register_cpu_types(this);
}
: thc_state(nullptr, [](THCState* p){ /* no-op */ } )
, thh_state(nullptr, [](THHState* p){ /* no-op */ } ) {}
// TODO: This could be bad juju if someone calls globalContext() in the
// destructor of an object with static lifetime.
@ -112,35 +91,6 @@ bool Context::setFlushDenormal(bool on) {
return at::cpu::set_flush_denormal(on);
}
TypeExtendedInterface& getType(TensorOptions options) {
return globalContext().getType(
options.backend(), typeMetaToScalarType(options.dtype()), options.is_variable());
}
// NOTE: We also check `at::NonVariableTypeMode`, and if it's enabled we always
// return non-Variable type in this function.
// See NOTE [ Treating Variables as non-Variables in type dispatch ]
TypeExtendedInterface& getType(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getType(
backend, typeMetaToScalarType(impl->dtype()), impl->is_variable() && !at::NonVariableTypeMode::is_enabled());
}
TypeExtendedInterface& getType(const Tensor& t) {
return getType(t.unsafeGetTensorImpl());
}
LegacyTHDispatcher& getLegacyTHDispatcher(TensorOptions options) {
return globalContext().getLegacyTHDispatcher(
options.backend(), typeMetaToScalarType(options.dtype()));
}
LegacyTHDispatcher& getLegacyTHDispatcher(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getLegacyTHDispatcher(
backend, typeMetaToScalarType(impl->dtype()));
}
Allocator* getCPUAllocator() {
return getTHDefaultAllocator();
}
@ -156,9 +106,6 @@ struct LegacyDeviceTypeInit : public LegacyDeviceTypeInitInterface {
void initHIP() const override {
globalContext().lazyInitHIP();
}
void initComplex() const override {
globalContext().lazyInitComplex();
}
};
REGISTER_LEGACY_TYPE_INIT(LegacyDeviceTypeInit);


@ -1,18 +1,14 @@
#pragma once
#include <ATen/core/ATenGeneral.h>
#include <ATen/Type.h>
#include <ATen/TypeExtendedInterface.h>
#include <ATen/Tensor.h>
#include <ATen/Utils.h>
#include <ATen/LegacyTHDispatch.h>
#include <ATen/LegacyTHDispatcher.h>
#include <ATen/core/ATenGeneral.h>
#include <ATen/core/Generator.h>
#include <ATen/CPUGenerator.h>
#include <ATen/core/LegacyTypeDispatch.h>
#include <ATen/core/VariableHooksInterface.h>
#include <ATen/detail/CUDAHooksInterface.h>
#include <ATen/detail/HIPHooksInterface.h>
#include <ATen/detail/ComplexHooksInterface.h>
#include <c10/util/Exception.h>
#include <c10/core/impl/DeviceGuardImplInterface.h>
@ -27,43 +23,29 @@ class Tensor;
class CAFFE2_API Context {
public:
Context();
TypeExtendedInterface* getNonVariableTypeRaw(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeRaw(p, s));
}
TypeExtendedInterface * getNonVariableTypeOpt(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeOpt(p, s));
}
TypeExtendedInterface & getNonVariableType(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getNonVariableType(p, s));
}
TypeExtendedInterface & getVariableType(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getVariableType(p, s));
}
TypeExtendedInterface & getType(Backend p, ScalarType s, bool is_variable) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getType(p, s, is_variable));
}
LegacyTHDispatcher& getLegacyTHDispatcher(Backend p, ScalarType s) {
return globalLegacyTHDispatch().getLegacyTHDispatcher(p, s);
}
// The passed in Type must be delete'able
// TODO: Just make it take a unique_ptr
void registerType(Backend b, Type* t) {
globalLegacyTypeDispatch().registerType(b,
LegacyTypeDispatch::TypeUniquePtr{t, LegacyTypeDeleter([](Type* p) { delete p; }) });
}
void registerLegacyTHDispatcher(Backend b, ScalarType s, LegacyTHDispatcher* t) {
globalLegacyTHDispatch().registerDispatcher(b, s,
LegacyTHDispatch::LegacyTHDispatcherUniquePtr{t, LegacyTHDispatcherDeleter([](LegacyTHDispatcher* p) { delete p; }) });
}
Generator & defaultGenerator(DeviceType device_type) {
Generator & defaultGenerator(Device device) {
DeviceType device_type = device.type();
initCUDAIfNeeded(device_type);
initHIPIfNeeded(device_type);
auto & generator = generator_registry[static_cast<int>(device_type)];
if(!generator)
AT_ERROR(DeviceTypeName(device_type), " backend type not enabled.");
return *generator;
if (device_type == at::kCPU) {
return *at::detail::getDefaultCPUGenerator();
} else if (device_type == at::kCUDA) {
return *at::detail::getCUDAHooks().getDefaultCUDAGenerator(device.index());
} else {
AT_ERROR(DeviceTypeName(device_type), " device type not enabled.");
}
}
Device getDeviceFromPtr(void* data, DeviceType device_type) {
initCUDAIfNeeded(device_type);
initHIPIfNeeded(device_type);
if (device_type == at::kCPU) {
return DeviceType::CPU;
} else if (device_type == at::kCUDA) {
return at::detail::getCUDAHooks().getDeviceFromPtr(data);
} else {
AT_ERROR(DeviceTypeName(device_type), " device type not enabled.");
}
}
bool hasOpenMP() const;
bool hasMKL() const;
@ -86,27 +68,18 @@ class CAFFE2_API Context {
THCState* lazyInitCUDA() {
std::call_once(thc_init,[&] {
thc_state = detail::getCUDAHooks().initCUDA();
generator_registry[static_cast<int>(DeviceType::CUDA)] =
detail::getCUDAHooks().initCUDAGenerator(this);
detail::getCUDAHooks().registerCUDATypes(this);
});
return thc_state.get();
}
THHState* lazyInitHIP() {
std::call_once(thh_init,[&] {
thh_state = detail::getHIPHooks().initHIP();
generator_registry[static_cast<int>(DeviceType::HIP)] =
detail::getHIPHooks().initHIPGenerator(this);
detail::getHIPHooks().registerHIPTypes(this);
});
return thh_state.get();
}
void lazyInitComplex() {
std::call_once(complex_init_, [&] {
detail::getComplexHooks().registerComplexTypes(this);
});
const at::cuda::NVRTC& getNVRTC() {
return detail::getCUDAHooks().nvrtc();
}
THCState* getTHCState() {
// AT_ASSERT(thc_state);
return thc_state.get();
@ -115,9 +88,6 @@ class CAFFE2_API Context {
return thh_state.get();
}
size_t freshTypeID() {
return next_id++;
}
bool setFlushDenormal(bool on);
// NB: This method is *purely* whether or not a user requested
@ -130,8 +100,6 @@ class CAFFE2_API Context {
void setBenchmarkCuDNN(bool);
bool deterministicCuDNN() const;
void setDeterministicCuDNN(bool);
std::unique_ptr<Generator>
generator_registry[static_cast<int>(DeviceType::COMPILE_TIME_MAX_DEVICE_TYPES)];
private:
void initCUDAIfNeeded(DeviceType p) {
if (p == DeviceType::CUDA) {
@ -143,21 +111,13 @@ private:
lazyInitHIP();
}
}
void initComplexIfNeeded(ScalarType s) {
if (isComplexType(s)) {
lazyInitComplex();
}
}
std::once_flag thc_init;
std::once_flag thh_init;
std::once_flag complex_init_;
bool enabled_cudnn = true;
bool deterministic_cudnn = false;
bool benchmark_cudnn = false;
std::atomic<size_t> next_id;
std::unique_ptr<THCState, void(*)(THCState*)> thc_state;
std::unique_ptr<THHState, void(*)(THHState*)> thh_state;
friend struct Type;
};
CAFFE2_API Context& globalContext();
@ -166,14 +126,6 @@ static inline void init() {
globalContext();
}
static inline TypeExtendedInterface& getNonVariableType(Backend p, ScalarType s) {
return globalContext().getNonVariableType(p, s);
}
CAFFE2_API TypeExtendedInterface& getType(TensorOptions options);
CAFFE2_API TypeExtendedInterface& getType(const TensorImpl*);
CAFFE2_API TypeExtendedInterface& getType(const Tensor&);
CAFFE2_API Allocator* getCPUAllocator();
static inline DeprecatedTypeProperties& getNonVariableDeprecatedTypeProperties(Backend p, ScalarType s) {
@ -196,9 +148,6 @@ static inline DeprecatedTypeProperties& HIP(ScalarType s) {
Backend::HIP, s, /*is_variable*/false);
}
CAFFE2_API LegacyTHDispatcher& getLegacyTHDispatcher(TensorOptions options);
CAFFE2_API LegacyTHDispatcher& getLegacyTHDispatcher(const Tensor&);
static inline bool hasCUDA() {
return globalContext().hasCUDA();
}
@ -252,11 +201,24 @@ static inline bool hasMKLDNN() {
}
static inline void manual_seed(uint64_t seed) {
globalContext().defaultGenerator(DeviceType::CPU).manualSeed(seed);
auto& gen = globalContext().defaultGenerator(DeviceType::CPU);
{
// See Note [Acquire lock when using random generators]
std::lock_guard<std::mutex> lock(gen.mutex_);
gen.set_current_seed(seed);
}
// NB: Sometimes we build with CUDA, but we don't have any GPUs
// available. In that case, we must not seed CUDA; it will fail!
if (hasCUDA() && detail::getCUDAHooks().getNumGPUs() > 0) {
globalContext().defaultGenerator(DeviceType::CUDA).manualSeedAll(seed);
int num_gpus = detail::getCUDAHooks().getNumGPUs();
if (hasCUDA() && num_gpus > 0) {
for (int i = 0; i < num_gpus; i++) {
auto& cuda_gen = globalContext().defaultGenerator(Device(at::kCUDA, i));
{
// See Note [Acquire lock when using random generators]
std::lock_guard<std::mutex> lock(cuda_gen.mutex_);
cuda_gen.set_current_seed(seed);
}
}
}
}
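The new `manual_seed` above encodes the locking discipline from Note [Acquire lock when using random generators]: each generator is mutated only while holding its own mutex, once per device. A standalone sketch of that discipline, with illustrative (non-ATen) names:

```cpp
#include <cstdint>
#include <mutex>
#include <random>

// Sketch of the per-generator locking pattern in manual_seed():
// take the generator's mutex before mutating its state.
struct LockedGenerator {
  std::mutex mutex_;
  std::mt19937_64 engine_;
  uint64_t current_seed_ = 0;
  void set_current_seed(uint64_t seed) {
    engine_.seed(seed);
    current_seed_ = seed;
  }
};

void manual_seed_all(LockedGenerator* gens, int n, uint64_t seed) {
  for (int i = 0; i < n; ++i) {
    // One lock_guard per generator, mirroring the per-device loop above.
    std::lock_guard<std::mutex> lock(gens[i].mutex_);
    gens[i].set_current_seed(seed);
  }
}
```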


@ -39,9 +39,18 @@ static DLDataType getDLDataType(const Tensor& t) {
case ScalarType::Bool:
dtype.code = DLDataTypeCode::kDLUInt;
break;
case ScalarType::BFloat16:
throw std::logic_error("BFloat16 is not supported by dlpack");
break;
case ScalarType::QInt8:
throw std::logic_error("QInt8 is not supported by dlpack");
break;
case ScalarType::QUInt8:
throw std::logic_error("QUInt8 is not supported by dlpack");
break;
case ScalarType::QInt32:
throw std::logic_error("QInt32 is not supported by dlpack");
break;
case ScalarType::ComplexHalf:
throw std::logic_error("ComplexHalf is not supported by dlpack");
case ScalarType::ComplexFloat:


@ -5,6 +5,7 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
device_guard: False
return: argument 0
options:
@ -43,6 +44,7 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
options:
- arguments:
- THTensor* self
@ -60,6 +62,7 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
device_guard: False
return: bool
arguments:
@ -68,6 +71,8 @@
]]
[[
name: _th_masked_fill_
cpu_bool: True
cuda_bool: True
cname: maskedFill
variants: function
return: self
@ -84,8 +89,30 @@
- THByteTensor* mask
- THTensor* value
]]
[[
name: _th_masked_fill_bool_
cpu_bool: True
cuda_bool: True
cname: maskedFillBool
variants: function
return: self
options:
- arguments:
- arg: THTensor* self
broadcast: mask inplace fallback types:Bool
- THBoolTensor* mask
- real value
- zero_dim_tensor_only: True
arguments:
- arg: THTensor* self
broadcast: mask inplace fallback types:Bool
- THBoolTensor* mask
- THTensor* value
]]
[[
name: _th_masked_scatter_
cpu_bool: True
cuda_bool: True
cname: maskedCopy
variants: function
return: self
@ -95,9 +122,24 @@
- THByteTensor* mask
- THTensor* source
]]
[[
name: _th_masked_scatter_bool_
cpu_bool: True
cuda_bool: True
cname: maskedCopyBool
variants: function
return: self
arguments:
- arg: THTensor* self
broadcast: mask inplace fallback types:Bool
- THBoolTensor* mask
- THTensor* source
]]
[[
name: _th_masked_select
cname: maskedSelect
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -109,13 +151,31 @@
- THByteTensor* mask
]]
[[
name: _th_nonzero
cname: nonzero
name: _th_masked_select_bool
cname: maskedSelectBool
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
arguments:
- arg: THTensor* result
output: True
- arg: THTensor* self
broadcast: mask fallback types:Bool
- THBoolTensor* mask
]]
[[
name: _th_nonzero
cname: nonzero
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
variants:
- function
return: argument 0
scalar_check: false
arguments:
- arg: THIndexTensor* result
output: True
@ -130,30 +190,17 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
arguments:
- THTensor* self
]]
[[
name: _th_view
cname: newView
cpu_half: True
cpu_bool: True
cuda_bool: True
variants:
- function
device_guard: False
return: THTensor*
arguments:
- THTensor* self
- arg: IntArrayRefSize size
long_args: True
]]
[[
name: _th_resize_as_
cname: resizeAs
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
variants:
- function
return: self
@ -164,6 +211,8 @@
]]
[[
name: _th_index_select
cpu_bool: True
cuda_bool: True
cname: indexSelect
variants:
- function
@ -179,6 +228,8 @@
[[
name: _th_index_copy_
cname: indexCopy
cpu_bool: True
cuda_bool: True
variants: function
return: argument 0
arguments:
@ -190,6 +241,8 @@
]]
[[
name: _th_take
cpu_bool: True
cuda_bool: True
cname: take
variants:
- function
@ -203,6 +256,8 @@
]]
[[
name: _th_put_
cpu_bool: True
cuda_bool: True
cname: put
variants: function
backends:
@ -230,6 +285,8 @@
]]
[[
name: _th_index_fill_
cpu_bool: True
cuda_bool: True
cname: indexFill
variants: function
return: argument 0
@ -256,6 +313,7 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
device_guard: False
return: argument 0
arguments:
@ -270,6 +328,8 @@
[[
name: _th_scatter_
return: argument 0
cpu_bool: True
cuda_bool: True
variants: function
options:
- cname: scatter
@ -291,6 +351,8 @@
name: _th_scatter_add_
return: argument 0
cname: scatterAdd
cpu_bool: True
cuda_bool: True
variants: function
arguments:
- THTensor* self
@ -302,6 +364,8 @@
[[
name: _th_gather
cname: gather
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -316,8 +380,9 @@
]]
[[
name: _th_equal
cpu_bool: True
cname: equal
cpu_bool: True
cuda_bool: True
variants:
- function
return: bool
@ -327,6 +392,8 @@
]]
[[
name: _th_and
cpu_bool: True
cuda_bool: True
cname: __and__
variants:
- function
@ -349,6 +416,8 @@
[[
name: _th_iand_
cname: __iand__
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -368,6 +437,8 @@
[[
name: _th_or
cname: __or__
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -389,6 +460,8 @@
[[
name: _th_ior_
cname: __ior__
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -408,6 +481,8 @@
[[
name: _th_xor
cname: __xor__
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -429,6 +504,8 @@
[[
name: _th_ixor_
cname: __ixor__
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -535,11 +612,33 @@
options:
- cname: ltValue
arguments:
- arg: THByteTensor* result
- arg: THBoolTensor* result
output: True
- THTensor* self
- real other
- cname: ltTensor
arguments:
- arg: THBoolTensor* result
output: True
- arg: THTensor* self
broadcast: other fallback
- THTensor* other
]]
[[
name: _th_lt_byte
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
options:
- cname: ltValueByte
arguments:
- arg: THByteTensor* result
output: True
- THTensor* self
- real other
- cname: ltTensorByte
arguments:
- arg: THByteTensor* result
output: True
@ -576,11 +675,33 @@
options:
- cname: gtValue
arguments:
- arg: THByteTensor* result
- arg: THBoolTensor* result
output: True
- THTensor* self
- real other
- cname: gtTensor
arguments:
- arg: THBoolTensor* result
output: True
- arg: THTensor* self
broadcast: other fallback
- THTensor* other
]]
[[
name: _th_gt_byte
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
options:
- cname: gtValueByte
arguments:
- arg: THByteTensor* result
output: True
- THTensor* self
- real other
- cname: gtTensorByte
arguments:
- arg: THByteTensor* result
output: True
@ -617,11 +738,33 @@
options:
- cname: leValue
arguments:
- arg: THByteTensor* result
- arg: THBoolTensor* result
output: True
- THTensor* self
- real other
- cname: leTensor
arguments:
- arg: THBoolTensor* result
output: True
- arg: THTensor* self
broadcast: other fallback
- THTensor* other
]]
[[
name: _th_le_byte
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
options:
- cname: leValueByte
arguments:
- arg: THByteTensor* result
output: True
- THTensor* self
- real other
- cname: leTensorByte
arguments:
- arg: THByteTensor* result
output: True
@ -658,11 +801,33 @@
options:
- cname: geValue
arguments:
- arg: THByteTensor* result
- arg: THBoolTensor* result
output: True
- THTensor* self
- real other
- cname: geTensor
arguments:
- arg: THBoolTensor* result
output: True
- arg: THTensor* self
broadcast: other fallback
- THTensor* other
]]
[[
name: _th_ge_byte
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
options:
- cname: geValueByte
arguments:
- arg: THByteTensor* result
output: True
- THTensor* self
- real other
- cname: geTensorByte
arguments:
- arg: THByteTensor* result
output: True
@ -699,11 +864,33 @@
options:
- cname: eqValue
arguments:
- arg: THByteTensor* result
- arg: THBoolTensor* result
output: True
- THTensor* self
- real other
- cname: eqTensor
arguments:
- arg: THBoolTensor* result
output: True
- arg: THTensor* self
broadcast: other fallback
- THTensor* other
]]
[[
name: _th_eq_byte
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
options:
- cname: eqValueByte
arguments:
- arg: THByteTensor* result
output: True
- THTensor* self
- real other
- cname: eqTensorByte
arguments:
- arg: THByteTensor* result
output: True
@ -740,11 +927,33 @@
options:
- cname: neValue
arguments:
- arg: THByteTensor* result
- arg: THBoolTensor* result
output: True
- THTensor* self
- real other
- cname: neTensor
arguments:
- arg: THBoolTensor* result
output: True
- arg: THTensor* self
broadcast: other fallback
- THTensor* other
]]
[[
name: _th_ne_byte
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
options:
- cname: neValueByte
arguments:
- arg: THByteTensor* result
output: True
- THTensor* self
- real other
- cname: neTensorByte
arguments:
- arg: THByteTensor* result
output: True
@ -773,6 +982,8 @@
]]
[[
name: _th_min
cpu_bool: True
cuda_bool: True
variants:
- function
options:
@ -791,6 +1002,8 @@
]]
[[
name: _th_min
cpu_bool: True
cuda_bool: True
variants: function
options:
- cname: min
@ -809,6 +1022,8 @@
]]
[[
name: _th_max
cpu_bool: True
cuda_bool: True
variants:
- function
options:
@ -827,6 +1042,8 @@
]]
[[
name: _th_max
cpu_bool: True
cuda_bool: True
variants: function
options:
- cname: max
@ -1136,7 +1353,6 @@
types:
- floating_point
backends:
- CPU
- CUDA
variants: function
return: argument 0
@ -1179,7 +1395,6 @@
types:
- floating_point
backends:
- CPU
- CUDA
variants: function
return: argument 0
@ -1683,6 +1898,7 @@
variants:
- function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -1701,6 +1917,7 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
variants:
- function
arguments:
@ -1709,6 +1926,8 @@
[[
name: _th_cumsum
cname: cumsum
cpu_bool: True
cuda_bool: True
variants: function
return: argument 0
arguments:
@ -1721,6 +1940,8 @@
[[
name: _th_cumprod
cname: cumprod
cpu_bool: True
cuda_bool: True
variants: function
return: argument 0
arguments:
@ -1733,6 +1954,8 @@
[[
name: _th_sign
cname: sign
cpu_bool: True
cuda_bool: True
variants:
- function
return: argument 0
@ -1885,6 +2108,7 @@
backends:
- CUDA
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -1897,6 +2121,7 @@
variants:
- function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -1916,6 +2141,7 @@
variants:
- function
return: argument 0
scalar_check: false
options:
- arguments:
- arg: THTensor* result
@ -1990,6 +2216,7 @@
cname: addr
variants: function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2026,7 +2253,7 @@
cname: addr
variants: function
return: argument 0
scalar_check: False
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2043,6 +2270,7 @@
cname: addmv
variants: function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2060,6 +2288,7 @@
return: argument 0
options:
- cname: addmm
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2079,6 +2308,7 @@
backends:
- CUDA
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2096,6 +2326,7 @@
variants:
- function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2135,6 +2366,7 @@
backends:
- CUDA
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2181,17 +2413,6 @@
kwarg_only: True
- THTensor* tensor1
- THTensor* tensor2
- cname: spaddcmul
variants: function
return: argument 0
arguments:
- THTensor* self
- THTensor* self
- arg: real value
default: AS_REAL(1)
kwarg_only: True
- THSTensor* tensor1
- THSTensor* tensor2
]]
[[
name: _th_addcdiv
@ -2245,33 +2466,6 @@
- THTensor* self
- THTensor* A
]]
[[
name: _th_symeig
cname: syev
types:
- Float
- Double
backends:
- CPU
- CUDA
variants:
- function
return: argument 0,1
arguments:
- arg: THTensor* res1
output: True
- arg: THTensor* res2
output: True
- THTensor* self
- arg: bool eigenvectors
if_true: V
if_false: N
default: N
- arg: bool upper
if_true: U
if_false: L
default: U
]]
[[
name: _th_eig
cname: geev
@ -2295,51 +2489,6 @@
if_false: N
default: N
]]
[[
name: _th_svd
cname: gesdd
types:
- Float
- Double
backends:
- CPU
- CUDA
variants:
- function
return: argument 0,1,2
arguments:
- arg: THTensor* res1
output: True
- arg: THTensor* res2
output: True
- arg: THTensor* res3
output: True
- THTensor* self
- arg: bool some
if_true: S
if_false: A
default: S
- arg: bool compute_uv
if_true: S
if_false: N
default: S
]]
[[
name: _th_getri_single
cname: getri
types:
- Float
- Double
backends:
- CPU
- CUDA
variants: function
return: argument 0
arguments:
- arg: THTensor* output
output: True
- THTensor* self
]]
[[
name: _th_potri
cname: potri
@ -2352,6 +2501,7 @@
variants:
- function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* output
output: True
@ -2361,52 +2511,6 @@
if_false: L
default: U
]]
[[
name: _th_pstrf
cname: pstrf
types:
- Float
- Double
backends:
- CPU
variants:
- function
return: argument 0,1
arguments:
- arg: THTensor* res1
output: True
- arg: THIntegerTensor* res2
output: True
- THTensor* self
- arg: bool upper
if_true: U
if_false: L
default: U
- arg: real tol
default: -1
aten_custom_call: |
${THTensor}_pstrf(res1_, res2_, self_, (upper) ? "U" : "L", tol_);
res2 -= 1; // LAPACK returns 1-indexed pivots
]]
[[
name: _th_qr
cname: qr
types:
- Float
- Double
backends:
- CPU
- CUDA
variants:
- function
return: argument 0,1
arguments:
- arg: THTensor* res1
output: True
- arg: THTensor* res2
output: True
- THTensor* self
]]
[[
name: _th_geqrf
cname: geqrf
@ -2419,6 +2523,7 @@
variants:
- function
return: argument 0,1
scalar_check: false
arguments:
- arg: THTensor* res1
output: True
@ -2437,6 +2542,7 @@
variants:
- function
return: argument 0
scalar_check: false
arguments:
- arg: THTensor* result
output: True
@ -2469,33 +2575,13 @@
if_false: N
default: N
]]
[[
name: _th_btrisolve
cname: btrisolve
types:
- floating_point
backends:
- CPU
- CUDA
variants:
- function
return: argument 0
arguments:
- arg: THTensor* result
output: True
- THTensor* self
- THTensor* LU_data
- THIntegerTensor* LU_pivots
]]
[[
name: _th_random_
backends:
- CPU
- CUDA
return: self
variants: function
cpu_bool: True
cuda_bool: True
options:
- cname: random
arguments:
@ -2586,7 +2672,6 @@
- floating_point
backends:
- CPU
- CUDA
cname: uniform
variants: function
return: self
@ -2607,7 +2692,6 @@
- floating_point
backends:
- CPU
- CUDA
return: argument 0
variants:
- function
@ -2647,7 +2731,6 @@
- floating_point
backends:
- CPU
- CUDA
cname: normal
variants: function
return: self
@ -2667,7 +2750,6 @@
- floating_point
backends:
- CPU
- CUDA
cname: cauchy
variants: function
return: self
@ -2689,7 +2771,6 @@
- floating_point
backends:
- CPU
- CUDA
return: self
arguments:
- THTensor* self
@ -2707,7 +2788,6 @@
- floating_point
backends:
- CPU
- CUDA
cname: exponential
variants: function
return: self
@ -2723,7 +2803,6 @@
name: _th_geometric_
backends:
- CPU
- CUDA
cname: geometric
variants: function
return: self
@ -2734,41 +2813,7 @@
kwarg_only: True
- double p
]]
[[
name: _th_dirichlet_grad
types:
- floating_point
backends:
- CPU
return: argument 0
variants:
- function
options:
- cname: dirichlet_grad
arguments:
- arg: THTensor* output
output: True
- THTensor* x
- THTensor* alpha
- THTensor* total
]]
# In theory, this could be a part of the above declaration. But in
# practice this leads to all sorts of problems with ambiguous overloads.
# So we add it here with a separate name.
[[
name: _th_alias
return: THTensor*
cpu_half: True
cpu_bool: True
cuda_bool: True
variants:
- function
options:
- cname: newWithTensor
arguments:
- THTensor* self
]]
[[
name: _th_copy_ignoring_overlaps_
cname: copyIgnoringOverlaps
@ -2788,6 +2833,7 @@
cpu_half: True
cpu_bool: True
cuda_bool: True
cpu_bfloat16: True
return: self
arguments:
- arg: THTensor* self


@ -14,7 +14,7 @@ namespace at {
// OptionalDeviceGuard guard(device_of(tensor));
/// Return the Device of a Tensor, if the Tensor is defined.
inline optional<Device> device_of(Tensor t) {
inline optional<Device> device_of(const Tensor& t) {
if (t.defined()) {
return make_optional(t.device());
} else {

aten/src/ATen/Dimname.cpp (new file, 104 lines)

@ -0,0 +1,104 @@
#ifdef BUILD_NAMEDTENSOR
#include <ATen/Dimname.h>
#include <c10/util/Exception.h>
namespace at {
std::ostream& operator<<(std::ostream& out, const Dimname& dimname) {
if (dimname.type() == NameType::WILDCARD) {
out << "None";
} else {
out << "'" << dimname.full_name().toUnqualString() << "'";
}
return out;
}
bool is_valid_identifier(const std::string& name) {
std::locale loc;
if (name.length() == 0) {
return false;
}
for (auto it = name.begin(); it != name.end(); ++it) {
if (std::isalpha(*it, loc) || *it == '_') {
continue;
}
return false;
}
return true;
}
bool Dimname::can_refer_to(const Dimname& other) const {
switch (type()) {
case NameType::WILDCARD:
return false;
// "C" can be used to refer to "C" or "C.in".
case NameType::NORMAL:
return untagged_name() == other.untagged_name();
default:
return full_name() == other.full_name();
}
}
static void check_valid_identifier(const std::string& name) {
TORCH_CHECK(
is_valid_identifier(name),
"Invalid name: a valid identifier must contain alphabetical characters and/or underscore, got: '",
name, "'.");
}
Dimname Dimname::fromSymbol(Symbol full_name) {
TORCH_INTERNAL_ASSERT(full_name.is_dimname());
if (full_name == kWildcard) {
return Dimname::wildcard();
}
const std::string delimiter = ".";
const std::string str(full_name.toUnqualString());
auto it = str.find(delimiter);
// Check for normal name
if (it == std::string::npos) {
check_valid_identifier(str);
return Dimname(full_name);
}
// Check for tagged name
auto second_dot = str.find(delimiter, it + 1);
TORCH_CHECK(
second_dot == std::string::npos,
"Invalid name '", str, "': A tagged name can only contain one '.'");
auto untagged_name = str.substr(0, it);
auto tag = str.substr(it + 1);
check_valid_identifier(untagged_name);
check_valid_identifier(tag);
return Dimname(NameType::TAGGED, full_name, Symbol::dimname(untagged_name));
}
Dimname Dimname::wildcard() {
static Dimname result(NameType::WILDCARD, kWildcard, kWildcard);
return result;
}
optional<Dimname> unify(Dimname dimname, Dimname other) {
if (other.type() == NameType::WILDCARD) {
return dimname;
}
if (dimname.type() == NameType::WILDCARD) {
return other;
}
if (dimname.full_name() == other.full_name()) {
return dimname;
}
if (dimname.untagged_name() == other.untagged_name()) {
return Dimname::fromSymbol(dimname.untagged_name());
}
return c10::nullopt;
}
bool match(Dimname dimname, Dimname other) {
return unify(dimname, other).has_value();
}
} // namespace at
#endif
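The parsing rules in `Dimname::fromSymbol` above (wildcard `*`, normal names, and tagged names with exactly one `.`) can be illustrated with a small standalone sketch. `parse_dimname`, `Kind`, and `ParsedName` are hypothetical stand-ins for the real `Dimname` API, and plain `std::isalpha` stands in for the locale-based check:

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for at::NameType.
enum class Kind { Normal, Wildcard, Tagged };

struct ParsedName {
  Kind kind;
  std::string full_name;      // e.g. "C.in"
  std::string untagged_name;  // e.g. "C"
};

// Mirrors is_valid_identifier: alphabetical characters and/or underscores only.
bool valid_identifier(const std::string& name) {
  if (name.empty()) return false;
  for (char c : name) {
    if (!(std::isalpha(static_cast<unsigned char>(c)) || c == '_')) return false;
  }
  return true;
}

// Sketch of Dimname::fromSymbol's parsing logic, on plain strings.
ParsedName parse_dimname(const std::string& str) {
  if (str == "*") {
    return {Kind::Wildcard, str, str};
  }
  auto dot = str.find('.');
  if (dot == std::string::npos) {
    // Normal name: the whole string must be a valid identifier.
    if (!valid_identifier(str)) throw std::runtime_error("invalid name: " + str);
    return {Kind::Normal, str, str};
  }
  // Tagged name: may contain exactly one '.'.
  if (str.find('.', dot + 1) != std::string::npos) {
    throw std::runtime_error("a tagged name can only contain one '.'");
  }
  auto untagged = str.substr(0, dot);
  auto tag = str.substr(dot + 1);
  if (!valid_identifier(untagged) || !valid_identifier(tag)) {
    throw std::runtime_error("invalid name: " + str);
  }
  return {Kind::Tagged, str, untagged};
}
```

Per the [Dimname Terminology] note in the header below, `"C.in"` parses as full name `"C.in"` with untagged name `"C"` and tag `"in"`.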

aten/src/ATen/Dimname.h (new file)

@ -0,0 +1,56 @@
#pragma once
#ifdef BUILD_NAMEDTENSOR
#include <ATen/core/interned_strings.h>
#include <c10/util/ArrayRef.h>
#include <c10/util/Optional.h>
#include <iostream>
namespace at {
enum class NameType: uint8_t { NORMAL, WILDCARD, TAGGED };
struct CAFFE2_API Dimname {
static Dimname fromSymbol(Symbol name);
static Dimname wildcard();
NameType type() const { return type_; }
Symbol full_name() const { return full_name_; }
Symbol untagged_name() const { return untagged_name_; }
bool can_refer_to(const Dimname& other) const;
bool is_normal() const { return type_ == NameType::NORMAL; }
bool is_wildcard() const { return type_ == NameType::WILDCARD; }
bool is_tagged() const { return type_ == NameType::TAGGED; }
private:
Dimname(Symbol name)
: untagged_name_(name), full_name_(name), type_(NameType::NORMAL) {}
Dimname(NameType type, Symbol full_name, Symbol untagged_name)
: untagged_name_(untagged_name), full_name_(full_name), type_(type) {}
// [Dimname Terminology]
//
// For "C.in":
// - "C.in" is the "full name"
// - "C" is the "untagged name"
// - "in" is the "tag"
Symbol untagged_name_;
Symbol full_name_;
NameType type_;
// Will need more fields for other special name types.
};
using DimnameList = c10::ArrayRef<Dimname>;
static Symbol kWildcard = Symbol::dimname("*");
bool CAFFE2_API is_valid_identifier(const std::string& name);
CAFFE2_API c10::optional<Dimname> unify(Dimname dimname, Dimname other);
CAFFE2_API bool match(Dimname dimname, Dimname other);
CAFFE2_API std::ostream& operator<<(std::ostream& out, const Dimname& dimname);
} // namespace at
#endif


@ -1,6 +1,7 @@
#pragma once
#include <ATen/Type.h>
#include <ATen/core/Tensor.h>
#include <c10/macros/Macros.h>
#include <c10/util/Half.h>
#include <c10/util/Exception.h>
#include <ATen/core/DeprecatedTypeProperties.h>
@ -11,6 +12,14 @@
return __VA_ARGS__(); \
}
#define AT_QINT_PRIVATE_CASE_TYPE(enum_type, type, underlying_enum, underlying_type, ...) \
case enum_type: { \
const auto& UNDERLYING_TYPE C10_UNUSED = underlying_enum; \
using scalar_t C10_UNUSED = type; \
using underlying_t C10_UNUSED = underlying_type; \
return __VA_ARGS__(); \
}
namespace detail {
template <at::ScalarType N>
@ -27,6 +36,17 @@ struct ScalarTypeToCType<at::ScalarType::Half> {
static at::Half t;
};
template<>
struct ScalarTypeToCType<at::ScalarType::BFloat16> {
using type = at::BFloat16;
// This is a workaround for the CUDA bug which prevents ::detail::ScalarTypeToCType<T>::type being used directly
// due to an ambiguous reference which can't be resolved. For some reason it can't pick between at::detail and at::cuda::detail.
// For repro example, please see: https://gist.github.com/izdeby/952ae7cf256ddb740a73776d39a7e7ba
// TODO: remove once the bug is fixed.
static at::BFloat16 t;
};
template<>
struct ScalarTypeToCType<at::ScalarType::Bool> {
using type = bool;
@ -38,6 +58,17 @@ struct ScalarTypeToCType<at::ScalarType::Bool> {
static bool t;
};
template<>
struct ScalarTypeToCType<at::ScalarType::Long> {
using type = int64_t;
// This is a workaround for the CUDA bug which prevents ::detail::ScalarTypeToCType<T>::type being used directly
// due to an ambiguous reference which can't be resolved. For some reason it can't pick between at::detail and at::cuda::detail.
// For repro example, please see: https://gist.github.com/izdeby/952ae7cf256ddb740a73776d39a7e7ba
// TODO: remove once the bug is fixed.
static int64_t t;
};
inline at::ScalarType scalar_type(at::ScalarType s) {
return s;
}
@ -59,6 +90,54 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
}
// The AT_DISPATCH_* family of macros provides the ability to
// conveniently generate specializations of a kernel over all of the
// dtypes we care about in PyTorch. We call it "dispatch" because
// we are "dispatching" to the correct, dtype-specific kernel.
//
// A standard usage looks like:
//
// AT_DISPATCH_ALL_TYPES(self.scalar_type(), "op_name", [&] {
// // Your code here, with 'scalar_t' now defined to
// // be the dtype in question
// })
//
// There are many variations of this macro, so it's important to
// understand exactly /which/ dtypes you want to get instantiated, as
// well as what the "default" set is.
//
// The default set of dtypes that are instantiated (e.g., by
// AT_DISPATCH_ALL_TYPES) are floating point types (float, double),
// and integral types (int32_t, int64_t, int16_t, int8_t, uint8_t),
// but NOT booleans (bool), half-precision floats (Half) or
// complex numbers (std::complex<float>, std::complex<double>).
// This "cut" is somewhat historical (the default types are the
// ones that TH historically supported), but it also reflects the
// fact that the non-default types are "poorly" behaved (booleans
// are NOT integers mod 2, half precision operations ~essentially
// don't exist on CPU, complex numbers are an experimental application).
//
// Here are the questions you should generally ask to decide which
// dispatch you want:
//
// 1. Is this an integral or floating point specific operation?
// (If so, you'll want one of the FLOATING or INTEGRAL macros.)
//
// 2. Should half be supported? (If you're on CPU, the answer is almost
// definitely no. If you do want support, use one of the AND_HALF
// macros)
//
// Much rarer situations:
//
// 3. Should bool be supported? (You often have to write your kernel
// differently if arithmetic operations are involved.) If so,
// Use AT_DISPATCH_ALL_TYPES_AND along with ScalarType::Bool
//
// 4. Should complex be supported? The answer is almost always no,
// unless you are working on "generic" code that should work on
// all dtypes.
// NB: the the_type variable is not used, but we have kept it for
// backwards compatibility. It's probably not used by anyone though;
// but we're just being safe (and it doesn't hurt.) Note we must
@ -67,8 +146,8 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
#define AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
@ -80,8 +159,8 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
#define AT_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
@ -91,11 +170,25 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
} \
}()
#define AT_DISPATCH_FLOATING_TYPES_AND(SCALARTYPE, TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE, decltype(::detail::ScalarTypeToCType<SCALARTYPE>::t), __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(TYPE), "'"); \
} \
}()
#define AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
@ -114,8 +207,8 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
#define AT_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
@ -127,31 +220,11 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
detail::deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF(); \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(_st), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
@ -168,8 +241,8 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
#define AT_DISPATCH_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
@ -180,11 +253,26 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
} \
}()
#define AT_DISPATCH_QINT_TYPES(TYPE, NAME, ...) \
[&] { \
const auto& SCALAR_TYPE C10_UNUSED = TYPE; \
switch (TYPE) { \
AT_QINT_PRIVATE_CASE_TYPE( \
at::kQInt8, at::qint8, at::kChar, int8_t, __VA_ARGS__) \
AT_QINT_PRIVATE_CASE_TYPE( \
at::kQUInt8, at::quint8, at::kByte, uint8_t, __VA_ARGS__) \
AT_QINT_PRIVATE_CASE_TYPE( \
at::kQInt32, at::qint32, at::kInt, int, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(TYPE), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
/* don't use TYPE again in case it is an expensive or side-effect op*/ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
@ -202,30 +290,6 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
detail::deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() \
const auto& the_type = TYPE; \
(void)the_type; \
at::ScalarType _st = ::detail::scalar_type(TYPE); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(_st), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND(SCALARTYPE, TYPE, NAME, ...) \
[&] { \
switch (TYPE) { \
@ -259,7 +323,7 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND(SCALARTYPE1, SCALARTYPE2, TYPE, NAME, ...) \
#define AT_DISPATCH_ALL_TYPES_AND3(SCALARTYPE1, SCALARTYPE2, SCALARTYPE3, TYPE, NAME, ...) \
[&] { \
switch (TYPE) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
@ -271,6 +335,25 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE1, decltype(::detail::ScalarTypeToCType<SCALARTYPE1>::t), __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE2, decltype(::detail::ScalarTypeToCType<SCALARTYPE2>::t), __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE3, decltype(::detail::ScalarTypeToCType<SCALARTYPE3>::t), __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(TYPE), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND(SCALARTYPE1, SCALARTYPE2, SCALARTYPE3, TYPE, NAME, ...) \
[&] { \
switch (TYPE) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE1, decltype(::detail::ScalarTypeToCType<SCALARTYPE1>::t), __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE2, decltype(::detail::ScalarTypeToCType<SCALARTYPE2>::t), __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(SCALARTYPE3, decltype(::detail::ScalarTypeToCType<SCALARTYPE3>::t), __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
@ -279,3 +362,51 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {}
AT_ERROR(#NAME, " not implemented for '", TYPE, "'"); \
} \
}()
// ----------------------------------------------------------------------------
// DEPRECATED MACROS, DON'T USE THESE
// ----------------------------------------------------------------------------
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
detail::deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF(); \
const auto& the_type = TYPE; \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(_st), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
detail::deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() \
const auto& the_type = TYPE; \
/* don't use TYPE again in case it is an expensive or side-effect op */ \
at::ScalarType _st = ::detail::scalar_type(the_type); \
switch (_st) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", toString(_st), "'"); \
} \
}()
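The AT_DISPATCH_* macros above all share one shape: an immediately-invoked lambda whose switch binds `scalar_t` to the C++ type matching the runtime dtype, then runs the caller's code in that scope. A toy standalone version of the pattern (not the real ATen macro; `Dtype` and `TOY_DISPATCH` are made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Toy dtype enum standing in for at::ScalarType.
enum class Dtype { Float, Long };

// Minimal analogue of AT_DISPATCH_*: each case defines scalar_t and then
// invokes the caller's lambda, whose body is textually pasted into the case
// scope by macro expansion, so scalar_t is visible inside it.
#define TOY_DISPATCH(DTYPE, NAME, ...)                      \
  [&] {                                                     \
    const auto& the_type = DTYPE; /* evaluate DTYPE once */ \
    switch (the_type) {                                     \
      case Dtype::Float: {                                  \
        using scalar_t = float;                             \
        return __VA_ARGS__();                               \
      }                                                     \
      case Dtype::Long: {                                   \
        using scalar_t = int64_t;                           \
        return __VA_ARGS__();                               \
      }                                                     \
      default:                                              \
        throw std::runtime_error(NAME);                     \
    }                                                       \
  }()

// A "kernel" that reports the size of the dispatched scalar type.
std::size_t scalar_size(Dtype dt) {
  return TOY_DISPATCH(dt, "scalar_size", [&] { return sizeof(scalar_t); });
}
```

Binding `the_type` once and switching on it is the same fix the diff applies to the real macros: the `TYPE` argument may be an expensive or side-effecting expression, so it must not be evaluated twice.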


@ -0,0 +1,74 @@
#include <c10/util/Exception.h>
#include <ATen/DynamicLibrary.h>
#include <ATen/Utils.h>
#ifndef _WIN32
#include <dlfcn.h>
#include <libgen.h>
#else
#include <Windows.h>
#endif
namespace at {
#ifndef _WIN32
// Unix
static void* checkDL(void* x) {
if (!x) {
AT_ERROR("Error in dlopen or dlsym: ", dlerror());
}
return x;
}
DynamicLibrary::DynamicLibrary(const char* name) {
// NOLINTNEXTLINE(hicpp-signed-bitwise)
handle = checkDL(dlopen(name, RTLD_LOCAL | RTLD_NOW));
}
void* DynamicLibrary::sym(const char* name) {
AT_ASSERT(handle);
return checkDL(dlsym(handle, name));
}
DynamicLibrary::~DynamicLibrary() {
if (!handle)
return;
dlclose(handle);
}
#else
// Windows
DynamicLibrary::DynamicLibrary(const char* name) {
// NOLINTNEXTLINE(hicpp-signed-bitwise)
HMODULE theModule = LoadLibraryA(name);
if (theModule) {
handle = theModule;
} else {
AT_ERROR("error in LoadLibraryA");
}
}
void* DynamicLibrary::sym(const char* name) {
AT_ASSERT(handle);
FARPROC procAddress = GetProcAddress((HMODULE)handle, name);
if (!procAddress) {
AT_ERROR("error in GetProcAddress");
}
return (void*)procAddress;
}
DynamicLibrary::~DynamicLibrary() {
if (!handle) {
return;
}
FreeLibrary((HMODULE)handle);
}
#endif
} // namespace at


@ -0,0 +1,21 @@
#pragma once
#include <c10/macros/Export.h>
#include <ATen/Utils.h>
namespace at {
struct DynamicLibrary {
AT_DISALLOW_COPY_AND_ASSIGN(DynamicLibrary);
CAFFE2_API DynamicLibrary(const char* name);
CAFFE2_API void* sym(const char* name);
CAFFE2_API ~DynamicLibrary();
private:
void* handle = nullptr;
};
} // namespace at
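`DynamicLibrary` above follows a standard RAII shape: acquire the handle in the constructor, release it in the destructor, and forbid copies (via `AT_DISALLOW_COPY_AND_ASSIGN`) so the handle is closed exactly once. A minimal portable sketch of the same shape, with counters standing in for the real `dlopen`/`dlclose` (or `LoadLibraryA`/`FreeLibrary`) calls:

```cpp
// Counters standing in for dlopen/dlclose so the sketch needs no real library.
int open_calls = 0;
int close_calls = 0;

void* fake_open() { ++open_calls; return &open_calls; }
void fake_close(void*) { ++close_calls; }

// Same RAII shape as at::DynamicLibrary: acquire in the constructor,
// release in the destructor, non-copyable so the handle closes exactly once.
struct Handle {
  Handle() : handle(fake_open()) {}
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (handle) fake_close(handle);
  }
 private:
  void* handle = nullptr;
};

// Opens a handle on entry; the destructor closes it when the scope exits.
void use_handle() {
  Handle h;
}
```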


@ -16,7 +16,7 @@ std::vector<int64_t> infer_size(IntArrayRef a, IntArrayRef b) {
int64_t sizeA = (dimA >= 0) ? a[dimA] : 1;
int64_t sizeB = (dimB >= 0) ? b[dimB] : 1;
AT_CHECK(
TORCH_CHECK(
sizeA == sizeB || sizeA == 1 || sizeB == 1,
"The size of tensor a (", sizeA,
") must match the size of tensor b (", sizeB,
@ -53,7 +53,7 @@ std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
: expandedSizes[i + 1] * expandedStrides[i + 1];
int64_t targetSize = sizes[i];
if (targetSize == -1) {
AT_CHECK(
TORCH_CHECK(
dim >= 0,
"The expanded size of the tensor (",
targetSize,
@ -62,7 +62,7 @@ std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
targetSize = size;
}
if (size != targetSize) {
AT_CHECK(
TORCH_CHECK(
size == 1,
"The expanded size of the tensor (",
targetSize,
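The `infer_size` checks above implement standard broadcasting: walking dimensions from the last, each pair of sizes must be equal or one of them must be 1, and missing leading dimensions count as 1. A standalone sketch of that rule (`broadcast_sizes` is a hypothetical name, not the ATen function):

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Sketch of at::infer_size: broadcast two shapes, aligning from the last dim.
std::vector<long long> broadcast_sizes(const std::vector<long long>& a,
                                       const std::vector<long long>& b) {
  std::size_t ndim = std::max(a.size(), b.size());
  std::vector<long long> out(ndim);
  for (std::size_t i = 0; i < ndim; ++i) {
    // Walk from the trailing dimension; missing dims are treated as size 1.
    long long sizeA = i < a.size() ? a[a.size() - 1 - i] : 1;
    long long sizeB = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (!(sizeA == sizeB || sizeA == 1 || sizeB == 1)) {
      throw std::runtime_error("the size of tensor a must match the size of tensor b");
    }
    // A size-1 dim stretches to the other size (this also keeps 0-sized dims).
    out[ndim - 1 - i] = (sizeA == 1) ? sizeB : sizeA;
  }
  return out;
}
```

For example, shapes `{2, 1, 4}` and `{3, 1}` broadcast to `{2, 3, 4}`.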


@ -36,7 +36,7 @@ static std::vector<int64_t> infer_size(IntArrayRef shape, int64_t numel) {
// works yet
// empty_tensor.view(-1, 0)
// doesn't.
AT_CHECK(newsize != 0, "cannot reshape tensor of 0 elements into shape ",
TORCH_CHECK(newsize != 0, "cannot reshape tensor of 0 elements into shape ",
shape, " because the unspecified dimension size -1 can be any "
"value and is ambiguous");
res[*infer_dim] = numel / newsize;
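The hunk above guards the `-1` ("infer this dimension") case in reshape/view: the product of the known sizes must evenly divide `numel`, and it must be nonzero, because with 0 elements the `-1` dimension could be any value. A hedged standalone sketch of that inference (`infer_shape` is a hypothetical name):

```cpp
#include <stdexcept>
#include <vector>

// Sketch of the -1 inference used by reshape/view: at most one dim may be -1,
// and it is filled in so the total element count matches numel.
std::vector<long long> infer_shape(std::vector<long long> shape, long long numel) {
  long long known = 1;
  long long infer_dim = -1;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == -1) {
      if (infer_dim != -1) throw std::runtime_error("only one dimension can be -1");
      infer_dim = static_cast<long long>(i);
    } else {
      known *= shape[i];
    }
  }
  if (infer_dim != -1) {
    // Mirrors TORCH_CHECK(newsize != 0, ...): with 0 elements the -1 dim
    // could be any value, so the request is ambiguous; also reject shapes
    // whose known sizes don't divide numel.
    if (known == 0 || numel % known != 0) {
      throw std::runtime_error("cannot infer dimension");
    }
    shape[infer_dim] = numel / known;
  }
  return shape;
}
```

For example, a 12-element tensor reshaped to `{-1, 4}` resolves to `{3, 4}`.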


@ -1,12 +0,0 @@
#include <ATen/LegacyTHDispatch.h>
namespace at {
// TODO: This could be bad juju if someone calls globalContext() in the
// destructor of an object with static lifetime.
LegacyTHDispatch & globalLegacyTHDispatch() {
static LegacyTHDispatch singleton;
return singleton;
}
}


@ -1,127 +0,0 @@
#pragma once
// LegacyTHDispatcher is the legacy mechanism for dispatching directly
// to TH/THNN/THC/THCUNN functions in ATen, which is essentially a giant virtual
// dispatch table for every TH function we support dynamically dispatching over.
//
// NB: We do not actually dispatch to *operators* here, the usual pattern is for
// ATen operators to call this mechanism for their implementation, but the
// operator itself is declared separately (e.g. as a native function "wrapper").
//
// Q: Why don't we just use LegacyTypeDispatch here?
// A: Mainly separation of concerns:
// 1) Type is for implementation of operators, which requires codegen of
// Variables, JIT, etc. That is handled by the native function "wrappers";
// just calling into TH does not require that.
// 2) Type does not require scalar-specific dispatch, whereas calling into TH
// does. Thus, this separation allows us to evolve operator dispatch
// separately (i.e. to use the C10 dispatcher) from details of how to
// call TH functionality.
//
// The implementation here is very similar to the LegacyTypeDispatch design, with
// the following simplifications:
// 1) This is not required for a mobile build, so does not have to live in /core.
// 2) Because these only contain function implementations, we do not have to
// handle the Variable/Tensor split; that is handled at the native function
// "wrapper" level.
// 3) Because an operator must have been previously dispatched via the Type
// mechanism, we do need to handle device initialization. This means it is
// WRONG to call directly into these functions without first going through
// Type dispatch (i.e. the usual operator -> Type -> LegacyTHDispatch pattern).
// 4) Because an operator must have been previously dispatched via the Type
// mechanism, we do not need to handle undefined Tensors.
//
// NB: We don't use Registry for this, because we don't want to
// pay for a hash table lookup every time we do an operation.
//
// NB: we can delete this when we don't call into any TH implementations.
#include <c10/core/Backend.h>
#include <c10/core/ScalarType.h>
#include <ATen/core/LegacyDeviceTypeInit.h>
#include <ATen/LegacyTHDispatcher.h>
namespace at {
struct Type;
struct CAFFE2_API LegacyTHDispatcherDeleter {
using LegacyTHDispatcherDeleterFun = void(LegacyTHDispatcher*);
LegacyTHDispatcherDeleterFun *fn_ = nullptr;
LegacyTHDispatcherDeleter() {}
/* implicit */ LegacyTHDispatcherDeleter(LegacyTHDispatcherDeleterFun *fn) : fn_(fn) {}
void operator()(LegacyTHDispatcher * ptr) {
if (fn_) {
(*fn_)(ptr);
}
}
};
class CAFFE2_API LegacyTHDispatch {
public:
using LegacyTHDispatcherUniquePtr = std::unique_ptr<LegacyTHDispatcher, LegacyTHDispatcherDeleter>;
// WARNING: This function has the precondition that you have
// initialized the type you want to call. This initialization
// step is generally done by Context, or assumed because you
// have a Tensor and thus the Type of that Tensor must already
// be initialized.
void registerDispatcher(Backend b, ScalarType s, LegacyTHDispatcherUniquePtr&& t) {
dispatcher_registry[static_cast<int>(b)][static_cast<int>(s)] = std::move(t);
}
LegacyTHDispatcher & getLegacyTHDispatcher(Backend p, ScalarType s) {
auto* dispatcher = getLegacyTHDispatcherOpt(p, s);
if (!dispatcher) AT_ERROR(toString(p), toString(s), "THDispatcher is not enabled.");
return *dispatcher;
}
private:
LegacyTHDispatcher* getLegacyTHDispatcherRaw(Backend p, ScalarType s) {
return dispatcher_registry[static_cast<int>(p)][static_cast<int>(s)].get();
}
LegacyTHDispatcher* getLegacyTHDispatcherOpt(Backend p, ScalarType s) {
if (p != Backend::Undefined) {
initForDeviceType(backendToDeviceType(p));
// NB: there is no Complex for TH, so no initialization to be done.
}
auto dispatcher = getLegacyTHDispatcherRaw(p, s);
if(!dispatcher) {
if (p == Backend::Undefined || s == ScalarType::Undefined) {
AT_ERROR("Requested Undefined THDispatcher which is invalid. Backend:",
toString(p), "ScalarType: ", toString(s));
}
}
return dispatcher;
}
void initForDeviceType(DeviceType p) {
static std::once_flag cpu_once;
static std::once_flag cuda_once;
if (p == DeviceType::CPU) {
std::call_once(cpu_once, [] {
getLegacyDeviceTypeInit().initCPU();
});
} else if (p == DeviceType::CUDA) {
std::call_once(cuda_once, [] {
getLegacyDeviceTypeInit().initCUDA();
});
} else if (p == DeviceType::HIP) {
std::call_once(cuda_once, [] {
getLegacyDeviceTypeInit().initHIP();
});
}
}
// NB: dispatcher_registry has nullptr for all CUDA backends until
// CUDA initialization has occurred
LegacyTHDispatcherUniquePtr dispatcher_registry
[static_cast<int>(Backend::NumOptions)]
[static_cast<int>(ScalarType::NumOptions)];
};
CAFFE2_API LegacyTHDispatch& globalLegacyTHDispatch();
} // namespace at


@ -40,7 +40,7 @@ namespace at {
/// Construct an MatrixRef from an ArrayRef and outer stride.
/*implicit*/ MatrixRef(ArrayRef<T> arr, size_type stride0)
: arr(arr), stride0(stride0) {
AT_CHECK(arr.size() % stride0 == 0, "MatrixRef: ArrayRef size ", arr.size(), " not divisible by stride ", stride0)
TORCH_CHECK(arr.size() % stride0 == 0, "MatrixRef: ArrayRef size ", arr.size(), " not divisible by stride ", stride0)
}
/// @}
@ -59,7 +59,7 @@ namespace at {
} else if (dim == 1) {
return stride0;
} else {
AT_CHECK(0, "MatrixRef: out of bounds dimension ", dim, "; expected 0 or 1");
TORCH_CHECK(0, "MatrixRef: out of bounds dimension ", dim, "; expected 0 or 1");
}
}


@ -23,17 +23,15 @@ MemOverlap has_internal_overlap(TensorImpl* t) {
return MemOverlap::TOO_HARD;
}
void assert_no_internal_overlap(const Tensor& t, std::string op) {
assert_no_internal_overlap(t.unsafeGetTensorImpl(), op);
void assert_no_internal_overlap(const Tensor& t) {
assert_no_internal_overlap(t.unsafeGetTensorImpl());
}
void assert_no_internal_overlap(TensorImpl* t, std::string op) {
if (has_internal_overlap(t) == MemOverlap::YES) {
AT_ERROR(
op, ": unsupported operation: more than one element of the written-to "
"tensor refers to a single memory location. Please clone() the tensor "
"before calling ", op);
}
void assert_no_internal_overlap(TensorImpl* t) {
TORCH_CHECK(has_internal_overlap(t) != MemOverlap::YES,
"unsupported operation: more than one element of the written-to tensor "
"refers to a single memory location. Please clone() the tensor before "
"performing the operation.");
}
}
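`has_internal_overlap` flags tensors in which two distinct indices map to the same memory location, so an in-place write would hit one element twice. The clearest case is a dimension with stride 0 and size > 1 (e.g. produced by `expand`), which is what the error message's clone() advice addresses. A simplified sketch of that classification (`has_overlap` is a hypothetical stand-in working on raw sizes/strides rather than a `TensorImpl*`):

```cpp
#include <vector>

enum class MemOverlap { NO, YES, TOO_HARD };

// Simplified sketch of at::has_internal_overlap: a contiguous tensor never
// overlaps itself; a 0-stride dimension with more than one element always
// does; anything else would need a full analysis, so report TOO_HARD.
MemOverlap has_overlap(const std::vector<long long>& sizes,
                       const std::vector<long long>& strides,
                       bool is_contiguous) {
  if (is_contiguous) {
    return MemOverlap::NO;
  }
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (strides[i] == 0 && sizes[i] > 1) {
      return MemOverlap::YES;
    }
  }
  return MemOverlap::TOO_HARD;
}
```

The assert in the diff then only rejects the definite `YES` case, which is why the check reads `!= MemOverlap::YES` rather than `== MemOverlap::NO`.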

Some files were not shown because too many files have changed in this diff.