Compare commits


2879 Commits

Author SHA1 Message Date
db5d3131d1 add fix for CUDA 10 2018-12-06 15:44:56 -08:00
524574ab73 Define THPStorage struct only once (rather than N times) (#14802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14802

The definition of THPStorage does not depend on any Real; its macro
definition is unnecessary. Refactor the code so that THPStorage is not
macro-defined.

Reviewed By: ezyang

Differential Revision: D13340445

fbshipit-source-id: 343393d0a36c868b9a06eea2ad9b80f5e395e947
2018-12-05 13:19:29 -08:00
ca6311d909 File name change for FbgemmI8Depthwise.h and FbgemmI8Depthwise.cc (#14725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14725

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/33

Rename FbgemmI8Depthwise.h to FbgemmI8DepthwiseAvx2.h and FbgemmI8Depthwise.cc to FbgemmI8DepthwiseAvx2.cc, since FbgemmI8DepthwiseAvx2.cc will be compiled with AVX2 flags

Reviewed By: jianyuh

Differential Revision: D13313898

fbshipit-source-id: a8111eacf3d79a466ce0565bfe5f2f0b200a5c33
2018-12-05 13:14:48 -08:00
e114527d19 Add torch.nn.RReLU support in symbolic (#14781)
Summary:
Now we support exporting torch.nn.RReLU in onnx.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14781
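
A minimal export sketch of what this enables (the file name and input shape here are illustrative):

```python
import torch

model = torch.nn.RReLU().eval()  # eval mode: deterministic negative slope
x = torch.randn(1, 4)            # illustrative input shape
torch.onnx.export(model, x, 'rrelu.onnx')
```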

Reviewed By: houseroad

Differential Revision: D13343872

Pulled By: zrphercule

fbshipit-source-id: 1e96b957de4fc2f5ba3959d42329807975419ae3
2018-12-05 13:10:07 -08:00
50936cb06e Move avx2 specific code in different source files (#28)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/28

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14516

This is the first diff in a series of diffs that will separate out AVX2-specific code into separate files. The goal is to compile as little code as possible with AVX2 and AVX512 compiler flags.

Reviewed By: jianyuh

Differential Revision: D13248376

fbshipit-source-id: 401c2e9d3cd96c420fd08c3efa011febce96ffbb
2018-12-05 12:19:35 -08:00
55092b1cc6 Validate matching input shapes in Int8Add operator (#14520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14520

The default engine doesn't support broadcast semantics in the Int8Add operator. This patch adds a check that the shapes are equivalent.

Reviewed By: bertmaher

Differential Revision: D13250922

fbshipit-source-id: 8526d07723bd9a34d54dee04d121c57f8b33c481
2018-12-05 12:00:23 -08:00
1c2273c8e9 fix stft arg types
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14800

Reviewed By: zou3519

Differential Revision: D13340574

Pulled By: SsnL

fbshipit-source-id: 8b0dbbe299d1a362da0ecc0b1c0dadb2543ded5d
2018-12-05 11:45:37 -08:00
999690ff3d Improve HIPify performance (#14803)
Summary:
```
    Improve performance of pyHIPIFY

    Changes:
    - Pre-compile regexes, don't use regexes when it's not necessary
      (this saves us ~15%)
    - Compile all substitutions for mappings into a single, non-backtracking
      regex using a Trie.  This gives big savings.

    Before, running pyHIPIFY on all files took 15.8s.  Now it takes 3.9s.
```
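
For intuition, here is a minimal sketch of the Trie trick (illustrative only, not the actual pyHIPIFY code; the mapping table is made up):

```python
import re

class Trie:
    """Merge many literal keys into one non-backtracking alternation."""
    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[''] = True  # end-of-word marker

    def pattern(self, node=None):
        node = self.root if node is None else node
        alts = [re.escape(ch) + self.pattern(child)
                for ch, child in sorted(node.items()) if ch != '']
        if not alts:
            return ''
        body = alts[0] if len(alts) == 1 else '(?:' + '|'.join(alts) + ')'
        return '(?:' + body + ')?' if '' in node else body

mappings = {'cudaMalloc': 'hipMalloc', 'cudaMemcpy': 'hipMemcpy', 'cuda': 'hip'}
trie = Trie()
for key in mappings:
    trie.add(key)
regex = re.compile(trie.pattern())

src = 'cudaMalloc(&p, n); cudaMemcpy(dst, src, n, cudaMemcpyDefault);'
# one scan of the file replaces all keys, instead of one regex pass per key
print(regex.sub(lambda m: mappings[m.group(0)], src))
```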

Stacked on #14769
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14803

Differential Revision: D13342620

Pulled By: ezyang

fbshipit-source-id: 1cfa36b3236bbe24d07080a31cc788a52d740f40
2018-12-05 11:00:03 -08:00
be47470c91 Fix cuda multiprocessing cached memory (#14736)
Summary:
This PR fixes #11422

In the old world of CUDA IPC, when we want to share a tensor T from process A to process B, we have to share the whole CUDA memory allocation that T's storage sits in, and we cast it to the same type of storage as T's.

This causes a problem when two different types of storage get allocated in the same CUDA memory block: when we try to reconstruct the second tensor, it complains about the wrong storage type.

In this PR we reconstruct only the storage (not the entire memory block). However, since CUDA only allows a memHandle to be opened once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come.

Thanks a ton to ezyang who helped design the solution and debugged the issue!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736

Differential Revision: D13335899

Pulled By: ailzhang

fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371
2018-12-05 10:55:43 -08:00
3ae721d350 Set and get default dtype (#13748)
Summary:
Replaces the `DefaultTensorOptions` with just a global default dtype that you can set and get like in Python.

Also, calls `set_default_dtype` in the implementation of `torch.set_default_dtype`. Right now these two default values are separate but will always be the same. Should we just bind `set_default_dtype`  into Python? I think that might be good to do in a separate PR though.
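
For reference, the Python-side behavior this mirrors:

```python
import torch

torch.get_default_dtype()               # torch.float32 out of the box
torch.set_default_dtype(torch.float64)
torch.tensor([1.0]).dtype               # now torch.float64
torch.set_default_dtype(torch.float32)  # restore the default
```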

ezyang gchanan

Also CC colesbury who wanted to do this for ATen for a while? What do you think about it?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13748

Differential Revision: D13340207

Pulled By: goldsborough

fbshipit-source-id: 2689b09eb137fabb3a92d1ad1635782bee9398e8
2018-12-05 10:28:41 -08:00
90b1196ac4 Switch Int8AveragePool operator to QNNPACK (#14783)
Summary:
2.2-2.9X better performance on ARM when compiled with gcc (same bad perf when compiled with Clang)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14783

Differential Revision: D13332680

Pulled By: Maratyszcza

fbshipit-source-id: 4c1138500c6b3026335e9bfe5f6be43b1ae2cefb
2018-12-05 10:18:42 -08:00
e1eb32d9f1 Update magma to 2.4.0 for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14738

Differential Revision: D13341611

Pulled By: soumith

fbshipit-source-id: 39a49fc60e710cc32a463858c9cee57c182330e2
2018-12-05 09:53:39 -08:00
62f4db6d8a Unify build_caffe2_amd.py and build_pytorch_amd.py (#14769)
Summary:
I need to preserve the ability to HIPify out-of-place files
only, so build_amd.py grows a --out-of-place-only flag.

Stacked on #14757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14769

Differential Revision: D13340154

Pulled By: ezyang

fbshipit-source-id: 1b855bc79e824ea94517a893236fd2c8ba4cb79d
2018-12-05 09:26:12 -08:00
dbf6d12776 Default pool() option (#14636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14636

Add a default CPU option for the pool()

Reviewed By: andrewwdye

Differential Revision: D13281367

fbshipit-source-id: 92dbfce89c900a41731b6d1ff62bb97886c40f77
2018-12-05 08:44:19 -08:00
2d958b7f77 Storage.clone maintains original device (#14751)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14673

As pointed out by vishwakftw, the root cause of the `deepcopy` issue was that `storage.clone()` would create a new storage on the default device.
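
A small repro sketch of the fixed behavior (assumes a machine with at least two GPUs):

```python
import copy
import torch

t = torch.randn(2, 2, device='cuda:1')  # assumes a second GPU is available
u = copy.deepcopy(t)
assert u.device == t.device             # the clone stays on cuda:1 after the fix
```
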
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14751

Reviewed By: soumith

Differential Revision: D13323061

Pulled By: fmassa

fbshipit-source-id: bfe46ebd78f0b6cd9518c11d09de7849282ed2a2
2018-12-05 08:33:56 -08:00
a80a46a6d0 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 080e0034bd6353420383ac7b476af5a35eaba7c3
2018-12-05 08:33:55 -08:00
0b1b72e975 Updating submodules
Reviewed By: yns88

fbshipit-source-id: e397238c7c477c4268e2dc89e530776fc89f18f8
2018-12-05 02:55:46 -08:00
0573ef664e include avx512vl to avx512 code path (#14733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14733

We often also want to use the AVX512VL instruction set.
We already included AVX512F and AVX512DQ.
Skylake also has AVX512BW and AVX512CD, which we may want to include later.

Reviewed By: duc0

Differential Revision: D13317282

fbshipit-source-id: 82c8e401d82d5c3a5452fb4ccb6e5cb88d242bda
2018-12-05 00:50:51 -08:00
f89de64796 Use AT_WARN for warnings in the JIT (#14770)
Summary:
Previously their implementation dispatched to prim::Print, which kept
printing the warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14770

Differential Revision: D13327629

Pulled By: suo

fbshipit-source-id: b9913f533d4530eb7c29146c39981ba7f72b6b68
2018-12-05 00:16:09 -08:00
ecc17fe3dd Add output info when doing onnxGetBackendCompatibility (#14784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14784

TSIA. To give more complete info to `onnxGetBackendCompatibility`.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13331989

fbshipit-source-id: 1064b93f7f474788f736e6f0c893dae915c6fb99
2018-12-04 21:53:32 -08:00
c79e305add Don't DCE PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14773

Reviewed By: eellison

Differential Revision: D13327673

Pulled By: suo

fbshipit-source-id: 236db3407c7eacac470530836e3d4d0dc323110c
2018-12-04 21:37:36 -08:00
8dfebc16cc Improvements for symbolic AD (#14758)
Summary:
**Review only the last commit.**

This commit adds a few optimizations to AD, that let us dramatically
reduce the number of sizes we capture from forward.

We now:
- collapse chains of SumToSize
- avoid capturing sizes of tensors that are captured anyway
- more aggressively DCE the reverse code
- run CSE on the primal code to deduplicate `aten::size` calls

cc zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758

Differential Revision: D13324440

Pulled By: zou3519

fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72
2018-12-04 20:38:21 -08:00
38eb1beff5 Revert D13289919: [pytorch][PR] [DataLoader] Refactor dataloader.py
Differential Revision:
D13289919

Original commit changeset: d701bc7bb48f

fbshipit-source-id: c350c491fefa98a0a7c0cf22cb832e78aeb15c3d
2018-12-04 20:25:16 -08:00
78a9e7d83f Delete defunct files from torch/csrc/distributed (#14785)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14785

Differential Revision: D13333066

Pulled By: ezyang

fbshipit-source-id: e7937b4e8e12409b0fa964c34f995f7861ca95ff
2018-12-04 20:13:20 -08:00
d76e411d8c support conv transpose in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14775

Differential Revision: D13330491

Pulled By: eellison

fbshipit-source-id: 432b327d6a33517ff53ea33c9f64700e81432332
2018-12-04 19:54:09 -08:00
2d3cf98b49 Making dist.get_default_group private for PT1 release (#14767)
Summary:
When I wrote the frontend API, it was designed around not letting users use the default group directly in any functions. It should really be private.

All collectives are supposed to use either group.WORLD or anything that comes out of new_group. That was the initial design.

We need to make a TODO on removing group.WORLD one day; it exists for backward-compatibility reasons and adds lots of complexity.
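
A sketch of the intended public usage (assumes a launched process group; the backend and ranks here are illustrative):

```python
import torch
import torch.distributed as dist

dist.init_process_group('gloo', init_method='env://')  # needs RANK/WORLD_SIZE set
t = torch.ones(1)
dist.all_reduce(t, group=dist.group.WORLD)  # the public default group
pair = dist.new_group(ranks=[0, 1])         # or any group from new_group
dist.all_reduce(t, group=pair)
```
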
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14767

Reviewed By: pietern

Differential Revision: D13330655

Pulled By: teng-li

fbshipit-source-id: ace107e1c3a9b3910a300b22815a9e8096fafb1c
2018-12-04 19:22:24 -08:00
33ea7eafef Make checkpoint_sequential work with multiple arguments (#14278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278

In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest.

We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors.
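
A usage sketch (the model and segment count are illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 5))
x = torch.randn(4, 10, requires_grad=True)
out = checkpoint_sequential(model, 2, x)  # all input tensors now reach the first segment
out.sum().backward()                      # segment forwards are recomputed here
```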

Reviewed By: ezyang

Differential Revision: D13144672

fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7
2018-12-04 18:47:43 -08:00
3237103624 Automatic update of fbcode/onnx to 42804705bdbf179d1a98394008417e1392013547 (#14777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14777

Previous import was 6b34743d2e361bbc0acb29dd73536478cb92562e

Included changes:
- **[4280470](https://github.com/onnx/onnx/commit/4280470)**: Changes done internally at Facebook (#1668) <Lu Fang>
- **[f85221f](https://github.com/onnx/onnx/commit/f85221f)**: Fuse MatMul and Add into Gemm (#1542) <vloncar>
- **[022230e](https://github.com/onnx/onnx/commit/022230e)**: Replace np.long by np.int64 (#1664) <G. Ramalingam>
- **[0ab3c95](https://github.com/onnx/onnx/commit/0ab3c95)**: Infer shape from data in Constant nodes (#1667) <Shinichiro Hamaji>

Reviewed By: bddppq

Differential Revision: D13330082

fbshipit-source-id: 13cf328626cf872d0983bbd2154d95c45da70f1c
2018-12-04 18:37:48 -08:00
a66669a110 Enable testing on Loss modules (#14778)
Summary:
This PR adds `None` buffers as parameters (similarly to #14715). It also cleans up a bunch of the `test_jit.py` tests that should be covered by `common_nn.py` and brings in `criterion_tests` to test loss functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14778

Differential Revision: D13330849

Pulled By: driazati

fbshipit-source-id: 924cc4cf94e0dcd11e811a55222fd2ebc42a9e76
2018-12-04 18:35:10 -08:00
d872af9282 Add tests for dropout/batchnorm train/eval, remove training constants (#14780)
Summary:
This PR:

1. add tests for batchnorm/dropout train/eval parameter mutation
2. remove training constants from our standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14780

Differential Revision: D13331578

Pulled By: wanchaol

fbshipit-source-id: d92ca3ce38cc2888688d50fe015e3e22539a20a5
2018-12-04 18:17:43 -08:00
86b4dd8bb2 Split LegacyDeviceTypeInit from LegacyTypeDispatch. (#14723)
Summary:
The goal here is to have LegacyTHDispatch call into this as well, so LegacyTypeDispatch and LegacyTHDispatch don't have cross dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14723

Reviewed By: ezyang

Differential Revision: D13314017

Pulled By: gchanan

fbshipit-source-id: 8761cb4af2b2269d2e755203e073bfdba535b8c0
2018-12-04 17:51:37 -08:00
f6f24cf0f4 don't allow cse to clean up nondeterministic nodes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14776

Differential Revision: D13330229

Pulled By: suo

fbshipit-source-id: 6bc88811e1889949f0f079cffccd8cd4270584cc
2018-12-04 15:45:37 -08:00
d76fd43294 Reenable all forward-pass fusions that worked before the AD fix (#14558)
Summary:
Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558

Differential Revision: D13321748

Pulled By: zou3519

fbshipit-source-id: c04fc2f70d106d2bfb56206b5aec517a93b79d1f
2018-12-04 15:43:37 -08:00
c3bfa0e52b BatchNorm support not tracking stats
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14764

Differential Revision: D13325800

Pulled By: driazati

fbshipit-source-id: a3e4773dc31b83565e7a4de33614d6efd4a12de9
2018-12-04 15:11:53 -08:00
c21f090ab4 Minor doc change in c10/Device.h (#14762)
Summary:
Make sure it's a valid regex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14762

Reviewed By: zrphercule

Differential Revision: D13326108

Pulled By: houseroad

fbshipit-source-id: fdcae2d5d42774c4071651b7477f08047d385dfa
2018-12-04 14:52:22 -08:00
9e1f4ba124 Introduce LegacyTHDispatcher for dispatching to TH functions. (#14754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14754

This isn't hooked up to anything yet, this is just putting the skeleton in place.
The idea here is that the functions generated via Declarations.cwrap and nn.yaml are not actually operators, they are implementation details of operators, and thus don't need to participate in VariableType, JIT dispatch generation.

So, we will split these functions out from the usual Type/operator hierarchy; for now the dispatch will be done by a Type-like class called LegacyTHDispatcher.  Once this is done this probably means we can collapse Type to be backend-specific, not Type/ScalarType specific, because all the ScalarType specific code will live in the LegacyTHDispatcher.

Reviewed By: ezyang

Differential Revision: D13321605

fbshipit-source-id: 25d1bbc9827a42d6ab5d69aabbad3eac72bf364c
2018-12-04 14:44:06 -08:00
53a9d4f312 disable batch mm if we have mutable ops (#14771)
Summary:
Just to be safe, disable batch mm for mutable ops. We don't lose much by doing this, and we can come back at a calmer time to re-enable it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14771

Reviewed By: eellison

Differential Revision: D13327641

Pulled By: suo

fbshipit-source-id: 96611e21ed3cb8492a2cd040f7d33fb58c52bd5e
2018-12-04 14:34:57 -08:00
5ed9dfad98 Replace at::Half non-vectorized conversions with implementations from FP16 (#14411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14579

Folded the FP16 code into c10.

Reviewed By: ezyang

Differential Revision: D13206450

fbshipit-source-id: 472208dd230dc49d33935622ff3286b17eeb0894
2018-12-04 14:32:33 -08:00
2d56df7892 Use .to to convert new tensors in new_tensor (#14097)
Summary:
This would solve the tracing problems of #13969.
Fixes: #14732

I would appreciate it if this got good scrutiny before being applied.
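
For reference, the documented `new_tensor` behavior that the tracing-friendly `.to` path must preserve:

```python
import torch

base = torch.zeros(2, dtype=torch.float64)
t = base.new_tensor([1, 2, 3])   # inherits dtype (and device) from `base`
assert t.dtype == torch.float64
```
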
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14097

Differential Revision: D13323181

Pulled By: ezyang

fbshipit-source-id: dcd104b497c0bfddb751923c6166a3824b7a3702
2018-12-04 14:03:56 -08:00
c7c5eed686 Export generator constructor (#14041)
Summary:
Missed a spot :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14041

Reviewed By: ezyang

Differential Revision: D13283803

Pulled By: ebetica

fbshipit-source-id: 482e245f57b0cea6ca3886355ea3ae487d024d4b
2018-12-04 13:50:06 -08:00
374b797569 c10d doesn't work with torch namespace (#14042)
Summary:
If both `Utils.hpp` and the `torch` namespace are included in the same file, the compiler won't know which fmap to use. I believe this is because of ADL. This change fixes that issue for me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14042

Reviewed By: ezyang

Differential Revision: D13283810

Pulled By: ebetica

fbshipit-source-id: b68233336518230ba730e83ddac1226a66896533
2018-12-04 13:47:20 -08:00
3aba2d99e1 Add resnet test, convert more modules (#14437)
Summary:
This PR adds ResNet to test_jit and converts more nn modules; stacked on #14533 and #14715.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14437

Differential Revision: D13325871

Pulled By: wanchaol

fbshipit-source-id: 6c94a988b36794a373af6541c0c262a07291f7b1
2018-12-04 13:42:41 -08:00
25c9a8b1fc Add missing test skip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14763

Differential Revision: D13325350

Pulled By: driazati

fbshipit-source-id: 4d64a7616b227983c2fc2748c5fbecd1bcbff832
2018-12-04 13:38:53 -08:00
875be849e9 Rename _local_scalar to item() (#13676)
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676

Differential Revision: D13003020

Pulled By: goldsborough

fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
2018-12-04 13:19:26 -08:00
e829a52977 Remove use of hipify_caffe2, in favor of file path test. (#14757)
Summary:
This is towards unifying build_pytorch_amd.py and build_caffe2_amd.py
scripts.  There is only one use of hipify_caffe2 left, which is just
to control which files actually get HIPified.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14757

Differential Revision: D13323486

Pulled By: ezyang

fbshipit-source-id: 958cd91be32dfc3c0a9ba9eda507adb5937aebcd
2018-12-04 12:48:49 -08:00
a597c0ca05 Add inplace FeedTensor for python frontend (#14512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14512

att

Reviewed By: dzhulgakov

Differential Revision: D13243278

fbshipit-source-id: 78af417d0fcd9b9791ee839d62095903e49205cb
2018-12-04 12:45:11 -08:00
ba70cf22fa Loss (#14720)
Summary:
Adding Loss modules to script. Some of the modules have an optional tensor parameter. I will wait until wanchao's diff supporting optional tensors lands before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14720

Differential Revision: D13317990

Pulled By: eellison

fbshipit-source-id: 535925bdf126d28d9e7d64077b83ebd836a5beba
2018-12-04 12:30:05 -08:00
ef91cfd68b Add new reduction mode in kl_div (#14457)
Summary:
Fixes #6622.
We used to average over all elements for KL divergence, which is not aligned with its math definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.

- In KL, the default behavior `reduction=mean` averages over the batch dimension, while for most other loss functions `reduction=mean` averages over all elements.
- We used to support scalar tensors as well. For BC purposes we still do; no reduction is performed on a scalar tensor.
- Added a new reduction mode called `batchmean` which has the correct behavior for KL (see the sketch below). Added a warning that `batchmean` will become the default for KL instead of `mean` in the next major release.
- [deprecated] I chose not to add a new reduction option, since "mean over batch dimension" is kind of special and only makes sense in a few cases like KL. We don't want to have to explain why there's an option "batchmean" that isn't applicable to all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution.
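
The sketch referenced above:

```python
import torch
import torch.nn.functional as F

inp = F.log_softmax(torch.randn(8, 5), dim=1)  # log-probabilities
tgt = F.softmax(torch.randn(8, 5), dim=1)      # probabilities

F.kl_div(inp, tgt, reduction='mean')       # mean over all elements
F.kl_div(inp, tgt, reduction='batchmean')  # sum / batch size: matches the math
```
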
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457

Differential Revision: D13236016

Pulled By: ailzhang

fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
2018-12-04 12:24:28 -08:00
773f4d8081 Implements Gather operator for arbitrary axis, sharing the code with BatchGather. (#13756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756

This implements general Gather operator for arbitrary axis, sharing the code with BatchGather.
 - CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
 - The shared CUDA kernel moved to gather_op.cuh, for any axis.
 - Gradients for axis > 0 delegate to BatchGatherGradientOp, which now has an axis argument.
 - BatchGatherOp doc strings updated to have the correct rank (q + (r - 1)) and output.
 - Added tests for axis == 2.

GatherOp supports index wrapping for axis == 0 by default, which was done earlier for ONNX.
This diff also extends it to work in the CUDA kernel, and adds a "wrap_indices" argument which specifies
whether this wrapping should be done; set it to true if you'd like wrapping for any axis.

TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.
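
For intuition, Gather with an axis argument follows np.take semantics along that axis (a sketch, not the Caffe2 test code):

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)  # rank r = 3
idx = np.array([2, 0])                 # rank q = 1
out = np.take(data, idx, axis=2)       # gather along axis 2
assert out.shape == (2, 3, 2)          # output rank is q + (r - 1) = 3
```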

Reviewed By: dzhulgakov

Differential Revision: D12983815

fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
2018-12-04 11:54:28 -08:00
16558a1e9d Refactor dataloader.py (#14668)
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that I added `torch._six.queue`.
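
A sketch of the resulting layout (module names per the description above):

```python
from torch.utils.data import _utils

_utils.worker           # _worker_loop and related worker-process code
_utils.signal_handling  # SIGCHLD-based worker failure detection
_utils.collate          # default_collate and friends
```
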
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668

Reviewed By: soumith

Differential Revision: D13289919

Pulled By: ailzhang

fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
2018-12-04 09:53:41 -08:00
7e4a5b89fe Back out "Move TensorOptions, DefaultTensorOptions and OptionsGuard to c10" (#14745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14745

Original commit changeset: c62e7f9b0255

Reviewed By: suo

Differential Revision: D13318594

fbshipit-source-id: 4d7dc35ca01b627accc3ee512bfcd6f2e805a533
2018-12-04 08:59:10 -08:00
ff7deb95d7 Back out "Fix include paths for TensorOptions, DefaultTensorOptions, OptionsGuard" (#14744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14744

Original commit changeset: d236d5351ecf

Reviewed By: suo

Differential Revision: D13318596

fbshipit-source-id: 55f1e9472d05fb5a9c47dc82c32e9a66b5e4308c
2018-12-04 08:59:07 -08:00
7bc489c827 Disable randn_like fusion in the JIT (#14752)
Summary:
Fixes #14674. We won't have time for a proper fix before the release, so at least disable fusion of nodes that trigger incorrect behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14752

Differential Revision: D13320407

Pulled By: zou3519

fbshipit-source-id: 2400f7c2cd332b957c248e755fdb0dadee68da5d
2018-12-04 08:55:47 -08:00
86ffc2a5f1 fix import failure in hub test (#14742)
Summary:
Fix #14610

I can repro the test failure following the steps provided, and this fixes the issue for me. It seems the insertion has to happen after the download.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14742

Differential Revision: D13318533

Pulled By: ailzhang

fbshipit-source-id: b9207b4572d5a9443e516d9a84632e3d7b68e477
2018-12-04 08:37:05 -08:00
9e58c4ef91 Revert D13304654: [pytorch][PR] Introduce LegacyTHDispatcher for dispatching to TH functions.
Differential Revision:
D13304654

Original commit changeset: cfe3e1a28adc

fbshipit-source-id: 06669d3c88f83e1d959e2c266fd608316539d42a
2018-12-04 07:58:34 -08:00
264111bfc1 Introduce LegacyTHDispatcher for dispatching to TH functions. (#14708)
Summary:
This isn't hooked up to anything yet, this is just putting the skeleton in place.
The idea here is that the functions generated via Declarations.cwrap and nn.yaml are not actually operators, they are implementation details of operators, and thus don't need to participate in VariableType, JIT dispatch generation.

So, we will split these functions out from the usual Type/operator hierarchy; for now the dispatch will be done by a Type-like class called LegacyTHDispatcher.  Once this is done this probably means we can collapse Type to be backend-specific, not Type/ScalarType specific, because all the ScalarType specific code will live in the LegacyTHDispatcher.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14708

Reviewed By: ezyang

Differential Revision: D13304654

Pulled By: gchanan

fbshipit-source-id: cfe3e1a28adcc355f67fe143495ee7e5c5118606
2018-12-04 07:41:04 -08:00
33b1f9f71a add .code property to ScriptModule (#14735)
Summary:
simple change to allow `print(foo.code)` to give a pretty-printed description of all the methods on a module.
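
For example (a toy module; `script_method` marks forward for compilation):

```python
import torch

class Foo(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        return x + 1

print(Foo().code)  # pretty-printed TorchScript source for the module's methods
```
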
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14735

Differential Revision: D13317619

Pulled By: zdevito

fbshipit-source-id: dc7f7ba12ba070f2dfccf362995c2a9e0e573cb7
2018-12-04 07:32:18 -08:00
1921816f85 Fix clamp when min/max are both None (#14716)
Summary:
Before this PR, tensor.clamp() would return an empty tensor if min and
max were not specified. This is a regression from 0.4.1, which would
throw an error. This PR restores that error message.
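
A behavior sketch:

```python
import torch

t = torch.randn(3)
t.clamp(min=0.0)   # fine: clamps from below
try:
    t.clamp()      # no bounds: raises again instead of returning an empty tensor
except RuntimeError as err:
    print(err)
```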

Fixes #14470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14716

Differential Revision: D13311031

Pulled By: zou3519

fbshipit-source-id: 87894db582d5749eaccfc22ba06aac4e10983880
2018-12-04 07:07:09 -08:00
6e0c5a8a4e Restore device in cpp API (#14711)
Summary:
This is a stacked PR based on https://github.com/pytorch/pytorch/pull/14454.

It enables restoring the storage to the appropriate device.

~~[TODO]: add/modify appropriate tests~~ Done
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14711

Reviewed By: dzhulgakov

Differential Revision: D13315746

Pulled By: houseroad

fbshipit-source-id: fe6f24a45c35e88fd1a2eebc09950d4430fac185
2018-12-04 00:46:41 -08:00
cbd805169f move structs to header file (#14728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14728

Move IndexBlob and Index to a header file so they can be reused.

Differential Revision: D13315898

fbshipit-source-id: 34432c9b8fa08af3d3387f32a940d35b02a59760
2018-12-04 00:42:41 -08:00
c7f93668dc improve the restore device test, and relax the assertion (#14734)
Summary:
Only compare the device index if the device has one.

Test the tensor restore with some computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14734

Reviewed By: dzhulgakov

Differential Revision: D13317949

Pulled By: houseroad

fbshipit-source-id: 26b2f2912a9bbc3b660a62283fb403ddab437e49
2018-12-04 00:33:09 -08:00
8812a5d42e Reduce broadcasted inputs in derivative code (#14485)
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
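
For intuition, this is the reduction the fixed formulas now insert, sketched in Python (`sum_to_size` here is an illustrative helper, not the JIT op):

```python
import torch

def sum_to_size(grad, shape):
    # reduce a broadcasted gradient back down to the original input shape
    while grad.dim() > len(shape):
        grad = grad.sum(dim=0)
    for i, s in enumerate(shape):
        if s == 1 and grad.size(i) != 1:
            grad = grad.sum(dim=i, keepdim=True)
    return grad

a, b = torch.randn(3, 1), torch.randn(1, 4)
grad_out = torch.ones(3, 4)  # gradient of the broadcasted result of a + b
assert sum_to_size(grad_out, a.shape).shape == (3, 1)
```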

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)

This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Reviewed By: eellison

Differential Revision: D13312888

Pulled By: suo

fbshipit-source-id: ad46bfb4d0a306ad9451002f8270f7a790f72d58
2018-12-04 00:16:21 -08:00
862b8cae51 interpolate (#14123)
Summary:
Add support for interpolate and upsampling in weak_script mode.

Because the function parameters are overloaded, I had to add it as a builtin op. For interpolate, size can be `int?` or `int[]?`, and scale_factor can be `float?` or `float[]?`. Every combination of the two parameters needs to be supported.

The same logic applies for upsample_nearest, upsample_bilinear, and upsample.

There are a few fixes that I came to along the way.
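
A usage sketch of what this enables in script mode (assuming the builtin resolves as described):

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def upsample(x):
    return F.interpolate(x, scale_factor=2.0)  # the float? overload

y = upsample(torch.randn(1, 3, 8, 8))
assert y.shape == (1, 3, 16, 16)
```
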
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14123

Differential Revision: D13278923

Pulled By: eellison

fbshipit-source-id: e59729034369be4ce4b747291a3d1c74e135b869
2018-12-04 00:01:43 -08:00
a23863fd6f Add Pooling modules to Script (#14527)
Summary:
Depends on #14584
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14527

Differential Revision: D13270773

Pulled By: driazati

fbshipit-source-id: e4acd43ccbce0f4b62d41c30ce8d5c721171e19a
2018-12-03 23:55:04 -08:00
d429e78a9a Add fractional_max_pool2d to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14591

Differential Revision: D13270755

Pulled By: driazati

fbshipit-source-id: 138a60256795f5ef8d236c75be2cfd929059b98f
2018-12-03 23:49:38 -08:00
e8e494caf8 Add GroupNorm to standard library (#14722)
Summary:
Depends on #14715 for the excluded tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14722

Differential Revision: D13317714

Pulled By: driazati

fbshipit-source-id: bf1cdbc0a3803f82befed41925e91ab60e20ec82
2018-12-03 23:46:19 -08:00
95e5a5ae0c basic testing of builtin alias annotations (#14588)
Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588

Differential Revision: D13312653

Pulled By: suo

fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888
2018-12-03 22:31:02 -08:00
9fbc2d3153 Remove TensorImpl -> LegacyTypeDispatch dependency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14651

Reviewed By: ezyang

Differential Revision: D13285370

fbshipit-source-id: cc93c3ca95e7260762c1cabca17b8973d52c4e22
2018-12-03 21:53:28 -08:00
d063c9c330 Fix include paths for TensorOptions, DefaultTensorOptions, OptionsGuard
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14647

Reviewed By: ezyang

Differential Revision: D13283497

fbshipit-source-id: d236d5351ecf7ab9712a55e9ef12d8bba48eb53f
2018-12-03 21:53:26 -08:00
46772dba0c Move TensorOptions, DefaultTensorOptions and OptionsGuard to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14646

Reviewed By: ezyang

Differential Revision: D13283494

fbshipit-source-id: c62e7f9b02551926bf8f1e3ddf6ede4ec925d28d
2018-12-03 21:53:24 -08:00
1098500e9b Fix include paths for Layout.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14645

Reviewed By: ezyang

Differential Revision: D13283496

fbshipit-source-id: d70881e957c886a6c2befe3ef1d2c5a3fac18e7f
2018-12-03 21:53:22 -08:00
771eebad7b Move Layout to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14644

Reviewed By: ezyang

Differential Revision: D13283493

fbshipit-source-id: bb02f156d6a5b5129db5743c756acc84c38eca83
2018-12-03 21:53:20 -08:00
5a4082612f Fix include paths for Backend.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14643

Reviewed By: ezyang

Differential Revision: D13283492

fbshipit-source-id: 9919af9707d094118efc963543320e01b07d7bc5
2018-12-03 21:53:19 -08:00
c303fcb9cb Moved Backend to c10 (#14642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14642

Unfortunately, TensorOptions depends on this, so we need it in c10.

Reviewed By: ezyang

Differential Revision: D13283495

fbshipit-source-id: 433cd47eb18aac1131be9c5cd650efc583870a20
2018-12-03 21:53:17 -08:00
119f9ec291 enable NoneValue parameter assignment for WeakScriptModule (#14715)
Summary:
This PR:

1. Handle None value attr in the WeakScriptModuleProxy
2. add back module tests that now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715

Differential Revision: D13313573

Pulled By: wanchaol

fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b
2018-12-03 20:40:55 -08:00
bb546b2e5b WAR for self.training (#14719)
Summary:
To enable self.training in script modules, this PR automatically adds a buffer called 'training' if a script method requests self.training. Assignment to self.training is overloaded to assign both to the boolean property and the tensor value.
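
A sketch of what this enables:

```python
import torch

class M(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        if self.training:  # now legal in script, backed by an auto-added buffer
            return x * 2
        return x

m = M()
m.eval()  # per the overload above, this updates the backing buffer as well
```
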
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14719

Differential Revision: D13310569

Pulled By: zdevito

fbshipit-source-id: 406387bb602f8ce5794eeff37642863c75928be5
2018-12-03 20:32:16 -08:00
9a932b8b90 fix expect
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14730

Differential Revision: D13316463

Pulled By: zdevito

fbshipit-source-id: 8b11bdb22d354c17bf2de4bded352bb6eb086ec7
2018-12-03 20:15:27 -08:00
44894915d6 Automatic update of fbcode/onnx to 6b34743d2e361bbc0acb29dd73536478cb92562e (#14637)
Summary:
Previous import was f461f7aad9987635b4aff108620ed7918f002d19

Included changes:
- **[6b34743](https://github.com/onnx/onnx/commit/6b34743)**: fix the const map initializatoin (#1662) <Lu Fang>
- **[ae80999](https://github.com/onnx/onnx/commit/ae80999)**: Fuse Pad into Conv optimizer (#1580) <vloncar>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14637

Differential Revision: D13281338

Pulled By: houseroad

fbshipit-source-id: c31429914bf5954fdc85e0c02168836ef47d635c
2018-12-03 20:11:17 -08:00
7b6c6f76f7 Skip CUDA tests when built with CUDA but no GPUs available; rename cuda tests so they're obvious.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14706

Reviewed By: soumith

Differential Revision: D13304398

fbshipit-source-id: d5e2cda965ce8bc1721489b282336ea3ca7f0471
2018-12-03 18:49:59 -08:00
22ab6183c5 Move manual_seed into ATen/Context.h; delete reimplementation in test_seed.h (#14625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14625

I want to reorg the test files, but I am too lazy to make the include
paths for test_seed.h work out.  So just delete it.

Reviewed By: gchanan

Differential Revision: D13277567

fbshipit-source-id: a3e8e46e4816b6fc0fe926b20779839f9e0a1a06
2018-12-03 18:49:58 -08:00
78d594f46c Implement Device as a type in the script (#14666)
Summary:
[ note:  stacked on expect files changes, will unstack once they land ]
This adds DeviceObjType (cannot use DeviceType; it is already an enum)
to the type hierarchy and an isDevice/toDevice pair to IValue.
Previous hacks which used an int[] to represent Device are removed
and at::Device is used instead.

Note: the behavior of .to is only a subset of Python; we need to
fix the aten op so that it accepts Optional[Device] and Optional[ScalarType].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14666

Reviewed By: suo

Differential Revision: D13290405

Pulled By: zdevito

fbshipit-source-id: 68b4381b292f5418a6a46aaa077f1c902750b134
2018-12-03 16:54:40 -08:00
4b31572375 Meta programming on If Stmt cond to enable conditional emit blocks (#14533)
Summary:
This PR is part of the task to unblock standard library export. Basically, we want to enable the ability to metaprogram the If stmt to dynamically emit different branches based on `cond`. This is primarily used to disable compilation of certain branches of an If, like the below:

```python
import torch

class Test(torch.jit.ScriptModule):
  def __init__(self, b=None):
    super(Test, self).__init__()  # required for ScriptModule
    self.b = b

  @torch.jit.script_method
  def forward(self, input):
    x = input
    if self.b is not None:
      x = self.b(input)

    return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between none simple value and any sugared value in JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533

Differential Revision: D13310526

Pulled By: wanchaol

fbshipit-source-id: 78d1a8127acda5e44d2a8a88f7627c43d29ff244
2018-12-03 15:47:15 -08:00
298b775577 Delete temporary ATenCoreTest. (#14622)
Summary:
It was previously used to make sure that ATen/core was working;
but now we have plenty of headers and C++ files in ATen/core,
so this is no longer necessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14622

Differential Revision: D13276899

Pulled By: ezyang

fbshipit-source-id: 9bef7eb1882ccdfa3ee7681a3d5b048ea94b59d3
2018-12-03 15:07:40 -08:00
9ac845f734 Revert D13280899: [pytorch][PR] Reduce broadcasted inputs in derivative code
Differential Revision:
D13280899

Original commit changeset: 80cc5ec9331b

fbshipit-source-id: 2335093cca8fd7db95470fd83b9299adfa17aa8e
2018-12-03 14:55:02 -08:00
e0f68671bd Restore device when import jit script module (#14454)
Summary:
We align the restore logic to `torch.load`: we try to restore to the right device, and if the device is not available, an exception is raised. We allow the user to remap the device through a parameter `map_location`; it can be 1) a string like `cuda:0` or `cpu`, 2) a device, e.g. torch.device('cpu'), 3) a dict, e.g. {'cuda:1': 'cuda:0'}, or 4) a function, whose signature looks like string map_location(tensor, saved_device_string).
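
A usage sketch following the forms listed above (the file name is illustrative and must come from torch.jit.save):

```python
import torch

m = torch.jit.load('model.pt', map_location='cpu')                 # 1) string
m = torch.jit.load('model.pt', map_location=torch.device('cpu'))   # 2) device
m = torch.jit.load('model.pt', map_location={'cuda:1': 'cuda:0'})  # 3) dict
```
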
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14454

Reviewed By: zrphercule

Differential Revision: D13271956

Pulled By: houseroad

fbshipit-source-id: dfd6b6049b0dc07549ddeddf2dea03ac53ba6d49
2018-12-03 14:10:30 -08:00
b8da44dc13 Add linear + pixelshuffle modules to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14654

Differential Revision: D13300968

Pulled By: driazati

fbshipit-source-id: 2c36aab91ea99681687f8da6d318981fee49785b
2018-12-03 14:01:16 -08:00
68ffe46991 Reduce broadcasted inputs in derivative code (#14485)
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)

This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Differential Revision: D13280899

Pulled By: soumith

fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
2018-12-03 13:44:18 -08:00
b768db0810 Allow DCE to clean up some mutable ops (#14601)
Summary:
This PR makes DCE a little smarter in the presence of mutable ops. Previously mutable ops could never be cleaned up, now they can be cleaned up if we can prove there are no live uses of any alias sets that the op writes to.

This behavior is optional; if you pass DCE a block instead of a graph, it will do the same thing as before. Also changed `InlineAutographSubgraph` to use the common subgraph utils.

Tested on traced ResNet, and it gets rid of the dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14601

Differential Revision: D13309118

Pulled By: suo

fbshipit-source-id: dac2791e7d2ecf219ae717a2759b83c1e927f254
2018-12-03 13:31:08 -08:00
9783ce3825 Revert D13272203: [pytorch][PR] [jit] Meta programming on If Stmt cond to enable conditional emit blocks
Differential Revision:
D13272203

Original commit changeset: 44a545abb766

fbshipit-source-id: 8861eb4810a6c9ea4aba8427b3a07d2fa0d69a15
2018-12-03 13:28:52 -08:00
6385d00185 Move global-constructor to lazily initialized (mobile restriction) (#14650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14650

this fixes the build for mobile

Reviewed By: dzhulgakov

Differential Revision: D13267458

fbshipit-source-id: 83e7e76e3c875134395b6c43ea791c5b56871642
2018-12-03 13:24:56 -08:00
5a2f5a216f Make convertable to list also accepts optional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14598

Differential Revision: D13308254

Pulled By: wanchaol

fbshipit-source-id: bd0b6f9f20294d3d589cf68732dbd8c57b67e0e9
2018-12-03 13:09:11 -08:00
b5181ba1df add avx512 option (but no avx512 kernel yet) (#14664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14664

This diff just adds a framework to add avx512 kernels.
Please be really, really careful about using avx512 kernels unless you're convinced using avx512 will bring good enough *overall* speedups, because it can backfire due to the CPU frequency going down.

Reviewed By: duc0

Differential Revision: D13281944

fbshipit-source-id: 04fce8619c63f814944b727a99fbd7d35538eac6
2018-12-03 12:18:19 -08:00
4b90702037 Meta programming on If Stmt cond to enable conditional emit blocks (#14533)
Summary:
This PR is part of the task to unblock standard library export. Basically, we want to enable the ability to metaprogram the If stmt to dynamically emit different branches based on `cond`. This is primarily used to disable compilation of certain branches of an If, like the below:

```python
import torch

class Test(torch.jit.ScriptModule):
  def __init__(self, b=None):
    super(Test, self).__init__()  # required for ScriptModule
    self.b = b

  @torch.jit.script_method
  def forward(self, input):
    x = input
    if self.b is not None:
      x = self.b(input)

    return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between none simple value and any sugared value in JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533

Differential Revision: D13272203

Pulled By: wanchaol

fbshipit-source-id: 44a545abb766bbd39b762a6e19f9ebaa295e324b
2018-12-03 12:14:52 -08:00
cac03280f9 Fixed DistributedDataParallel state pickling for multi-gpus (#14690)
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14678

This PR fixes DDP not working after save() and load() with multiple GPUs, because all of the replication logic and bucketing happens in the constructor.

So I refactored some of the logic from the constructor into a helper function, which is also used by load().

Added a test too. Tested on 8-GPU machines.
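
A sketch of the round-trip this fixes (requires a launched process group and at least two GPUs; backend and sizes are illustrative):

```python
import torch
import torch.distributed as dist

dist.init_process_group('nccl', init_method='env://')  # needs RANK/WORLD_SIZE set
net = torch.nn.Linear(10, 10).cuda()
ddp = torch.nn.parallel.DistributedDataParallel(net, device_ids=[0, 1])

torch.save(ddp, 'ddp.pt')    # pickles the DDP module state
ddp = torch.load('ddp.pt')   # replication and bucketing are now redone on load
```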

```
tengli@learnfair062:~/pytorch/test$ python run_test.py -i distributed --verbose
Test executor: ['/private/home/tengli/miniconda3/bin/python']
Selected tests: distributed
Running test_distributed ... [2018-12-02 18:33:55.833580]
/public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec
Running distributed tests for the mpi backend
test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
Running distributed tests for the mpi backend with file init_method
test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
Running distributed tests for the nccl backend
test_Backend_enum_class (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... skipped 'nccl does not support DistributedDataParallelCPU'
test_all_gather (__main__.TestDistBackend) ... skipped 'Only MPI supports CPU all gather'
test_all_gather_cuda (__main__.TestDistBackend) ... skipped 'CUDA all gather skipped for NCCL'
test_all_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_multigpu (__main__.TestDistBackend) ... ok
test_all_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_multigpu (__main__.TestDistBackend) ... skipped 'CUDA all_reduce multigpu skipped for NCCL'
test_all_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum_cuda (__main__.TestDistBackend) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_cuda (__main__.TestDistBackend) ... ok
test_barrier_full_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_full_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_timeout_full_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_cuda (__main__.TestDistBackend) ... ok
test_broadcast_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_multigpu (__main__.TestDistBackend) ... skipped 'NCCL broadcast multigpu skipped'
test_destroy_full_group (__main__.TestDistBackend) ... ok
test_destroy_group (__main__.TestDistBackend) ... ok
test_gather (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_get_backend (__main__.TestDistBackend) ... ok
test_get_default_group (__main__.TestDistBackend) ... ok
test_get_rank (__main__.TestDistBackend) ... ok
test_get_rank_size_full_group (__main__.TestDistBackend) ... ok
test_get_rank_size_group (__main__.TestDistBackend) ... ok
test_irecv (__main__.TestDistBackend) ... skipped 'Nccl does not support irecv'
test_isend (__main__.TestDistBackend) ... skipped 'Nccl does not support isend'
test_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_multigpu (__main__.TestDistBackend) ... ok
test_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum_cuda (__main__.TestDistBackend) ... ok
test_scatter (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_send_recv (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'
test_send_recv_any_source (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv from any source'
test_send_recv_with_tag (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'

----------------------------------------------------------------------
Ran 68 tests in 69.549s

OK (skipped=52)
Running distributed tests for the nccl backend with file init_method
test_Backend_enum_class (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... skipped 'nccl does not support DistributedDataParallelCPU'
test_all_gather (__main__.TestDistBackend) ... skipped 'Only MPI supports CPU all gather'
test_all_gather_cuda (__main__.TestDistBackend) ... skipped 'CUDA all gather skipped for NCCL'
test_all_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_multigpu (__main__.TestDistBackend) ... ok
test_all_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_multigpu (__main__.TestDistBackend) ... skipped 'CUDA all_reduce multigpu skipped for NCCL'
test_all_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum_cuda (__main__.TestDistBackend) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_cuda (__main__.TestDistBackend) ... ok
test_barrier_full_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_full_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_timeout_full_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_cuda (__main__.TestDistBackend) ... ok
test_broadcast_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_multigpu (__main__.TestDistBackend) ... skipped 'NCCL broadcast multigpu skipped'
test_destroy_full_group (__main__.TestDistBackend) ... ok
test_destroy_group (__main__.TestDistBackend) ... ok
test_gather (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_get_backend (__main__.TestDistBackend) ... ok
test_get_default_group (__main__.TestDistBackend) ... ok
test_get_rank (__main__.TestDistBackend) ... ok
test_get_rank_size_full_group (__main__.TestDistBackend) ... ok
test_get_rank_size_group (__main__.TestDistBackend) ... ok
test_irecv (__main__.TestDistBackend) ... skipped 'Nccl does not support irecv'
test_isend (__main__.TestDistBackend) ... skipped 'Nccl does not support isend'
test_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_multigpu (__main__.TestDistBackend) ... ok
test_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum_cuda (__main__.TestDistBackend) ... ok
test_scatter (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_send_recv (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'
test_send_recv_any_source (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv from any source'
test_send_recv_with_tag (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'

----------------------------------------------------------------------
Ran 68 tests in 70.381s

OK (skipped=52)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14690

Differential Revision: D13294169

Pulled By: teng-li

fbshipit-source-id: 69ccac34c6c016899bfe8fbc50b48d4bfd1d3876
2018-12-03 12:04:26 -08:00
18eaec7121 Add (unused) HIP API to the Context object. (#14623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14623

This is the last piece we need before we can start doing out-of-place
HIPify on ATen.  These APIs are not actually used at the moment,
as we still do in-place HIPify, which uses CUDA.

Reviewed By: gchanan

Differential Revision: D13277246

fbshipit-source-id: 771efa81c2d2022e29350f25a5b4bb8f49ac6df0
2018-12-03 10:54:57 -08:00
b1faab3d8f Replace THCState_getCurrentStream with direct at::cuda::getCurrentCUDAStream()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14500

Reviewed By: gchanan

Differential Revision: D13241401

fbshipit-source-id: d78cf8ddce96876bedc1d14507b0646bcfd41aed
2018-12-03 10:54:55 -08:00
a49bf21d50 Delete hasCuDNN from Context. (#14499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14499

It still needs to stay in hooks, since it's part of the public
C++ API, but I want library code to try to arrange for CuDNN checks
to occur inside CUDA code, where it's statically obvious whether
CuDNN is available (and you don't need dynamic dispatch).

Reviewed By: gchanan

Differential Revision: D13241355

fbshipit-source-id: 4e668a5914ab890463a12d9e528ba4ecbb7dd7c2
2018-12-03 10:54:54 -08:00
eb71df3e63 Delete at::current_device(), Context::current_device() and Context::getNumGPUs() (#14414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14414

The previous functions were CUDA-centric, and led to lots of places
where we improperly assumed that CUDA is the only game in town (it's not).
Best to delete them.

What are your alternatives?  This diff fixes some use sites, which may give
you some ideas.  In particular, the "given a device type, give me the
current device for that device type" might be a good function to enshrine
for real.

Reviewed By: gchanan

Differential Revision: D13218540

fbshipit-source-id: 2f42cd6b9bdab4930d25166b8041c9466a1c6e0a
2018-12-03 10:54:52 -08:00
5ee8312b63 sparse.mm(), reland #14526 (#14661)
Summary:
- reland reverted PR #14526 with doc fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14661

Differential Revision: D13289047

Pulled By: weiyangfb

fbshipit-source-id: 5b843a11a58b56aeada3af2680a27cf89ecef4d8
2018-12-03 10:39:27 -08:00
7da2448d62 Fix multi-argument allreduce in ProcessGroupGloo (#14688)
Summary:
If multiple arguments are specified to c10d allreduce, they are
interpreted as if they are expanding the ranks in the process group.
Therefore, not only is every argument to allreduce an input that must
be considered, it is also an output. The problem that this commit
fixes is that they were not correctly considered as outputs.

The upstream problem is tracked in facebookincubator/gloo#152. Once
this is fixed there we can remove the copies that this commit adds.

This fixes #14676.
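
In other words, a k-tensor allreduce over N ranks behaves like a 1-tensor allreduce over N*k ranks, and all k local tensors must receive the result. A minimal sketch of that contract via the multi-tensor entry point (assuming an already-initialized process group):

```python
import torch
import torch.distributed as dist

def allreduce_multi(tensors):
    # With N ranks and k tensors per rank, the result is the sum over all
    # N*k inputs; every local tensor is an output and must be overwritten.
    dist.all_reduce_multigpu(tensors)
    return tensors  # each tensor now holds the global sum
```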
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14688

Differential Revision: D13294405

Pulled By: pietern

fbshipit-source-id: 078a2a0a0ff12d051392461438f1496201ec3cb9
2018-12-03 09:41:17 -08:00
b15242f70c Assert all legacy operators are 'extended_method', remove codegen for… (#14649)
Summary:
… other paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14649

Differential Revision: D13285183

Pulled By: gchanan

fbshipit-source-id: 91a58a22cba7e00eb0931bc277b0cb9d6f05cfdc
2018-12-03 07:41:50 -08:00
737efa78ba Remove 'type_method_inline_definitions' which isn't used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14648

Differential Revision: D13284176

Pulled By: gchanan

fbshipit-source-id: e6b8f9410fab57164259f97de2fd46f6bdf88d5a
2018-12-03 07:38:21 -08:00
b96e6ee98d Delete defunct DynamicCUDAInterface (#14621)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14621

Differential Revision: D13276723

Pulled By: ezyang

fbshipit-source-id: b666b2cdf4c45ccec7c802e268878eb2f3e028aa
2018-12-03 07:33:05 -08:00
af95f712b0 Get rid of deprecated_factory_method in codegen, which is no longer u… (#14641)
Summary:
…sed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14641

Differential Revision: D13283449

Pulled By: gchanan

fbshipit-source-id: 35cedc48940fa6144b4eab6402d9e1dc74a67b65
2018-12-03 07:28:42 -08:00
5c89190340 inline adagrad functions (#14194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14194

Inline some of perfkernels/adagrad.h functions for better performance
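
For reference, a minimal Python sketch of the standard adagrad step these kernels implement (the actual perfkernels are vectorized C++ and may fold in decay and signs differently):

```python
def adagrad_update(w, g, h, lr=0.01, eps=1e-5):
    """One adagrad step: accumulate the squared gradient in h, then take
    a per-coordinate step scaled by 1/sqrt(h). Works elementwise."""
    h = h + g * g
    w = w - lr * g / (h ** 0.5 + eps)
    return w, h
```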

Reviewed By: hyuen

Differential Revision: D13096351

fbshipit-source-id: b4da8053278d585eabc5389b8a8dcae0f253b413
2018-12-02 20:23:02 -08:00
74c3cbc013 Increase test barrier timeout for barrier test (#14689)
Summary:
The CUDA initialization for the participating processes can
take long enough for the barrier timeout to trigger on the
process that doesn't participate in the group.

See #14676.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14689

Reviewed By: teng-li

Differential Revision: D13293695

Pulled By: pietern

fbshipit-source-id: 6268dc9acfdb22f70c027e5e4be082f7127c0db4
2018-12-02 17:46:17 -08:00
5268dd468c Fixed DistributedDataParallel cannot kick off all-reduce in a corner case (#14675)
Summary:
Ok, this corner case comes up for translation models, and it only happens when all of the following hold:

(1) when the module is registered a parameter that does not requires grad

and

(2) this registered parameter has a unique type (say, double or half), and it is the only parameter of that type, so it alone will be put into a separate bucket.

and

(3) it is the last parameter that got registered in the module, such that its bucket reduction is the first to be kicked off.

Once this corner case happens, since the parameter does not require grad, its backward hook is never fired. All other buckets are waiting for that parameter's bucket to be kicked off, so no bucket gets reduced at all: everything is blocked by the first bucket (the unique-type parameter).

This PR fixes three things (fix (1) is sketched in the snippet after this list):
(1) Make sure that we only bucket parameters that require grad.
(2) Check all-reductions in the next iteration: as soon as we detect that the previous iteration's all-reduction was not fully kicked off, we issue an error.
(3) Remove some unused variables.
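
A minimal sketch of fix (1), assuming a greedy byte-limited packing (the real DDP bucketing logic differs in detail):

```python
def bucket_parameters(params, bucket_size_limit=2 ** 20):
    """Pack parameters into byte-limited buckets, skipping any parameter
    with requires_grad=False: such a parameter never fires a backward
    hook, so bucketing it would stall every later bucket's reduction."""
    buckets, current, current_bytes = [], [], 0
    for p in params:
        if not p.requires_grad:  # fix (1): never bucket these
            continue
        nbytes = p.numel() * p.element_size()
        if current and current_bytes + nbytes > bucket_size_limit:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(p)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets
```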

With this bug fixed, the only case when this error can happen is when the user changed parameters later after wrapping up the module with DDP, like the case in:
https://github.com/pytorch/pytorch/issues/12603

Test covered as well

Without the first fix, I verified that the repro in fbcode hits this error message:

```
result = self.forward(*input, **kwargs)
  File "/data/users/tengli/fbsource/fbcode/buck-out/dev/gen/language_technology/neural_mt/os/pytorch_translate/train#link-tree/torch/nn/parallel/distributed.py", line 312, in forward
    raise RuntimeError("Not all gradients are all-reduced from "
RuntimeError: Not all gradients are all-reduced from the backward of the previous iteration. This is unexpected and fatal error. Please check and ensure that the model's parameters are not changed after you wrap up the model with DistributedDataParallel.

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14675

Differential Revision: D13291083

Pulled By: teng-li

fbshipit-source-id: 2539b699fae843f104b4b8d22721ae82502ba684
2018-12-02 17:13:07 -08:00
35c8f93fd2 Fix CUDA 8 build on Windows (#14665)
Summary:
Fixes #14663.
Test for CUDA 8 is running here: https://dev.azure.com/pytorch/PyTorch/_build/results?buildId=54
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14665

Differential Revision: D13290392

Pulled By: soumith

fbshipit-source-id: 57f0d5b704e5d1fcb4927cbc007327b4ed74f443
2018-12-01 16:50:38 -08:00
da2c3afa47 Fixed typo in README.md (#14346)
Summary:
Fixed the typo in the Docker image section of README.md file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14346

Differential Revision: D13290403

Pulled By: soumith

fbshipit-source-id: 1d848027a773f0cfc875c33d69a66e96abc7ac8b
2018-12-01 16:39:33 -08:00
4c11dee0e8 Use Type::str() in Type::operator<< (#14657)
Summary:
Stacked on the zip commit because it also changes expect files; read only the last commit.

This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657

Differential Revision: D13288912

Pulled By: zdevito

fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
2018-12-01 00:53:27 -08:00
143e171cb9 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 6b3905b999b1211196c9138d7236700a1b308491
2018-11-30 19:47:44 -08:00
170ff7764f Use a zip archive as our container format (#14521)
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.

This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:

1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
   or editing the json-serialized meta-data), re-zip the file use OSX's native 'Compress'
   option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
   serialized models.
3. We can easily enable features like compression in the future.
4. Stock Python, without PyTorch installed, and other programming languages
   can reasonably consume this format using the json and zipfile packages, which enables
   people to build tools like visualizers without those visualizers depending on pytorch.
   This will be especially useful if you want to, for instance, write a visualizer in javascript.

Notes:

*  This adds miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
   library for reading/writing zipfiles that unlike other zip libraries also includes libz
   compatible compress/decompress support. It is a single header and a single C file without
   any other dependencies. Note that the instructions for miniz explicitly state:

   > Please use the files from the releases page in your projects. Do not use the git checkout directly!

   So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
   to doing zip-align style things to align data.

*  Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
   editing the meta-data file. Very important if you want to print-debug serialized models.

*  PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
   However, their implementation is completely swapped out to use miniz.

*  Code exists to check for the old magic number to give a decent warning to our preview users
   after we change the format.

*  Container version information is now put in a stand-alone 'version' file in the archive
   and serves a similar purpose to the other container version info.

*  All files in the zip archive start at 64-byte boundaries, using an approach similar to
   zip-align. Tests check that this property remains true. While the writer does this,
   the reader doesn't depend on it, allowing user-created archives that can use compression,
   and do not have to align data.

*  Added test to check for > 4GB files and archives. Disabled by default because it takes
   almost 2 minutes to run.

*  torchscript files are now optional: if a submodule does not have methods, it will
   not be written.
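
As a concrete illustration of point 4 above, stock Python can already inspect such an archive; a minimal sketch (the 'version' and '*.json' entry names are assumptions about the layout, not a documented contract):

```python
import json
import zipfile

with zipfile.ZipFile("model.pt") as zf:
    print(zf.namelist())  # explore the container contents
    for name in zf.namelist():
        if name.endswith("version"):
            print("container version:", zf.read(name).decode().strip())
        elif name.endswith(".json"):
            meta = json.loads(zf.read(name))  # json-serialized meta-data
            print(name, "->", sorted(meta))
```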
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521

Reviewed By: jamesr66a

Differential Revision: D13252945

Pulled By: zdevito

fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
2018-11-30 19:19:29 -08:00
1c21dc6e16 Revert D13252990: [pytorch][PR] [sparse] sparse.mm(S, D)
Differential Revision:
D13252990

Original commit changeset: 8fdb14144405

fbshipit-source-id: 49b8b0759a6e647854689962ffa72a205b4a2088
2018-11-30 18:53:47 -08:00
c71edcc747 Tensor construction codemod - caffe2/caffe2/fb/operators - 2/3
Summary:
Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13229251

fbshipit-source-id: 88b3984ea8ca82b9489c0ee9a338fd3f41dee615
2018-11-30 18:38:17 -08:00
fd17fd4aa9 Fix 'unknown type name 'optional'' (#14383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14383

D11669870 seems to have missed a spot that wasn't triggered before the stacked code above

Reviewed By: smessmer

Differential Revision: D13198269

fbshipit-source-id: 74592bedae0721acee744e31ca95253ea6efdedb
2018-11-30 17:29:50 -08:00
7f42d1c98a fix double precision cast from pybind (#14417)
Summary:
The JIT world only has double, not float, so in insertConstant we need to cast the Python `float_` to double instead of float. This fixes the incorrect values of `math.pi` and other high-precision constants.
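
To see why the cast matters, round-tripping `math.pi` through a 32-bit float visibly loses precision:

```python
import math
import struct

pi_f32 = struct.unpack('f', struct.pack('f', math.pi))[0]
print(math.pi)  # 3.141592653589793   (double, what the JIT should store)
print(pi_f32)   # 3.1415927410125732  (float, the bug this fixes)
```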
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14417

Differential Revision: D13282975

Pulled By: wanchaol

fbshipit-source-id: 26a4c89ffc044d28598af673aebfec95153a869e
2018-11-30 17:25:32 -08:00
404ad939e5 Revert existing no_grad_embedding_renorm_ from aten (#14639)
Summary:
Remove no_grad_embedding_renorm_ from aten. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has its grad accessed.

Instead, make a custom op, and use NoGradGuard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639

Differential Revision: D13285604

Pulled By: eellison

fbshipit-source-id: c7d343fe8f22e369669e92799f167674f124ffe7
2018-11-30 16:57:51 -08:00
aeb38cfcea cuda implementation for PackSegment to support presence mask (#14635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14635

as title

Reviewed By: enosair

Differential Revision: D13254097

fbshipit-source-id: b9f40109e2889907c925f9a4df9da14f67f45f38
2018-11-30 16:54:10 -08:00
1d464d7f3e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 17487c327cbe48969dff397656fe90efcf23b699
2018-11-30 16:23:00 -08:00
26f3fb34a1 Build distributed libs in build_libtorch.py (#14037)
Summary:
This patch detects and builds c10d and gloo for the C++ API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14037

Reviewed By: ezyang

Differential Revision: D13283801

Pulled By: ebetica

fbshipit-source-id: 006dbb691344819833da6b4b844c1f0572942135
2018-11-30 14:46:36 -08:00
36c5f40ec0 Remove methods from _th_triu_ and _th_addcmul_. (#14624)
Summary:
These somehow slipped through when we moved all of Declarations.cwrap to functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14624

Reviewed By: ezyang

Differential Revision: D13277434

Pulled By: gchanan

fbshipit-source-id: e83451e2d0fdafb55635d4b757688a501454bf8c
2018-11-30 14:19:29 -08:00
c3a2b1e155 sparse.mm(S, D) (#14526)
Summary:
- add `sparse.mm(S, D)` with backward
- for `sparse.addmm()`, relax the input constraint so that the sparse matrix input doesn't have to be coalesced (see the usage sketch below)
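
A small usage sketch of `torch.sparse.mm` (shapes are illustrative):

```python
import torch

S = torch.randn(4, 6).relu().to_sparse().requires_grad_()  # sparse factor
D = torch.randn(6, 3, requires_grad=True)                  # dense factor
out = torch.sparse.mm(S, D)                                # dense (4, 3) result
out.sum().backward()                                       # backward is supported
print(out.shape, D.grad.shape)
```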
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14526

Reviewed By: ezyang

Differential Revision: D13252990

Pulled By: weiyangfb

fbshipit-source-id: 8fdb14144405a2122d4b8447ad4055cd0330e6e8
2018-11-30 14:15:34 -08:00
a84e873bb1 Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)
Summary:
See #14539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14569

Differential Revision: D13282161

Pulled By: ezyang

fbshipit-source-id: 13a1131b26fa300b037f66d1919b97d14033f9e5
2018-11-30 14:13:04 -08:00
5c1692840e Remove OptionsGuard from ATen (#14524)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/13738
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14524

Differential Revision: D13268031

Pulled By: goldsborough

fbshipit-source-id: fb306464b673c05ebd26d0f44d688ccd92d1d8c5
2018-11-30 13:30:35 -08:00
4b915260c7 Explicitly ban uninitialized tensors when invoking Predictor classes (#14377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14377

att

Reviewed By: dzhulgakov

Differential Revision: D13197348

fbshipit-source-id: 85a451bde3a57a8acdd3af548606c05e223896a6
2018-11-30 13:26:00 -08:00
738fc7054b Report timer in benchmarking when requested
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14570

Reviewed By: llyfacebook

Differential Revision: D13264904

Pulled By: sf-wind

fbshipit-source-id: fd05bc32202b7734dc911e3c792357ddf9ecedee
2018-11-30 13:17:29 -08:00
f45405bf5b Fix inheritance for SharedDataset (#14629)
Summary:
ezyang ebetica

CC jaliyae
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14629

Differential Revision: D13278988

Pulled By: goldsborough

fbshipit-source-id: 53afbcd1f3fc5cb23046ff92c4345cd90abd4584
2018-11-30 12:29:45 -08:00
814b5715ba Move module tests to common_nn (#14578)
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578

Differential Revision: D13268286

Pulled By: driazati

fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
2018-11-30 12:14:59 -08:00
c042f69dbb Updating submodules
Reviewed By: yns88

fbshipit-source-id: 863e9e2a1f0810f96494cabae1724622b9eb91ff
2018-11-30 11:47:16 -08:00
5ae0ed8552 Remove default constructor lines that do nothing, and fix warnings with clang trunk (#14300)
Summary:
The lines removed in this diff were no-ops, but confusing: the default constructors in `store_handler.h` are implicitly deleted, since `std::runtime_error` has no default constructor.

Clang added a warning for this behavior [in September 2018](https://reviews.llvm.org/rL343285) (note that the warning is not just for cxx2a, despite the slightly confusing commit message), so building pytorch with a recent build of clang trunk causes spew of this warning, which is fixed by the present PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14300

Differential Revision: D13260039

Pulled By: umanwizard

fbshipit-source-id: 92788dbd6794253e788ef26bde250a66d8fb917e
2018-11-30 11:16:35 -08:00
c03851e93a remove copy_wrapper (#13937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13937

We can now replace s_copy_ with our new _copy_ function. Experimented with moving s_copy_ out of VariableManualType.cpp, but it seemed like there was enough special casing to warrant it staying.

Reviewed By: ezyang

Differential Revision: D13053648

fbshipit-source-id: e9e04d460baf4ee49b500212cf91b95221acd769
2018-11-30 11:12:59 -08:00
5c65a7812e Move non_blocking copies to aten (#13866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13866

just a straightforward port

Reviewed By: ezyang

Differential Revision: D13011878

fbshipit-source-id: f288efebf78fa634abfb681b938b44277064d5b6
2018-11-30 11:12:57 -08:00
e3840419ec Move cuda copy to aten (#13348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13348

Move cross device, cpu to device, device to cpu copies to aten. Most of it is a direct port, main difference is that we dispatch from a single _copy_ function for copies.

Reviewed By: ezyang

Differential Revision: D12850690

fbshipit-source-id: c2e3f336796b4ae38be6027d2ec131a274a6aa8c
2018-11-30 11:12:55 -08:00
0786dfee7c Move THTensor_(copy) to aten (#13603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
Moved vectorized CPU copy to aten. Notable changes are mainly in _copy_same_type_.

Reviewed By: ezyang

Differential Revision: D12936031

fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1
2018-11-30 11:12:54 -08:00
c1c841a4e7 Changes based on @gchanan's review of #13420 (#14441)
Summary:
```
The most significant change is that this fixes the error message when
indexing an empty tensor with an out-of-bounds index. For example:

  x = torch.ones(10, 0)
  x[:, [3, 4]]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14441

Differential Revision: D13226737

Pulled By: colesbury

fbshipit-source-id: d1c4a35a30e3217e3d1727d13f6b354a4a3b2a24
2018-11-30 11:03:20 -08:00
edb3ddf1a5 Accumulate grad fix (#14587)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/13337.

I don't think the lint errors in the original PR had to do with files I touched, so hopefully the rebase fixes them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14587

Differential Revision: D13277428

Pulled By: soumith

fbshipit-source-id: f04c186b1dd4889b4250597eef87f9e9bf7b2426
2018-11-30 10:49:15 -08:00
67308a9323 Fix expanded mvn and lowrankmvn (#14557)
Summary:
This PR fixes a slowness issue with expanded MVN.

A notebook to show the problem is [here](https://gist.github.com/fehiepsi/b15ac2978f1045d6d96b1d35b640d742). Basically, mvn's sample and log_prob involve expensive computations based on `cholesky` and `trtrs`. We can save a lot of computation by caching the unbroadcasted version of `scale_tril` (or `cov_diag`, `cov_factor` in lowrank mvn).
When expanding, this cached tensor should not be expanded together with the other arguments.

Ref: https://github.com/uber/pyro/issues/1586

cc neerajprad fritzo
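
A usage sketch of the pattern being optimized (sizes are assumptions):

```python
import torch
from torch.distributions import MultivariateNormal

base = MultivariateNormal(torch.zeros(3), scale_tril=torch.eye(3))
big = base.expand(torch.Size([10000]))  # should reuse the cached Cholesky
x = big.sample()                        # cheap once scale_tril isn't re-expanded
print(big.log_prob(x).shape)            # torch.Size([10000])
```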
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14557

Differential Revision: D13277408

Pulled By: soumith

fbshipit-source-id: a6b16f999b008d5da148ccf519b7f32d9c6a5351
2018-11-30 10:49:13 -08:00
2e0f3b038c Tensor construction: combine Resize+mutable_data - 2/4 (#14205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14205

Original commit changeset: 8f9fb55842ae

Reviewed By: dzhulgakov

Differential Revision: D13126263

fbshipit-source-id: 12ba89e31b7738a81ec5c660ea7b79e8576c35dc
2018-11-30 10:46:58 -08:00
f6354d903a Unit tests need better compilation flow (#14547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14547

Unit tests used in dnnlowp need a better compilation flow as some of them need avx. Disabling for now so that pytorch builds with fbgemm.

Reviewed By: jianyuh

Differential Revision: D13240933

fbshipit-source-id: e2e187b758c5d89e524470cd261ce35493f427a2
2018-11-30 09:40:29 -08:00
aa842fe101 clean up linkage options (#14609)
Summary: minor code cleanup

Differential Revision: D13277803

Pulled By: soumith

fbshipit-source-id: 5ef925fe95037cab540b329054d7070c1ea7031e
2018-11-30 09:36:59 -08:00
ad1b874a36 set mkl_set_dynamic to false (#13868)
Differential Revision: D13277331

Pulled By: soumith

fbshipit-source-id: 692bb7d5157235e00dea4776d1991bb07e16ff85
2018-11-30 09:29:43 -08:00
37627a182b fix USE_SYSTEM_NCCL build (#14606)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14606

Differential Revision: D13274156

Pulled By: soumith

fbshipit-source-id: f834715e8e17dacf60be459b0efffba1d4df40ae
2018-11-29 23:36:17 -08:00
ff91de43de Set output of aten::mm to have the same output type as the original node after op canonicalization. (#14602)
Summary:
In CanonalizeOp, addmm is separated into mm and add, but the output dimension and type were not preserved for the aten::mm node. This change fixes that, so the dumped graph after the pass contains accurate information.
Sample output:
before:
%6 : Dynamic = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
after:
%6 : Float(32, 200) = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14602

Differential Revision: D13273754

Pulled By: soumith

fbshipit-source-id: 82e22b5f30e9eb6ba9249c5a2216955421f39cc7
2018-11-29 23:24:27 -08:00
89c3dbcad8 Add binary cross entropy to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14583

Differential Revision: D13269423

Pulled By: driazati

fbshipit-source-id: 7cc1594d8189c3e8f2d4ce0462fdc0a03683006e
2018-11-29 22:23:13 -08:00
1f6d9f44fc Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13272741

Pulled By: driazati

fbshipit-source-id: 3e4fe870d0e268903757f3ae8a56100606906bce
2018-11-29 22:18:55 -08:00
3648c269e9 Misc distributed documentation updates (#14605)
Summary:
* s/environmental/environment/g
* Casing (CUDA, InfiniBand, Ethernet)
* Don't embed torch.multiprocessing.spawn but link to it (not part of the package)
* spawn _function_ instead of _utility_ (it's mentioned after the launch utility which is a proper utility)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14605

Differential Revision: D13273480

Pulled By: pietern

fbshipit-source-id: da6b4b788134645f2dcfdd666d1bbfc9aabd97b1
2018-11-29 21:51:43 -08:00
11ef5191ff Enable tests for CPU tensors in test_distributed.py (#14572)
Summary:
These were not enabled after adding support in the Gloo backend. The
argument checks in ProcessGroupGloo raised an error in two cases:

* If the input tensor list to scatter was ``[None]`` on processes other
  than the source process.
* If the output tensor list to gather was ``[None]`` on processes other
  than the destination process.

This commit prepares these arguments explicitly instead of boxing them
at the process group call site.

This fixes #14536.
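
A minimal sketch of the calling convention this enables (only the source rank supplies real inputs; the ranks, shapes, and values are illustrative):

```python
import torch
import torch.distributed as dist

def scatter_from_rank0(rank, world_size):
    out = torch.zeros(4)
    scatter_list = ([torch.full((4,), float(i)) for i in range(world_size)]
                    if rank == 0 else None)  # non-source ranks pass no inputs
    dist.scatter(out, scatter_list, src=0)
    return out  # rank r receives the tensor filled with r
```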
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14572

Differential Revision: D13272812

Pulled By: pietern

fbshipit-source-id: 12cb0d85ec92f175365cbada585260f89330aad8
2018-11-29 21:39:02 -08:00
1975917d0e fix copy_ (#14593)
Summary:
Closes https://github.com/pytorch/pytorch/issues/14590
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14593

Differential Revision: D13272510

Pulled By: jamesr66a

fbshipit-source-id: b6921a98460c371d435277c416dad0b5ab0fec8c
2018-11-29 20:31:53 -08:00
220ce8046e Binding for prctl(PR_SET_PDEATHSIG) (#14491)
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.

On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.

Fixes #14394.
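
A minimal Python sketch of the mechanism (not the actual binding, which lives in C++; the wrapper name is hypothetical):

```python
import ctypes
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>

def die_with_parent(sig=signal.SIGINT):
    """Ask the kernel to deliver `sig` to this process when its parent
    terminates. Linux-only; call this early in the child process."""
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.prctl(PR_SET_PDEATHSIG, int(sig)) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
```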
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491

Differential Revision: D13270374

Pulled By: pietern

fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c
2018-11-29 20:09:19 -08:00
9127ab3866 Fixed new_group won't work for two or more different rank groups (#14529)
Summary:
This fixes two things:

(1) The NCCL backend didn't support two or more groups. This is because we need a group name in the ProcessGroupNCCL class to keep track of the ProcessGroup ID within that group name, and also the NCCL unique ID within that group name and process group ID.  Otherwise, different processes will create different NCCL PGs in different orders and can clash on these names.  This fixes the NCCL problem.

(2) When using new_group, each rank should enter this function and update its global group name counter, to ensure that every rank always operates on the same group name.

With both fixes, the repro code in https://github.com/pytorch/pytorch/issues/14528 works with both the NCCL and Gloo backends.

```
tengli@learnfair096:~$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=30000 ~/github_issues/nccl_group.py
rank: 0 - val: 6.0
rank: 2 - val: 6.0
rank: 3 - val: 6.0
rank: 1 - val: 6.0
rank: 4 - val: 22.0
rank: 6 - val: 22.0
rank: 5 - val: 22.0
rank: 7 - val: 22.0
```
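
A usage sketch matching the repro above (assumes 8 ranks with CUDA and an already-initialized default process group):

```python
import torch
import torch.distributed as dist

rank = dist.get_rank()
g0 = dist.new_group(ranks=[0, 1, 2, 3])  # every rank must call new_group,
g1 = dist.new_group(ranks=[4, 5, 6, 7])  # in the same order, per fix (2)
val = torch.full((1,), float(rank)).cuda()
dist.all_reduce(val, group=g0 if rank < 4 else g1)
print("rank:", rank, "- val:", val.item())  # 6.0 for ranks 0-3, 22.0 for 4-7
```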
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14529

Differential Revision: D13253434

Pulled By: teng-li

fbshipit-source-id: 8eb45882b996b06d951fc9a306d5de86a42e8b84
2018-11-29 19:57:47 -08:00
e227aa9e2e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 44cd40cc9bc25629ec9547327a515bac22e5c905
2018-11-29 19:46:35 -08:00
67e3905bc6 Revert D13268293: [pytorch][PR] [jit] Add InstanceNorm, Distance modules to Script
Differential Revision:
D13268293

Original commit changeset: cb33c6dcdadd

fbshipit-source-id: 214a29b74c85b7b25df0eb48e3fdb81539049130
2018-11-29 19:19:35 -08:00
0d3cb91d8c Make env init_method support both env and args for rank and size (#14494)
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/14446

This was a supported behavior in old torch.distributed. We want to support it in the new release.

Tests cover all combinations of scenarios where rank, world size, or both come from either the environment or explicit arguments.
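
A minimal sketch of the two supported styles (single-process values shown for illustration):

```python
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# Either rely on RANK/WORLD_SIZE environment variables:
#   dist.init_process_group("gloo", init_method="env://")
# or pass rank/world_size explicitly (the case this commit supports):
dist.init_process_group("gloo", init_method="env://", rank=0, world_size=1)
print(dist.get_rank(), dist.get_world_size())  # 0 1
```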
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14494

Differential Revision: D13253433

Pulled By: teng-li

fbshipit-source-id: c05974d84f1bdf969f74ec45763e11a841fe4848
2018-11-29 18:48:20 -08:00
1a9602d5db Delete caffe2_cuda_full_device_control (#14283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14283

According to Yangqing, this code was only used by us to do some end-to-end
performance experiments on the impact of cudaSetDevice and cudaGetDevice.  Now
that the frameworks are merged, there are a lot of bare calls to those functions
which are not covered by this flag.  It doesn't seem like a priority to restore
this functionality, so I am going to delete it for now.  If you want to bring
it back, you'll have to make all get/set calls go through this particular
interface.

Reviewed By: dzhulgakov

Differential Revision: D13156472

fbshipit-source-id: 4c6d2cc89ab5ae13f7c816f43729b577e1bd985c
2018-11-29 18:33:22 -08:00
8617b780cf Replace use of 'int' with more descriptive 'DeviceIndex' or 'StreamId'. (#14282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14282

This is also a substantive change, as 'DeviceIndex' and 'StreamId' are
narrower types than 'int'.

Reviewed By: Yangqing, smessmer

Differential Revision: D13156471

fbshipit-source-id: 08aa0f70c4142415b6bd4d17c57da0641c1d0e9a
2018-11-29 18:33:21 -08:00
fd31eae9ad Switch import/export to python printing (#14400)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/14378, only look at the last commit.

This changes the way methods are defined in TorchScript archives to use
PythonPrint rather than ONNX protobufs.

It also updates torch.proto to directly document the tensor data
structure actually being serialized.

Notes:
* because PythonPrint prints all the methods at once per module, this
  removes MethodDef in favor of a single torchscript_area and a separate
  caffe2_graphs entry. Note that NetDefs already have method names,
  so there is no need for a separate method name entry.
* This switches cpp/pickle area to RecordRef (references to a file in
  the container format) since it is possible the data in these arenas
  may be large and not suited to JSON output.
* Removes 'annotations' -- annotations should be re-added on the first
  commit that actually has a practical use for them. In the current state
  it is unlikely they are representing the right information.
* Some expect files have changed because PythonPrint is preserving more
  debug name information for parameter names.
* MethodEncoder (the ONNX output format) has been deleted. There is still
  some cleanup possible combining EncoderBase and GraphEncode now that there
  is only a single pathway using EncoderBase.
* This incorporates the changes from #14397
  to define TensorDef
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14400

Reviewed By: suo

Differential Revision: D13231800

Pulled By: zdevito

fbshipit-source-id: af5c1152d0bd6bca8b06c4703f59b161bb19f571
2018-11-29 17:53:49 -08:00
2b7345bcd5 PT1 distributed doc update (#14530)
Summary:
Removed an incorrect section. We don't support this. I wrote this from my memory :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14530

Differential Revision: D13253471

Pulled By: teng-li

fbshipit-source-id: c3f1ffc6c98ef8789157e885776e0b775ec47b15
2018-11-29 17:50:47 -08:00
75eccffdfe Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13268293

Pulled By: driazati

fbshipit-source-id: cb33c6dcdaddf8c7a49b3535894d77bf5d771ddd
2018-11-29 17:26:29 -08:00
15e8bb379e Add List to annotations (#14482)
Summary:
This PR adds a polyfill for `typing.List` for Python versions that don't
support `typing` as a builtin. It also moves the type definitions from
`annotations.py` so that they can be used in `torch.nn`.
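
A hedged sketch of the polyfill shape (names and details are illustrative, not the exact code; Python 3 syntax for brevity):

```python
# Illustrative sketch of a typing.List polyfill; the real implementation
# lives in the moved type definitions, and these names are hypothetical.
try:
    from typing import List
except ImportError:
    class _ListMeta(type):
        def __getitem__(cls, elem_type):
            # `List[int]` only needs to be a subscriptable marker object
            # that the script compiler can recognize and inspect.
            return cls

    class List(metaclass=_ListMeta):
        pass
```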
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14482

Differential Revision: D13237570

Pulled By: driazati

fbshipit-source-id: 6575b7025c2d98198aee3b170f9c4323ad5314bd
2018-11-29 17:23:29 -08:00
2752ad8045 Automatic update of fbcode/onnx to f461f7aad9987635b4aff108620ed7918f002d19 (#14568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14568

Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e

Included changes:
- **[f461f7a](https://github.com/onnx/onnx/commit/f461f7a)**: Show the op's type and name when the shape inference is failed. (#1623) <Jerry>
- **[ab8aaf9](https://github.com/onnx/onnx/commit/ab8aaf9)**: Add scan test case (#1586) <G. Ramalingam>
- **[c95357e](https://github.com/onnx/onnx/commit/c95357e)**: link the tutorial (#1650) <Lu Fang>
- **[d7e2420](https://github.com/onnx/onnx/commit/d7e2420)**: Upgrade label encoder to support more input types (#1596) <Wei-Sheng Chin>
- **[6425108](https://github.com/onnx/onnx/commit/6425108)**: Add Doc about Adding New Operator into ONNX (#1647) <Lu Fang>
- **[295889c](https://github.com/onnx/onnx/commit/295889c)**: use an empty initializer to create map (#1643) <Lu Fang>
- **[e38f3ec](https://github.com/onnx/onnx/commit/e38f3ec)**: Remove redundant const (#1639) <daquexian>
- **[ea694bf](https://github.com/onnx/onnx/commit/ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (#1565) <Armen>
- **[6db386e](https://github.com/onnx/onnx/commit/6db386e)**: make output shape clear enough for Softmax family (#1634) <Lu Fang>
- **[2b67c6e](https://github.com/onnx/onnx/commit/2b67c6e)**: fix batchnorm doc (#1633) <Lu Fang>
- **[c901784](https://github.com/onnx/onnx/commit/c901784)**: remove inappropriate consts (#1632) <Lu Fang>
- **[de82119](https://github.com/onnx/onnx/commit/de82119)**: Shape inference fix for broadcast, concat and scan (#1594) <KeDengMS>
- **[d7ffe3b](https://github.com/onnx/onnx/commit/d7ffe3b)**: Update Optimizer Docs (#1607) <Armen>
- **[d09d139](https://github.com/onnx/onnx/commit/d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (#1466) <Yuta Okamoto>
- **[eb4b7c2](https://github.com/onnx/onnx/commit/eb4b7c2)**: allow variadic parameters of different types (#1615) <G. Ramalingam>
- **[4166246](https://github.com/onnx/onnx/commit/4166246)**: Fix onnxifi test (#1617) <Yinghai Lu>
- **[6706a4d](https://github.com/onnx/onnx/commit/6706a4d)**: Fix a bug in vector address access (#1598) <Raymond Yang>
- **[ae39866](https://github.com/onnx/onnx/commit/ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (#1610) <Spandan Tiwari>
- **[45ba661](https://github.com/onnx/onnx/commit/45ba661)**: Handle new types in the switch. (#1608) <Dmitri Smirnov>
- **[14853b6](https://github.com/onnx/onnx/commit/14853b6)**: Bump docker image version to 230 used in CircleCI (#1606) <bddppq>
- **[e0993b8](https://github.com/onnx/onnx/commit/e0993b8)**: [onnxifi] Make sure that backend handles run async. (#1599) <Roman Dzhabarov>
- **[e6965cc](https://github.com/onnx/onnx/commit/e6965cc)**: Introduce SparseTensor ML proto (#1554) <Dmitri Smirnov>
- **[75b782f](https://github.com/onnx/onnx/commit/75b782f)**: In driver test check the return status of onnxGetBackendIDs (#1597) <bddppq>
- **[c05b364](https://github.com/onnx/onnx/commit/c05b364)**: Make CI log less verbose (#1595) <bddppq>
- **[fa568e4](https://github.com/onnx/onnx/commit/fa568e4)**: Loop type shape inferencing (#1591) <Scott McKay>
- **[937e64c](https://github.com/onnx/onnx/commit/937e64c)**: add uint8 (#1590) <Lu Fang>
- **[f86e951](https://github.com/onnx/onnx/commit/f86e951)**: Add domain as an optional parameter for make_node function (#1588) <Young Kim>
- **[ff45588](https://github.com/onnx/onnx/commit/ff45588)**: Remove unreachable code in shape_inference.h (#1585) <Changming Sun>
- **[f7dcad0](https://github.com/onnx/onnx/commit/f7dcad0)**: Add several hyperbolic function ops. (#1499) <Sergii Dymchenko>
- **[a60ac7d](https://github.com/onnx/onnx/commit/a60ac7d)**: Add OneHot op to ONNX. (#1567) <Spandan Tiwari>
- **[f6c3a7e](https://github.com/onnx/onnx/commit/f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (#1583) <Roman Dzhabarov>
- **[88d1784](https://github.com/onnx/onnx/commit/88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (#1578) <Spandan Tiwari>
- **[20041b7](https://github.com/onnx/onnx/commit/20041b7)**: Add type shape inferencing for the If operator (#1571) <Scott McKay>
- **[d6c4c75](https://github.com/onnx/onnx/commit/d6c4c75)**: Add a virtual destructor to GraphInferencer (#1574) <Changming Sun>
- **[a339598](https://github.com/onnx/onnx/commit/a339598)**: fix ConvTranspose spec (#1566) <Wenhao Hu>

Reviewed By: zrphercule

Differential Revision: D13263831

fbshipit-source-id: a2ff22c6454e2430429e5a7d18d21661a7ffb0cb
2018-11-29 16:31:56 -08:00
dc7498c84d add gloo support for reduce on GPU (#14443)
Summary:
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14443

Reviewed By: pietern

Differential Revision: D13222907

Pulled By: janewangfb

fbshipit-source-id: f418c5d84880196f97089114d02957cf739243f8
2018-11-29 16:19:39 -08:00
69d3c00ae1 Expunge use of type() from SparseTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14546

Reviewed By: gchanan

Differential Revision: D13258512

fbshipit-source-id: b2d562b6c5228288f60f02beab3c44c50163248f
2018-11-29 16:04:18 -08:00
c7f828809b Expunge occurrences of type() from scalar_test (#14545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14545

Self-explanatory

Reviewed By: gchanan

Differential Revision: D13258513

fbshipit-source-id: abce357de57b95cde58b3894c251da519ede6b53
2018-11-29 16:04:16 -08:00
9aea856115 Expunge use of type() in Distributions.cpp (#14544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14544

Modern usage is options().  This doesn't have a functional
difference, because all call sites were CPU only (where
getting the device index right doesn't matter).

Reviewed By: gchanan

Differential Revision: D13258252

fbshipit-source-id: c70f8d618ee9caf37ff2469cceaa439348b6114c
2018-11-29 16:04:14 -08:00
7879c979b5 Expunge uses of type() from EmbeddingBag. (#14543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14543

The modern way to do this is to use options().  It doesn't
make a functional difference here because everything is CPU
(so loss of device information is not a big deal), but
it's definitely safer this way.

Reviewed By: gchanan

Differential Revision: D13257847

fbshipit-source-id: afbc9f7f8d4ca5a8b1cf198997c307e27a2c3333
2018-11-29 16:04:12 -08:00
6fe1867c23 Expunge direct device index handling from tensor_conversion_dispatch (#14421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14421

Last time I looked at this, I bailed because it seemed like there were
a lot of sites to fix.  Well, I need this to work properly for out-of-place
HIPify, so I took another whack at it.  Changes should be pretty self-explanatory.

Reviewed By: gchanan

Differential Revision: D13221302

fbshipit-source-id: ed21e2668a1a629898a47358baf368fe680263a0
2018-11-29 16:04:10 -08:00
5805ef5a83 call raw_mutable_data when data type didn't match in BlobGetMutableTensor (#14513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14513

att

Reviewed By: dzhulgakov

Differential Revision: D13245875

fbshipit-source-id: 3398a1f41a6195e120ed574dee887070e86dfe1f
2018-11-29 15:18:58 -08:00
666d383a00 Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
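
A hedged sketch of what this enables (hypothetical function in the type-comment style of the period; assumes BroadcastingList2 is visible to the annotation parser):

```python
import torch

# Hypothetical scripted function: the scalar default `0` for a
# BroadcastingList2[int] parameter is expanded to [0, 0] by the compiler.
@torch.jit.script
def shifted_sum(x, pad=0):
    # type: (Tensor, BroadcastingList2[int]) -> Tensor
    return x.sum() + pad[0] + pad[1]

y = shifted_sum(torch.ones(2, 2))          # uses the broadcast default [0, 0]
z = shifted_sum(torch.ones(2, 2), [1, 2])  # explicit pair
```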
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361

Differential Revision: D13192231

Pulled By: driazati

fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
2018-11-29 15:15:47 -08:00
a2d8e84594 Added launch bounds in VolumetricConvolution.cu (#14564)
Summary:
A few months ago we were seeing test failures on certain architectures due to invalid launch configurations of the kernels in aten/src/THCUNN/VolumetricConvolution.cu.

This PR ensures that those kernels are always compiled such that at least one block can be resident on an SM, and such errors will not be encountered at runtime on any architecture after compiling for that architecture.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14564

Differential Revision: D13266136

Pulled By: soumith

fbshipit-source-id: 35464b20848bb0a1168e8f3b233172331c50b35b
2018-11-29 14:49:29 -08:00
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
Goal of this PR is to unify cuda and hip device types in caffe2 python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
bdaa0e38b8 Fix tautological-compare in aten/src/ATen/native/cuda/SummaryOps.cu (#14540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14540

Refactor HANDLE_SWITCH_CASE to avoid a tautological comparison in the macro.

Reviewed By: ezyang

Differential Revision: D13255725

fbshipit-source-id: cfa64bb7bc53d19c93a693015202f207567690b4
2018-11-29 13:57:27 -08:00
eeb0d67b92 Update to export in onnx_aten_fallback option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14492

Differential Revision: D13265701

Pulled By: zrphercule

fbshipit-source-id: b339c92078f73d152a14db7d5d2b3f5edda9dda6
2018-11-29 13:49:50 -08:00
2901777a0e Add back the MAX_JOBS=4 restriction to make rocm CI more stable (#14566)
Summary:
This is a workaround until hcc fixes its high memory usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14566

Differential Revision: D13263555

Pulled By: bddppq

fbshipit-source-id: 479c7a76aff3919f028e03ef345795537480f0fa
2018-11-29 13:24:56 -08:00
1b0b2e69f8 assorted alias analysis fixes (#14556)
Summary:
- Correctly report whether nodes write to an alias set.
- Fix loop convergence.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14556

Differential Revision: D13261376

Pulled By: suo

fbshipit-source-id: 8123c0fb1f8f137a15bd82719be2d99e502bccc2
2018-11-29 13:09:26 -08:00
31b3d81714 Broadcast prim::FusedConcat inputs independently when checking kernels (#14503)
Summary:
Fixes #14483.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14503

Differential Revision: D13256343

Pulled By: zou3519

fbshipit-source-id: 1c68a23f425be067a742bada7ee8cdfab7fc3fa2
2018-11-29 13:05:00 -08:00
cf059028f0 Do not load ROCm cmake files if USE_ROCM is off (#14261)
Summary:
Previously it unconditionally tried to load the ROCm cmake files, so there was no way to disable the ROCm build. After this change, USE_ROCM=0 will disable the ROCm build.
Should fix #14025

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14261

Differential Revision: D13242090

Pulled By: bddppq

fbshipit-source-id: 652ec7d49dce9b357778bfa53a8e04b7079787ab
2018-11-29 11:17:19 -08:00
fb6806f6e9 Remove at references in c10 Allocator.h (#14434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14434

The referenced classes live now in c10, so we don't need to specify their namespace.

Reviewed By: ezyang

Differential Revision: D13224015

fbshipit-source-id: 6d154b8e3f9a1e38ff0407dbb1151f5c1d5df260
2018-11-29 11:07:22 -08:00
4ec6bd7356 Add sourceRank() to ProcessGroup::Work (#14453)
Summary:
This function is only implemented for the subclasses where it makes
sense. If it's not overridden it will throw an error. Having this
function removes the need for a pointer passing hack to pass the
source rank of a recv operation back to the caller. Instead, the
caller can now call `source_rank` on the work object and achieve
the same result.
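
A sketch of the resulting call pattern (`pg` is assumed to be an existing ProcessGroup instance; method names follow the description above):

```python
import torch

# `pg` is an already-constructed ProcessGroup (e.g. gloo); setup omitted.
tensor = torch.zeros(4)
work = pg.recv_anysource([tensor], 0)   # receive from any rank, tag 0
work.wait()
sender = work.source_rank()             # rank of the process that sent it
```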

Closes #11804.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14453

Differential Revision: D13230898

Pulled By: pietern

fbshipit-source-id: ef38f48bfaca8ef9a364e5be122951bafc9f8e49
2018-11-29 09:16:53 -08:00
7c24a16f82 Fixed typo for BCEWithLogitLoss doc comments (#14532)
Summary:
The math symbol was missing a prefix `:`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14532

Differential Revision: D13256077

Pulled By: soumith

fbshipit-source-id: 2359819d8aa664f915be1c436cbb0c0756504028
2018-11-29 08:22:19 -08:00
29d697aec4 typo in Module docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14511

Differential Revision: D13246061

Pulled By: soumith

fbshipit-source-id: 6c13a2957c4c4324ab5d839d634689c61e25b0fe
2018-11-29 07:17:29 -08:00
44cb43bcc1 Jaliyae/samplers (#13870)
Summary:
Make samplers optionally accept a new size in their reset() method. This lets a dataloader or dataset reset the sampler for an epoch or for a chunk of data with a different size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870

Differential Revision: D13240120

Pulled By: soumith

fbshipit-source-id: 19c53f8be13c0fdcf504f0637b0d3e6009a8e599
2018-11-29 07:07:19 -08:00
9e93a02624 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13252887

Pulled By: driazati

fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
2018-11-28 23:31:25 -08:00
ba25b37e9b Updating submodules
Reviewed By: yns88

fbshipit-source-id: f957056bb48c583738c5defaf3d1f01cd7df3915
2018-11-28 23:31:23 -08:00
70e3736e20 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 9800251baaa09d9f7988eff340ef36e0ab11f579
2018-11-28 21:09:08 -08:00
db15f2e13f Fix version.groups() (#14505)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14502

fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14505

Differential Revision: D13242386

Pulled By: goldsborough

fbshipit-source-id: faebae8795e1efd9c0ebc2294fe9648193d16624
2018-11-28 20:27:33 -08:00
6d63e9dbff Support Embedding + EmbeddingBag in Script + (Ignore flakey test) (#14509)
Summary:
Resubmitting PR #14415

The tests added for Embedding + EmbeddingBag had random numbers as input, which affected the random number generator and caused the flaky test to break.

Everything but the last two commits has already been accepted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509

Differential Revision: D13247917

Pulled By: eellison

fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
2018-11-28 19:16:38 -08:00
105fa58748 pointwise_loss (#14134)
Summary:
Adding pointwise loss ops to weak_script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14134

Differential Revision: D13209455

Pulled By: eellison

fbshipit-source-id: 87fc0222121f34a2f4edb24c2da2a11124b097d8
2018-11-28 18:14:38 -08:00
186341c5dc Merge Caffe2 and PyTorch thread pool definitions (#14114)
Summary:
(1) Move Caffe2 thread pool to aten
(2) Use the same thread pool definition for PyTorch interpreter
(3) Make ivalue::Future thread-safe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14114

Reviewed By: ilia-cher

Differential Revision: D13110451

Pulled By: highker

fbshipit-source-id: a83acb6a4bafb7f674e3fe3d58f7a74c68064fac
2018-11-28 18:10:20 -08:00
533668d7e4 Ensure that indices are on the same device as self
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14504

Reviewed By: wat3rBro

Differential Revision: D13242200

Pulled By: colesbury

fbshipit-source-id: 82731cee808681ec612d406342070640eb26e519
2018-11-28 17:54:32 -08:00
da9e49e586 Remove Context dependency from Tensor class (#14269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269

Removes reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`)

For CopyFrom - I haven't tweaked all call sites yet. Instead I rely on a terrible hack: a pointer to context is implicitly converted to bool when passed, haha :) It's not good code, and I propose to fix it in a follow-up diff (maybe using clangr tooling).

Reviewed By: ezyang

Differential Revision: D13117981

fbshipit-source-id: 7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
2018-11-28 15:45:38 -08:00
0cfbbceac3 Change Tensor::CopyFrom to a simple double dispatch (#14268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268

Removes the need for Context in Tensor by doing simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes for a proper copy_ op, but before that is done, let's get a clear picture of how copies are implemented and clean up some cruft in the CopyFrom implementation.

Note, that with these changes, one can probably can get rid of Context::CopyFromCPU/CopyToCPU, but it's a matter for follow up diffs.

This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes copy async if the device is CUDA and doesn't have any effect otherwise (that's how Context methods are implemented).

This doesn't change semantics of copy async implementation - as before it blindly calls cudaMemcpyAsync which probably means that it can be misused if invoked separately outside of operator body. I'll leave it for the follow up copy_ unification.

For Extend() we always do an async copy - it makes sense, as it's an in-place device-device operation whose result is only observed by subsequent ops.

Note: there are now three ways of invoking copy in C2 code - templated CopyBytes, virtual CopyFromCPU/etc, and double-dispatch free method here. Hopefully we can get rid of the second one.

Also, please advise whether it's c10-worthy :)

Reviewed By: ezyang

Differential Revision: D13117987

fbshipit-source-id: a6772d6dcf3effaf06717da3a656fc9873b310b5
2018-11-28 15:45:37 -08:00
f80d34a1c8 Update Tensor doc (#14339)
Summary:
Add to the Tensor doc info about `.device`, `.is_cuda`, `.requires_grad`, `.is_leaf` and `.grad`.
Update the `register_backward_hook` doc with a warning stating that it does not work in all cases.
Add support in the `_add_docstr` function to add docstring to attributes.

There is an explicit cast here but I am not sure how to handle it properly. The thing is that the doc field for getsetdescr is written as being a const char * (as all other doc fields in descriptors objects) in cpython online documentation. But in the code, it is the only one that is not const.
I assumed here that it is a bug in the code, because it does not follow the doc and the convention of the other descriptors, and so I cast out the const.
EDIT: the online doc I was looking at is for 3.7 and in that version both the code and the doc are const. For older versions, both are non const.
Please let me know if this should not be done, and, if it should, whether there is a cleaner way to do it!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14339

Differential Revision: D13243266

Pulled By: ezyang

fbshipit-source-id: 75b7838f7cd6c8dc72b0c61950e7a971baefaeeb
2018-11-28 15:28:17 -08:00
fb7e40b7eb nccl fixes (#14195)
Summary:
This has 4 changes

1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195

Differential Revision: D13237502

Pulled By: anderspapitto

fbshipit-source-id: 1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
2018-11-28 14:43:06 -08:00
ca55c5411f Clean up house on CUDAStream (#14247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14247

Just a bunch of clean up to get the code in a good state before we
enshrine it in c10.

Billing of changes:
- Inline all "pointer" API functions into their real implementations,
  so we don't have a bunch of dead pointer functions hanging around.
- Replace all occurrences of int64_t with DeviceIndex, as appropriate
- Rename device field to device_index
- Add documentation for everything in CUDAStream.h
- Bring CUDAStream to API parity with Stream (e.g., support equality)
- Delete uncheckedSetCurrentCUDAStream, it didn't work anyway because
  StreamId to internal pointer conversion has a bunch of ways it can
  fail.  Just hope for the best!

Reviewed By: dzhulgakov

Differential Revision: D13141949

fbshipit-source-id: a02f34921e3d8294bd77c262bd05da07d1740a71
2018-11-28 14:01:59 -08:00
3aeb288e40 Make clang-tidy shut up about Python C API macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14480

Reviewed By: goldsborough

Differential Revision: D13235001

fbshipit-source-id: cd7f00b12ed3d9ef0fb0d7bd6c428e21561ec1b6
2018-11-28 13:54:42 -08:00
e3711aa93f Make TensorImpl/StorageImpl safer (#14429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14429

- forbid copying
- make final what ought to be

Reviewed By: dzhulgakov

Differential Revision: D13223125

fbshipit-source-id: e6176cc916d4cd8370c835f243ca90d5c3124c4a
2018-11-28 13:41:49 -08:00
f6dfd9d545 Handle copying intrusive_ptr_target correctly (#14428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14428

See in-code comment

Reviewed By: ezyang

Differential Revision: D13223126

fbshipit-source-id: 1e87e6112bbcca6377ca04ef2ba25ef937931061
2018-11-28 13:41:48 -08:00
5f07b33857 Revert D13219647: [pytorch][PR] Support Embedding + EmbeddingBag in Script
Differential Revision:
D13219647

Original commit changeset: c90706aa6fbd

fbshipit-source-id: d189e717ba0773de43d633876bc3a688830a9303
2018-11-28 13:38:58 -08:00
aec4c19460 Remove StorageImpl::type() (#14139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14139

This seems neither be used nor implemented. Also, it is a c10->aten dependency which we don't want.

Reviewed By: ezyang

Differential Revision: D13112298

fbshipit-source-id: 0407c4c3ac9b02bbd6fca478336cb6a6ae334930
2018-11-28 13:32:38 -08:00
bcd7b03c2a Add XBlobGetMutableTensor that returns Tensor (#14424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14424

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14136

Now that Tensor is a shared_ptr, it doesn't make sense to have Tensor* around anymore,
so we want to change Tensor* to Tensor in the interface.
We added functions that work with `Tensor` instead of `Tensor*` in this diff.

To remove Tensor*, we'll do following
```
auto* Y = Ouptut(0);
Y->mutable_data...
```
-->
```
auto Y = Output(0);
Y.mutable_data...
```

But to run clangr codemod, we'll keep both APIs in different names, e.g. `Output` and `XOutput`, and do the refactor and then delete the old method and rename the new method into the old one.
For example for `Output`, we'll first codemod the callsites from `Output` to `XOutput`, then delete the old `Output` and rename `XOutput` to `Output` in the end.

Reviewed By: smessmer

Differential Revision: D12934074

fbshipit-source-id: d0e85f6ef8d13ed4e7a7505faa5db292a507d54c
2018-11-28 13:29:48 -08:00
0f62af4ab1 Add timeout kwarg to init_process_group (#14435)
Summary:
This applies to the gloo backend only. Timeout support for the NCCL and
MPI backends is tracked in issues #14371 and #14372 respectively.

When creating a new process group (either the global one or any subgroup
created through `new_group`) you can specify a timeout keyword
argument (of type datetime.timedelta). This timeout applies to all
collective operations executed against that process group, such that any
operation taking longer than the timeout will throw a runtime error.
Using a different, better catchable error type is tracked in #14433.
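
A sketch of the new keyword argument:

```python
from datetime import timedelta
import torch.distributed as dist

# Applies to the gloo backend only at this point.
dist.init_process_group(backend="gloo", init_method="env://",
                        timeout=timedelta(seconds=30))

# Subgroups may use a different timeout than the global group.
g = dist.new_group(ranks=[0, 1], timeout=timedelta(seconds=5))
```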

This fixes #14376.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14435

Differential Revision: D13234317

Pulled By: pietern

fbshipit-source-id: 973993b67994dc64861c0977cbb6f051ec9d87f6
2018-11-28 11:35:01 -08:00
7c4aef9dfc Add support for HIP to DispatchStub. (#14413)
Summary:
I feel a bit bad writing this patch, because there isn't really
any reason not to use the normal dispatch mechanism for CUDA
and HIP here (so we have *yet another dispatcher*), but I don't
really want to sign up to rewrite DispatchStub to deduplicate the
dispatcher right now.

Need to natively add support for HIP here, as I don't want to
have to HIPify files which are not in a CUDA directory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14413

Differential Revision: D13220358

Pulled By: ezyang

fbshipit-source-id: cc61218322589a1dc2ab8eb9d5ddd3c616f6b712
2018-11-28 11:07:45 -08:00
7749804099 Support Embedding + EmbeddingBag in Script (#14415)
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require `with torch.no_grad()`, which we don't have any plans to support in the near future. To work around this, I added an embedding_renorm function without derivatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415

Reviewed By: wanchaol

Differential Revision: D13219647

Pulled By: eellison

fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
2018-11-28 10:52:30 -08:00
c32debb916 fix build error from D13188595 (#14481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14481

Fix build error in mode/opt

Reviewed By: dskhudia

Differential Revision: D13234688

fbshipit-source-id: 6c8515c45f75e7b88713a303f22990ad85d68beb
2018-11-28 10:46:33 -08:00
a02b3374d4 Revert D13144472: [fix] condition blob in while_op test changes data type
Differential Revision:
D13144472

Original commit changeset: af4d920a3148

fbshipit-source-id: 74d9f69fc66964b5e68b4b2cd2fd2be1f63e9d69
2018-11-28 10:43:22 -08:00
6039e25e8d Fix the build issue in setup.py due to cmake version type x.x.x.x vio… (#14331)
Summary:
See https://github.com/pytorch/pytorch/issues/13226
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14331

Differential Revision: D13234639

Pulled By: orionr

fbshipit-source-id: 87880057e84242e4af5ad6bf87e08831aa2c5459
2018-11-28 10:38:27 -08:00
8901935ad4 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/11563
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14473

Differential Revision: D13234208

Pulled By: ezyang

fbshipit-source-id: 7d874c63659e93728af239ecdfb85547613e52ad
2018-11-28 09:28:26 -08:00
302caef154 Revert D13166626: [pytorch][PR] ignore generated caffe2 docs and virtualenvs
Differential Revision:
D13166626

Original commit changeset: 4f11228d8b5d

fbshipit-source-id: ff301f1791ca8a390767ae43cde8637dcd044d0c
2018-11-28 07:40:04 -08:00
c638f379b3 Make mean function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.

Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
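
For example (sketch):

```python
import torch

x = torch.randn(2, 3, 4, 5)
m = x.mean(dim=[0, 2])                 # reduce dims 0 and 2 -> shape (3, 5)
mk = x.mean(dim=[0, 2], keepdim=True)  # shape (1, 3, 1, 5)
```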
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252

Differential Revision: D13161157

Pulled By: umanwizard

fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
2018-11-28 06:53:09 -08:00
68251fb931 Fix half tensor printing plus speedup large tensor printing (#14418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863

The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.

Some quick runtime analysis:

Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After this PR

```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: b = a.cuda()

In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418

Reviewed By: weiyangfb

Differential Revision: D13226950

Pulled By: soumith

fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23
2018-11-28 06:13:06 -08:00
be7c618fd7 torch.sparse.sum() (#12430)
Summary:
- to fix #12241
- add `_sparse_sum()` to ATen and expose it as `torch.sparse.sum()`; `SparseTensor.sum()` is not supported currently
- this PR depends on #11253 and will need to be updated once it lands
- [x] implement forward
- [x] implement backward
- performance [benchmark script](https://gist.github.com/weiyangfb/f4c55c88b6092ef8f7e348f6b9ad8946#file-sparse_sum_benchmark-py):
  - sum all dims is fastest for sparse tensor
  - when the input is sparse enough (nnz = 0.1%), sum of a sparse tensor is faster than dense on CPU, but not necessarily on CUDA
  - CUDA backward is comparable (<2x) between `sum several dims` vs `sum all dims` in sparse
  - CPU backward, which uses binary search, is still slow in sparse; it takes `5x` the time in `sum [0, 2, 3] dims` vs `sum all dims`
    - optimize CUDA backward for now
      - using thrust for sort and binary search, but runtime not improved
  - both of CPU and CUDA forward are slow in sparse (`sum several dims` vs `sum all dims`), at most `20x` slower in CPU, and `10x` in CUDA
    - improve CPU and CUDA forward kernels

(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 8.77 µs vs 72.9 µs | 42.5 µs vs 108 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 112 µs vs 4.47 ms | 484 µs vs 407 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 141 µs vs 148 µs | 647 µs vs 231 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 235 µs vs 1.23 ms | 781 µs vs 213 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 48.5 µs vs 360 µs | 160 µs vs 2.03 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 258 µs vs 1.22 ms | 798 µs vs 224 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 204 µs vs 882 µs | 443 µs vs 133 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 709 µs vs 1.15 ms | 893 µs vs 202 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 39.8 µs vs 81 µs | 42.4 µs vs 113 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 747 µs vs 4.7 ms | 2.4 ms vs 414 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 1.04 ms vs 126 µs | 5.03 ms vs 231 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.12 ms vs 1.24 ms | 5.99 ms vs 213 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 133 µs vs 366 µs | 463 µs vs 2.03 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.56 ms vs 1.22 ms | 6.11 ms vs 229 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.53 ms vs 799 µs | 824 µs vs 134 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 5.15 ms vs 1.09 ms | 7.02 ms vs 205 µs

- after improving CPU and CUDA forward kernels
  - in the `(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD)` forward, CPU takes ~~`171 µs`~~, of which `130 µs` is spent in `coalesce()`; for CUDA, total time is ~~`331 µs`~~, of which `141 µs` is spent in `coalesce()`. We need to reduce time at places outside `coalesce()`.
  - after a few simple tweaks, now in the forward, it is at most `10x` slower in CPU, and `7x` in CUDA. And time takes in `sum dense dims only [2, 3]` is `~2x` of `sum all dims`. Speed of `sum all sparse dims [0, 1]` is on bar with `sum all dims`

(nnz,   sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 7 µs vs 69.5 µs | 31.5 µs vs 61.6 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 11.3 µs vs 4.72 ms | 35.2 µs vs 285 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 197 µs vs 124 µs | 857 µs vs 134 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 124 µs vs 833 µs | 796 µs vs 106 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 20.5 µs vs 213 µs | 39.4 µs vs 1.24 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 131 µs vs 830 µs | 881 µs vs 132 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 95.8 µs vs 409 µs | 246 µs vs 87.2 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 624 µs vs 820 µs | 953 µs vs 124 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 45.3 µs vs 72.9 µs | 33.9 µs vs 57.2 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 81.4 µs vs 4.49 ms | 39.7 µs vs 280 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 984 µs vs 111 µs | 6.41 ms vs 121 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.45 ms vs 828 µs | 6.77 ms vs 113 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 74.9 µs vs 209 µs | 37.7 µs vs 1.23 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.48 ms vs 845 µs | 6.96 ms vs 132 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.14 ms vs 411 µs | 252 µs vs 87.8 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 4.53 ms vs 851 µs | 7.12 ms vs 128 µs

- the time taken in the CUDA backward of sparse ops is very long, with large variance (with nnz=10000, it normally takes 6-7ms). To improve the backward of sparse ops, we will need to debug at places other than the CUDA kernels. Here is a benchmark of `torch.copy_()`:
```
>>> d = [1000, 1000, 2, 2]
>>> nnz = 10000
>>> I = torch.cat([torch.randint(0, d[0], size=(nnz,)),
               torch.randint(0, d[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, d[2], d[3])
>>> size = torch.Size(d)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce().cuda()
>>> S2 = torch.sparse_coo_tensor(I, V, size).coalesce().cuda().requires_grad_()
>>> data = S2.clone()
>>> S.copy_(S2)
>>> y = S * 2
>>> torch.cuda.synchronize()
>>> %timeit y.backward(data, retain_graph=True); torch.cuda.synchronize()
7.07 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
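
A usage sketch of the new API:

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.randn(3, 4)
s = torch.sparse_coo_tensor(i, v, (2, 3, 4))  # 2 sparse dims, 1 dense dim

torch.sparse.sum(s)              # sum over everything -> 0-dim dense tensor
torch.sparse.sum(s, dim=[0, 1])  # sum all sparse dims -> dense, shape (4,)
```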
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12430

Differential Revision: D12878313

Pulled By: weiyangfb

fbshipit-source-id: e16dc7681ba41fdabf4838cf05e491ca9108c6fe
2018-11-28 02:19:12 -08:00
a2fcd4dee5 Ensure FP16 rowwise Adagrad can be run
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12317

Reviewed By: hyuen

Differential Revision: D10190778

fbshipit-source-id: 720a9aaa4e6b1736023d8c6326a613e4ea592b31
2018-11-28 02:15:36 -08:00
e8754ee017 use fbgemm's im2col fusion and thread partitioning (#14350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14350

acc32 for now. A separate diff will follow for acc16, but that will need another output processing step that does sparse convolution without im2col.

Reviewed By: dskhudia

Differential Revision: D13188595

fbshipit-source-id: e8faee46c7ea43e4a600aecb8b8e93e6c860a8c8
2018-11-28 01:13:11 -08:00
a38ed0268e PT1 Stable Release Distributed Documentation (#14444)
Summary:
The doc covers pretty much all we have had on distributed for PT1 stable release, tracked in https://github.com/pytorch/pytorch/issues/14080

Tested by previewing the sphinx generated webpages. All look good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14444

Differential Revision: D13227675

Pulled By: teng-li

fbshipit-source-id: 752f00df096af38dd36e4a337ea2120ffea79f86
2018-11-28 00:34:11 -08:00
3d98810fbd Revert D13192230: [pytorch][PR] [jit] Use nn module tests in test_jit
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
2018-11-28 00:23:09 -08:00
7d07fcd215 Fixed SyncParam/QueueReduction/SyncReduction test for 2+ GPUs (#14452)
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14445

Also bumped up timeout to 30 seconds, since on 8-GPU machines, DDP test will take more than 15 seconds sometimes.

Tested on 8 GPU machines:
```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py --verbose
test_dist_broadcast_coalesced_gloo (__main__.DistributedDataParallelTest) ... ok
test_dist_broadcast_coalesced_nccl (__main__.DistributedDataParallelTest) ... skipped 'Test skipped due to known issues'
test_fp16 (__main__.DistributedDataParallelTest) ... ok
test_gloo_backend (__main__.DistributedDataParallelTest) ... ok
test_nccl_backend (__main__.DistributedDataParallelTest) ... ok
test_queue_reduction (__main__.DistributedDataParallelTest) ... ok
test_sync_params_no_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_params_with_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_reduction (__main__.DistributedDataParallelTest) ... ok
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allgather_basics (__main__.ProcessGroupGlooTest) ... ok
test_allgather_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_checks (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_gather_basics (__main__.ProcessGroupGlooTest) ... ok
test_gather_checks (__main__.ProcessGroupGlooTest) ... ok
test_reduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_reduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_scatter_basics (__main__.ProcessGroupGlooTest) ... ok
test_scatter_checks (__main__.ProcessGroupGlooTest) ... ok
test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... ok
test_timeout_kwarg (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_barrier (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousEnvTest) ... ok
test_nominal (__main__.RendezvousEnvTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_address_already_in_use (__main__.TCPStoreTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok

----------------------------------------------------------------------
Ran 46 tests in 162.980s

OK (skipped=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14452

Differential Revision: D13230652

Pulled By: teng-li

fbshipit-source-id: 88580fe55b3a4fbc7a499ca3b591958f11623bf8
2018-11-27 21:58:34 -08:00
4cdcbbf410 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
2018-11-27 21:19:51 -08:00
a0def0b57e check for invalid ranges in torch.arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13915
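
For example (sketch; the exact error type and message are assumptions):

```python
import torch

torch.arange(0, 10, 1)        # fine
try:
    torch.arange(0, 10, -1)   # invalid: negative step with lower < upper
except RuntimeError as e:
    print("rejected:", e)
```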

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
2018-11-27 20:38:56 -08:00
b08a186153 roll along multiple dimensions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874
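
For example (sketch):

```python
import torch

x = torch.arange(6).reshape(2, 3)
y = torch.roll(x, shifts=(1, 2), dims=(0, 1))  # roll both dims in one call
```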

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
2018-11-27 20:32:30 -08:00
662f66ebb9 Add poisson_nll_loss to script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14420

Differential Revision: D13220726

Pulled By: driazati

fbshipit-source-id: 6c08a0050075beafcc8ba413c9603b273870c70c
2018-11-27 19:39:16 -08:00
d75f751bec Add boolean dispatch for function overloading (#14425)
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See max_pool1d for an example usage.

This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
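
A hedged sketch of the idea, as an illustrative reimplementation rather than the actual internal helper:

```python
import torch
import torch.nn.functional as F

# Illustrative reimplementation (not the real torch._jit_internal helper):
# pick one of two typed functions based on a boolean flag. In script the
# flag must be a compile-time constant, so the compiler can choose a single
# overload and hence a single return type.
def boolean_dispatch_sketch(arg_name, default, if_true, if_false):
    def dispatched(*args, **kwargs):
        flag = kwargs.get(arg_name, default)
        return (if_true if flag else if_false)(*args, **kwargs)
    return dispatched

def _mp1d_with_indices(input, kernel_size, return_indices=True):
    return F.max_pool1d(input, kernel_size, return_indices=True)

def _mp1d(input, kernel_size, return_indices=False):
    return F.max_pool1d(input, kernel_size)

max_pool1d_sketch = boolean_dispatch_sketch(
    "return_indices", False, _mp1d_with_indices, _mp1d)

x = torch.randn(1, 1, 8)
out = max_pool1d_sketch(x, 2)                             # Tensor
out, idx = max_pool1d_sketch(x, 2, return_indices=True)   # (Tensor, Tensor)
```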

Fixes #14081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425

Differential Revision: D13222104

Pulled By: driazati

fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
2018-11-27 19:36:47 -08:00
23f901a737 fix enable_cpu_fuser
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14440

Differential Revision: D13226354

Pulled By: zdevito

fbshipit-source-id: e4ed023eece8b5b670a4a27d24a8688907b36b90
2018-11-27 19:14:10 -08:00
82175f31b4 Move Affine grid to C++ (#14392)
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
2018-11-27 18:38:11 -08:00
6f2307ba6a Allow building libraries with setuptools that dont have abi suffix (#14130)
Summary:
When using `setuptools` to build a Python extension, setuptools will automatically add an ABI suffix like `cpython-37m-x86_64-linux-gnu` to the shared library name when using Python 3. This is required for extensions meant to be imported as Python modules. When we use setuptools to build shared libraries not meant as Python modules, for example libraries that define and register TorchScript custom ops, having your library called `my_ops.cpython-37m-x86_64-linux-gnu.so` is a bit annoying compared to just `my_ops.so`, especially since you have to reference the library name when loading it with `torch.ops.load_library` in Python.

This PR fixes this by adding a `with_options` class method to the `torch.utils.cpp_extension.BuildExtension` which allows configuring the `BuildExtension`. In this case, the first option we add is `no_python_abi_suffix`, which we then use in `get_ext_filename` (override from `setuptools.build_ext`) to throw away the ABI suffix.
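
A sketch of the resulting setup.py ("my_ops" and "my_ops.cpp" are placeholder names):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ops",  # placeholder project name
    ext_modules=[CppExtension("my_ops", ["my_ops.cpp"])],
    # Produces "my_ops.so" instead of "my_ops.cpython-37m-x86_64-linux-gnu.so".
    cmdclass={"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)},
)
```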

I've added a test `setup.py` in a `no_python_abi_suffix_test` folder.

Fixes https://github.com/pytorch/pytorch/issues/14188

t-vi fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14130

Differential Revision: D13216575

Pulled By: goldsborough

fbshipit-source-id: 67dc345c1278a1a4ee4ca907d848bc1fb4956cfa
2018-11-27 17:35:53 -08:00
23d111c87f Fix clang tidy errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14427

Differential Revision: D13222381

Pulled By: wanchaol

fbshipit-source-id: d90d210a810e95bf0eb404f9c1c304f4e6a3f61e
2018-11-27 17:30:50 -08:00
226a01e5a1 Handling of pretty-printing methods (#14378)
Summary:
Stacked on #14176, review only the last commit.
* Print parameters to methods as self.weight rather than as extra inputs.
* Print entire set of methods out as a single string
* Update test code to test the module-at-a-time export/import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14378

Differential Revision: D13198463

Pulled By: zdevito

fbshipit-source-id: 3fab02e8239cfd6f40d6ab6399047bd02cf0a8c8
2018-11-27 17:10:23 -08:00
75bac5ab32 Eliminate necessity of HIPify on AccumulateType.h (#14412)
Summary:
I'd like to NOT HIPify files that are not in a cuda/
directory, so hand-HIPify AccumulateType.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14412

Differential Revision: D13221801

Pulled By: ezyang

fbshipit-source-id: d1927cfc956e50a6a5e67168ac0e1ce56ecd1e0b
2018-11-27 16:39:55 -08:00
1620161d6b when BUILD_CAFFE2_OPS is OFF, torch-python needs a direct dep on nccl (#14430)
Summary:
https://github.com/pytorch/pytorch/issues/14431 tracks supporting this with CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14430

Differential Revision: D13224079

Pulled By: anderspapitto

fbshipit-source-id: 47d7900d25910ed61585b93f9003acd1b2630a9f
2018-11-27 15:53:31 -08:00
006505bb8f Speed-up "advanced" indexing operations (#13420)
Summary:
This speeds up "advanced" indexing (indexing a tensor by a tensor)
on CPU and GPU. There's still a bunch of work to do, including
speeding up indexing by a byte (boolean) mask and speeding up the derivative
calculation for advanced indexing.

Here's some speed comparisons to indexing on master using a little [benchmark script](https://gist.github.com/colesbury/c369db72aad594e5e032c8fda557d909) with 16 OpenMP threads and on a P100. The test cases are listed as (input shape -> output shape).

| Test case             | CPU (old vs. new)   | CUDA (old vs. new)     |
|-----------------------|---------------------|------------------------|
| 1024x1024 -> 512x1024 | 225 us vs. **57 us**  | 297 us vs. **47 us** |
| 1024x1024 -> 1024x512 | 208 us vs. **153 us** | 335 us vs. **54 us** |
| 50x50 -> 20000x50     | 617 us vs. **77 us**  | 239 us vs. **54 us** |
| 50x50 -> 50x20000     | 575 us vs. **236 us** | 262 us vs. **58 us** |
| 2x5x10 -> 10          | 65 us  vs. **18 us**  | 612 us vs. **93 us** |
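
For reference, a sketch of the first two test cases' indexing patterns:

```python
import torch

x = torch.randn(1024, 1024)
rows = torch.randint(0, 1024, (512,), dtype=torch.long)
y = x[rows]       # 1024x1024 -> 512x1024 (gather rows)
cols = torch.randint(0, 1024, (512,), dtype=torch.long)
z = x[:, cols]    # 1024x1024 -> 1024x512 (gather columns)
```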

See #11647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13420

Reviewed By: soumith

Differential Revision: D13088936

Pulled By: colesbury

fbshipit-source-id: 0a5c2ee9aa54e15f96d06692d1694c3b24b924e2
2018-11-27 15:23:59 -08:00
0199d59d3a Resubmit: Set the correct engine name for position weighted pooling when fp16 is used for training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13768

Reviewed By: xianjiec

Differential Revision: D12996103

fbshipit-source-id: 5ca4cda4210f68ece2b5d6eced8cf52ee91fb36f
2018-11-27 14:51:56 -08:00
ae1b37650c Windows local build: restore original working dir after activating VC environment (#14416)
Summary:
`call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64` seems to change the working dir to `C:\Users\Administrator\source`, and we need to cd back to the PyTorch directory before running `git submodule update --init --recursive`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14416

Differential Revision: D13222269

Pulled By: yf225

fbshipit-source-id: a0eb3311fb11713b1bb8f52cd13e2c21d5ca9c7b
2018-11-27 14:18:45 -08:00
5c84145354 condition blob in while_op test changes data type (#14279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14279

att

Reviewed By: smessmer

Differential Revision: D13144472

fbshipit-source-id: af4d920a3148c648d1a428a5bcd56da19ea8c38c
2018-11-27 14:16:39 -08:00
ba6c49cb9c Add test of ONNX_ATEN (#14259)
Summary:
In #14239 we fixed ONNX_ATEN.
To ensure its correctness in the future, we should add a related test case.
We use torch.fmod() to test ONNX_ATEN.
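
A sketch of the kind of export being exercised (the module is a placeholder):

```python
import io
import torch

class FmodModel(torch.nn.Module):  # placeholder module name
    def forward(self, x, y):
        return torch.fmod(x, y)

f = io.BytesIO()
torch.onnx.export(FmodModel(), (torch.randn(3, 4), torch.randn(3, 4)), f,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN)
```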
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14259

Differential Revision: D13204610

Pulled By: zrphercule

fbshipit-source-id: e4660c346e5edd201f1458b7d74d7dfac49b94c7
2018-11-27 13:51:51 -08:00
e392d428b1 Allowing TaskGroups to carry remote nets (#14342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14342

Sometimes, when we are creating a TaskGroup, we are in fact creating a TaskGroup for a distributed job. In some cases, we may want to register a few nets as "remote" to a TaskGroup. The remote nets should carry sufficient attributes describing where they should be executed later on.

This diff adds the remote net attribute to the TaskGroup class. It exposes two minimal functionalities: adding a remote net, and getting all remote nets added to a TaskGroup.

Reviewed By: d4l3k

Differential Revision: D13188320

fbshipit-source-id: efe947aec30817e9512a5e18be985713b9356bdc
2018-11-27 13:34:11 -08:00
b7856a32f6 Add scaffolding for HIP backend in ATen/core. (#14285)
Summary:
This code doesn't actually do anything, but it will be the
groundwork necessary to change PyTorch's HIPIFY pass from reusing
CUDA identifiers directly, to actually switching to using HIP
identifiers (moving us closer to a world where we can compile
both HIP and CUDA PyTorch side-by-side.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14285

Differential Revision: D13158851

Pulled By: ezyang

fbshipit-source-id: df2462daa5d0d4112455b67bd3067d60ba55cda5
2018-11-27 13:21:42 -08:00
1b93cb7631 Document device_guard in native_functions.yaml (#14235)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14235

Differential Revision: D13145780

Pulled By: ezyang

fbshipit-source-id: 0e93bf009ad492551bcdcada0357f2fef529e67d
2018-11-27 13:17:23 -08:00
1b80644b4d Revert D13192228: [pytorch][PR] [jit] Add boolean dispatch for function overloading
Differential Revision:
D13192228

Original commit changeset: fce33c400c1f

fbshipit-source-id: 75c9991dc7097f9513c6c89d16eff2de6e287c3b
2018-11-27 13:14:42 -08:00
f9c27d60c3 Remove fake dependencies from TensorImpl to caffe2 (#14141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14141

These includes weren't actually used, let's remove them.

Reviewed By: ezyang

Differential Revision: D13113129

fbshipit-source-id: 816995e280b81bf99002772ea8aea458bdfcd2c7
2018-11-27 12:59:56 -08:00
3257ac1ff3 Fix include paths for TensorTypeId.h and TensorTypeIdRegistration.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14070

Reviewed By: ezyang

Differential Revision: D13081610

fbshipit-source-id: 685994a15a2cd15e9e5447cf77671343de5dd278
2018-11-27 12:59:54 -08:00
ed10ef97da Move TensorTypeId to c10/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14327

Reviewed By: ezyang

Differential Revision: D13131338

fbshipit-source-id: c4682cb6ed6fe4cd1636e09d918eef6e90c836f1
2018-11-27 12:59:52 -08:00
6c2e816268 Fix include paths for Storage.h and StorageImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14062

Reviewed By: ezyang

Differential Revision: D13081603

fbshipit-source-id: c272b715ef2f513d21d1c3f34fbf79eec6946441
2018-11-27 12:59:50 -08:00
3d4d09fe06 Move Storage and StorageImpl to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14061

Reviewed By: ezyang

Differential Revision: D13081608

fbshipit-source-id: 1ea2d32e9ec9293b6ffa4b9e76c674cca55d5a1c
2018-11-27 12:59:48 -08:00
507ed9032e Fix include paths for Allocator.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14060

Reviewed By: ezyang

Differential Revision: D13081605

fbshipit-source-id: 02f23af174c0f0c38fb0163c2dfef3873ff5635d
2018-11-27 12:59:46 -08:00
3a71d5ee49 Move Allocator.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14059

Reviewed By: ezyang

Differential Revision: D13081606

fbshipit-source-id: d6ad59ad4e3d363268cd4307b6c999a168681246
2018-11-27 12:59:44 -08:00
0b10f147b6 Move UniqueVoidPtr to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14058

Reviewed By: dzhulgakov

Differential Revision: D13081602

fbshipit-source-id: e91ccf9fba9a7a02f99ed90b7a3a0fe7afd56832
2018-11-27 12:59:42 -08:00
8b1ca2810b Move ScalarTypeUtils.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14024

Reviewed By: ezyang

Differential Revision: D13081604

fbshipit-source-id: d7a09610f64eb2e9dd831bbb3c85f20691251594
2018-11-27 12:59:40 -08:00
44e21cf5bb Fix include paths for Scalar.h and ScalarType.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14023

Reviewed By: ezyang

Differential Revision: D13081609

fbshipit-source-id: c27eeafa381b39e043f0261ea7f6f634ee8bc238
2018-11-27 12:59:38 -08:00
50e9c56830 Move Scalar and ScalarType to c10/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14022

Reviewed By: ezyang

Differential Revision: D13015236

fbshipit-source-id: 92aac4e342d85f75a31837b2943fa5b80f0c35c9
2018-11-27 12:59:36 -08:00
3fca4bde50 Trace in-place ops (#14254)
Summary:
This PR adds a `try_outplace` option to the tracer. When `try_outplace` is true, the tracer will attempt to emit out-of-place versions of in-place ops (similar to how things are done today). When it's false, the correct in-place op is emitted.

I made `try_outplace` false by default, but flipped it to true for ONNX export utils. zdevito jamesr66a, is there anywhere else I should preserve the existing behavior?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14254

Reviewed By: eellison

Differential Revision: D13166691

Pulled By: suo

fbshipit-source-id: ce39fdf73ac39811c55100e567466d53108e856b
2018-11-27 12:40:56 -08:00
ffbc3905a1 Fixed torch.multiprocessing.spawn for not being able to spawn like dataloader workers (#14391)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/14390

Now imagenet example works fine with multiprocessing and more than 1 dataloader worker
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14391

Reviewed By: calebho

Differential Revision: D13209800

Pulled By: teng-li

fbshipit-source-id: e8abc0fb38d4436cf3474dcbba0e28f4290e4d29
2018-11-27 12:37:41 -08:00
5fefb29a53 Tensor construction: combine Resize+mutable_data - 4/4 (#13856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13856

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007310

fbshipit-source-id: 941f064ef8934bb17fbfb706e6ed3db173b5d268
2018-11-27 12:34:25 -08:00
e22cc7c072 Print default values and introduce ir view classes (#14176)
Summary:
[Stacked commit, only review the last commit]

This PR adds support for printing default values in python printing as well as the logic
for parsing default values back in using the parser. For simplicity, this PR simply
creates a subgraph of the constant expressions and then runs that graph to generate the defaults.
A more lightweight approach should be possible later, but would require more machinery.

To make reading code in the printer easier, this also adds ir_views.h.
Similar to tree_views.h, these classes can provide views of some commonly used IR nodes
that have complicated structure and common operations on that structure.

Currently it has only read-only views for prim::If and prim::Loop,
but we should eventually add helpers to manipulate If/Loop nodes as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14176

Differential Revision: D13198455

Pulled By: zdevito

fbshipit-source-id: dc99ab9692804ccaedb60a55040c0b89ac7a6a6d
2018-11-27 11:48:27 -08:00
8408dff55a Add Type support to the fuser, fuse more (#14336)
Summary:
This adds scalar type support to the fuser, both internally (instead of auto / assuming float) and for the inputs/outputs.
We can now fuse things with inputs/outputs of arbitrary scalar type; in particular, comparisons and where now work well. So it fixes #13384 by returning a tensor of the right type (and adds a test where byte and double tensors are returned).
The type inference is done by re-calling PropagateTensorShapeOnNode during compilation; I would venture that it isn't prohibitively expensive compared to the actual compilation. (Propagation was fixed for where to return the second argument's type and amended to handle FusedConcat.)
I'm not sure how to add a check for the code generated by the fuser, but I am also not sure we absolutely need one (we'd notice if it were invalid or produced wrong results).
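
For illustration, a small sketch of the kind of function that can now fuse end to end (a hypothetical example; actual fusion additionally requires CUDA tensors and the JIT fuser to kick in):

```python
import torch

@torch.jit.script
def masked_max(a, b):
    mask = a > b                           # comparison yields a byte tensor
    return torch.where(mask, a, b), mask   # output dtypes differ from float
```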

Thanks in particular to apaszke, fmassa, mruberry for advice and encouragement! All the errors are my own.

I have discussed order of PRs briefly with mruberry, if this goes in before he submits the PR, he graciously agreed to rebasing his, but I'd happily rebase, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14336

Differential Revision: D13202620

Pulled By: soumith

fbshipit-source-id: 855159e261fa15f21aca3053bfc05fb3f720a8ef
2018-11-27 11:33:11 -08:00
bd629481fb Updating submodules
Reviewed By: yns88

fbshipit-source-id: e63160e97550942931bacaa860d91d591d2e1712
2018-11-27 11:23:32 -08:00
66c8bbf021 Add boolean dispatch for function overloading (#14081)
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.

This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.

Depends on #14232 for `Optional[BroadcastingList[T]]`
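
For reference, a minimal eager-mode sketch of the two behaviors the boolean dispatch is meant to reproduce in the JIT:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 16)

# return_indices=False (the default) returns a single Tensor.
y = F.max_pool1d(x, kernel_size=2)

# return_indices=True returns a (values, indices) tuple.
y, idx = F.max_pool1d(x, kernel_size=2, return_indices=True)
```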
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081

Differential Revision: D13192228

Pulled By: driazati

fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
2018-11-27 10:51:32 -08:00
2cc35c161a Barrier synchronizes with prior work before completing (#14386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14386

See #13573, #14142, and #14271 for discussion.

This change updates ProcessGroupGloo to ensure that all prior
operations have completed before executing the barrier.

Reviewed By: manojkris

Differential Revision: D13205022

fbshipit-source-id: 673e7e6ca357dc843874d6dd8da590832e1de7fa
2018-11-27 10:46:42 -08:00
9598d380b0 Make ProcessGroup::Work::wait() throw (#14298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14298

This is a breaking API change for users of the C++ c10d API. The work
object defined wait() to return a boolean. If the work completed
successfully it would return true, if it didn't it would return false.
It was then up to the user to call the exception() function to figure
out what went wrong. This has proven suboptimal as it allows users to
forget about failure handling and errors may be ignored.

The work class is semantically very similar to std::future, where a
call to get() may throw if the underlying std::promise has set an
exception. This commit changes the semantic of the work class to be
similar to this and turns wait() into a void function that throws if
the work completes with an exception.

The exception() function can still be used to retrieve the exception
if isSuccess() returns false, but now returns an std::exception_ptr
instead of a reference to a std::exception.
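
A sketch of the new error-handling pattern from the Python side, assuming `pg` is an already-constructed process group and `tensors` an allocated tensor list (`recover` is a hypothetical handler):

```python
work = pg.allreduce(tensors)
try:
    work.wait()       # now raises on failure instead of returning False
except RuntimeError as exc:
    recover(exc)      # hypothetical application-level error handler
```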

Reviewed By: manojkris

Differential Revision: D13158475

fbshipit-source-id: 9cd8569b9e7cbddc867a5f34c6fd0b7be85581b8
2018-11-27 10:46:40 -08:00
03864b7b11 Add option structs and timeout field (#14297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14297

Adds option structs for allgather and barrier such that we have one
for every collective. Add timeout member field to every one of these
such that we can support per operation timeouts.

Use default constructed options struct for every collective process
group function exposed to Python.

Reviewed By: manojkris

Differential Revision: D13158474

fbshipit-source-id: 3d28977de2f2bd6fc2f42ba3108b63a429338906
2018-11-27 10:46:38 -08:00
52f50220d9 Refer to all work with ProcessGroup prefix (#14296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14296

There was mixed usage of "ProcessGroup::Work" and just "Work".
Adding prefix for readability/consistency.

Reviewed By: manojkris

Differential Revision: D13128977

fbshipit-source-id: a54a8784fa91cd6023c723cb83e9f626fb896a30
2018-11-27 10:46:36 -08:00
5865561a9a Remove algorithm caching in ProcessGroupGloo (#14295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14295

This is no longer used after moving to Gloo new style algorithms.

Closes #11912.

Reviewed By: manojkris

Differential Revision: D13111781

fbshipit-source-id: 53e347080e29d847cd9da36f2d93af047930690c
2018-11-27 10:46:34 -08:00
936c2bba23 Use new style barrier support in c10d/gloo (#14294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14294

This is the final collective to be ported to the new style where there
is no longer a need to keep a cached algorithm instance around. There
is a follow up change incoming to remove the algorithm caching
functionality in ProcessGroupGloo.

Reviewed By: manojkris

Differential Revision: D13111509

fbshipit-source-id: f3ea0d955a62029fc4e7cfc09055e4957e0943ac
2018-11-27 10:46:32 -08:00
50bc9dc9c3 fix doc for sparse.addmm (#14403)
Summary:
- fixing the doc issue in sparse.addmm

================ before change ==================
![image](https://user-images.githubusercontent.com/38509346/49063994-2f10fe80-f1ce-11e8-9ccc-54241bc45f0b.png)
![image](https://user-images.githubusercontent.com/38509346/49064064-641d5100-f1ce-11e8-865a-7227be7156ef.png)

================ post change ==================
![image](https://user-images.githubusercontent.com/38509346/49064078-76978a80-f1ce-11e8-8f38-f1f8ac9ce63b.png)
![image](https://user-images.githubusercontent.com/38509346/49064085-7bf4d500-f1ce-11e8-8a0d-bf9e5460d21f.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14403

Differential Revision: D13216582

Pulled By: weiyangfb

fbshipit-source-id: 52e0a20c6b341c37cfb31f281be3afe2a52ca532
2018-11-27 10:24:18 -08:00
a3cfab2d63 per-group and per-channel quantization (#14340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14340

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/25

Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
This diff also changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating point vectors.

Using this in DNNLOWP operators will be done in a separate diff.

Reviewed By: dskhudia

Differential Revision: D13176386

fbshipit-source-id: e46c53e31e21520bded71b8ed86e8b19e010e2dd
2018-11-27 10:17:34 -08:00
49fe678fec Add variable_factories.h to cppdocs (#14381)
Summary:
This will document `torch::from_blob` and such.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14381

Differential Revision: D13216560

Pulled By: goldsborough

fbshipit-source-id: 112f60e45e4d38a8a9983fa71e9cc56bc1a73465
2018-11-27 10:13:23 -08:00
c19af59a6e Use integer math to compute output size of pooling operations (#14405)
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.

This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.

Fixes #13386.
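
A minimal Python sketch of the integer-only shape computation (an approximation of the idea, not the exact helper in pooling_shape.h; dilation is omitted):

```python
def pooling_output_size(input_size, kernel_size, pad, stride, ceil_mode):
    numerator = input_size + 2 * pad - kernel_size
    out = numerator // stride + 1
    if ceil_mode:
        if numerator % stride != 0:
            out += 1                  # integer "ceil" without float math
        # the last pooling window must start inside the (padded) input
        if (out - 1) * stride >= input_size + pad:
            out -= 1
    return out
```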
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405

Differential Revision: D13215260

Pulled By: soumith

fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
2018-11-27 09:38:06 -08:00
c5cc1e3ab2 Delete legacy THCStream (long live THCStream). (#14246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14246

This commit systematically eliminates THCStream entirely from THC, replacing it
with at::cuda::CUDAStream.  In places where the previous pointer type showed up
in a public API signature, those functions are now only available to C++
clients.  (It would not be too difficult to make a C-compatible version of
CUDAStream, as it's really just a simple struct, but we leave this for
future work.)

All functions in THC that referred to THCStream were expunged in favor of their
modern counterparts.

One annoyance was that I didn't feel like redoing how the torch.cuda.Stream
binding code worked, but I really wanted to get rid of the stored THCStream*
pointer.  So I repurposed the bit-packing code I implemented for Stream hashing,
and used that to (reversibly) store streams in a uint64_t cdata field.  A perhaps
more future-proof solution would be to get rid of cdata entirely, and store the
device and stream ID directly.
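
A toy sketch of reversible bit-packing, with a hypothetical field layout (the real packing in CUDAStream may differ):

```python
DEVICE_BITS = 8  # assumed width for the device index

def pack(device_index: int, stream_id: int) -> int:
    return (device_index << (64 - DEVICE_BITS)) | stream_id

def unpack(cdata: int):
    device_index = cdata >> (64 - DEVICE_BITS)
    stream_id = cdata & ((1 << (64 - DEVICE_BITS)) - 1)
    return device_index, stream_id

assert unpack(pack(3, 42)) == (3, 42)
```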

Billing of changes:
- All CUDAStream_ pointer API functions are now hidden and anonymously
  namespaced (instead of being in the impl namespace).  All use sites
  rewritten to use the modern C++ API.  Since CUDAStreamInternals is no
  longer part of the public API, the CUDAStreamInternals constructor and
  internals() method have been removed, and replaced with anonymous
  functions in the C++ file.
- device_index() returns DeviceIndex rather than int64_t now
- Stream and CUDAStream now have pack/unpack methods.  (CUDAStream checks
  that the unpacked bit-pattern is for a CUDA device.)
- THCStream.h header is removed entirely
- Most THCStream handling functions in THC API are removed

Reviewed By: gchanan

Differential Revision: D13121531

fbshipit-source-id: 48873262cc0a37c3eec75a7ba1c93c800da40222
2018-11-27 08:32:09 -08:00
388258fb5e Add hash functions for Stream, CUDAStream; fix Device hash function (#14191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14191

Previously, Device's hash function only worked for CPU and CUDA.  Now
it works for everything.

Implementing the bit concatenation was a bit tricky, and I got it wrong the
first time. See Note [Hazard when concatenating signed integers]
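
To illustrate the hazard, a Python sketch that masks to 64 bits to mimic C++ integer behavior (Python ints are unbounded, so the masking simulates the fixed width):

```python
M64, M32 = 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFF

def bad_concat(hi32, lo32):
    # If lo32 is negative, sign extension sets all high bits and clobbers hi32.
    return ((hi32 << 32) | lo32) & M64

def good_concat(hi32, lo32):
    # Mask the low word to its unsigned 32-bit pattern first.
    return ((hi32 << 32) | (lo32 & M32)) & M64

assert bad_concat(1, -1) == 0xFFFFFFFFFFFFFFFF    # high word destroyed
assert good_concat(1, -1) == 0x1FFFFFFFF          # both words preserved
```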

Reviewed By: smessmer

Differential Revision: D13119624

fbshipit-source-id: 36bfa139cfc739bb0624f52aaf466438c2428207
2018-11-27 08:32:08 -08:00
3ff70712c2 Implement NaN-propagating max/min on Vec256.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13399

Differential Revision: D13199957

Pulled By: resistor

fbshipit-source-id: 1565e079b13c5d4f42f2033830a7c997b7d824bc
2018-11-26 22:46:20 -08:00
a0ef8afd7e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 210f7eec65bea5e31817fb56dec27b0ab8af797a
2018-11-26 19:38:00 -08:00
f019a2d9b3 Remove unused executors, part 3 (#14199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14199

Remove legacy code for dag, async_dag

Reviewed By: salexspb

Differential Revision: D13019102

fbshipit-source-id: ff07e45304d9af4be0375215f4b642c4b0edb12d
2018-11-26 19:10:43 -08:00
7953b32dc4 Remove unused executors, part 2 (#14115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14115

Remove legacy implementation of prof_dag

Reviewed By: salexspb

Differential Revision: D13019096

fbshipit-source-id: 4f2bf676444d84eaa2cc1effcc3ebdc764e0a016
2018-11-26 19:10:42 -08:00
34239006b0 Remove unused executors, part 1 (#14117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14117

Removing unused legacy executors (htrace)

Reviewed By: salexspb

Differential Revision: D13019078

fbshipit-source-id: 19d0ed1b47a22cc17c27fdd15d748ced54806132
2018-11-26 19:10:40 -08:00
507cb16583 Delete OPENMP_STUB translation. (#14286)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14286

Differential Revision: D13205356

Pulled By: ezyang

fbshipit-source-id: 08e9821e4b32f8d7f3c41906e481f280ee6cf2e3
2018-11-26 19:08:07 -08:00
12558019a8 backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345)
Summary:
- introduce `sparse.addmm()` with backward for sparse matrix input for https://github.com/pytorch/pytorch/issues/12308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345

Differential Revision: D13094070

Pulled By: weiyangfb

fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461
2018-11-26 17:47:48 -08:00
9e1805d38e Switch Int8ChannelShuffle operator to QNNPACK (#14362)
Summary:
1.8-2.2X better performance on ARM devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14362

Reviewed By: jerryzh168

Differential Revision: D13192312

Pulled By: Maratyszcza

fbshipit-source-id: 0d3dff067e300c7d741c42615b61246cbf09a829
2018-11-26 17:43:32 -08:00
2d6f039766 Fixed file init_method write/read race (#14388)
Summary:
This should fix the race among multiple processes: https://github.com/pytorch/pytorch/issues/13750

Essentially, the reader tries to open the file and errors out if it doesn't exist. We now factor in the timeout option of FileStore, applying it both to creating the file (which should always succeed unless something is wrong) and, more importantly, to waiting for the file to be created.

Tested on both NFS and local drive, the race disappears when 8 concurrent processes do distributed training.
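
For context, a sketch of the usage this protects (the path is illustrative; RANK is assumed to be set per process):

```python
import os
import torch.distributed as dist

# Each of the world_size processes runs this; with the fix, readers wait
# (subject to the FileStore timeout) for the writer to create the file
# instead of erroring out when it does not exist yet.
dist.init_process_group(
    backend="gloo",
    init_method="file:///mnt/nfs/sharedfile",
    world_size=8,
    rank=int(os.environ["RANK"]),
)
```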
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14388

Differential Revision: D13207178

Pulled By: teng-li

fbshipit-source-id: d3d5d62c4c8f01c0522bf1653c8986155c54ff80
2018-11-26 17:09:35 -08:00
f639249d51 Fix dataloader iterator test (#14045)
Summary:
I noticed the test `DataLoaderTest.CanDereferenceIteratorMultipleTimes` doesn't test proper progression of the iterator. I also added a test for using `std::copy`.

Fixes https://github.com/pytorch/pytorch/issues/14276

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14045

Differential Revision: D13092187

Pulled By: goldsborough

fbshipit-source-id: 57698ec00fa7b914b159677a4ab38b6b25c2860b
2018-11-26 17:06:41 -08:00
6f3002a50e Fixed c10d test (#14389)
Summary:
Most likely a typo.

Tested on 8-GPU machine

```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py ProcessGroupNCCLTest.test_barrier
.
----------------------------------------------------------------------
Ran 1 test in 29.341s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14389

Differential Revision: D13207207

Pulled By: teng-li

fbshipit-source-id: aaffe14237076fe19d94e2fa4d9c093397f07bb9
2018-11-26 16:46:33 -08:00
1ca0ec7299 fix typo in torch.sum documentation (#14250)
Summary:
Notice that an extra colon was added to `:attr:`, so in https://pytorch.org/docs/stable/torch.html#torch.sum , `dim` shows up as ":attr::_dim_". This patch fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14250

Reviewed By: soumith

Differential Revision: D13146363

Pulled By: umanwizard

fbshipit-source-id: f7d03dcb0973aae248b56ab407ba8489f2b1fe36
2018-11-26 16:36:52 -08:00
cef23a4b1d More JIT type hierarchy refinement (#14127)
Summary:
JIT type system hierarchy refinement and refactors:

1. Make NumberType be the base type of IntType FloatType
2. Make single type container like OptionalType and FutureType share SingleElementType base type
3. Some refactors to make it more robust, e.g. adding python_str() for some types so that we have proper python_print serialization format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14127

Differential Revision: D13112657

Pulled By: wanchaol

fbshipit-source-id: 335c5b25977be2e0a462c7e4a6649c1b653ccb4f
2018-11-26 16:25:40 -08:00
afb2c0ce86 changing some rpath stuff (#14304)
Summary:
See if anything breaks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14304

Differential Revision: D13201418

Pulled By: pjh5

fbshipit-source-id: ac2101b61a23bda37329d4d923c3d9d120e718bf
2018-11-26 15:57:47 -08:00
b18063b39a Fix caffe2 => onnx exporter for ConvTranspose (#14143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14143

ConvTranspose has a per-operator attribute rename, which meant that the
global attribute rename for kernels => kernel_shape was not applied.
Changing the behavior so that the global renames always apply, but per-op
renames can override those for specific attributes.

Note: The python frontend path isn't actually used for ConvTranspose, but I
thought it would be good to make it consistent.

Reviewed By: yinghai

Differential Revision: D13113395

fbshipit-source-id: cd3f124b4b5c753a506d297138b7d002b51bfb38
2018-11-26 15:51:42 -08:00
5918de8e84 Revert D13166669: [pytorch][PR] Allow dataloader to accept a custom memory pinning function
Differential Revision:
D13166669

Original commit changeset: ca965f9841d4

fbshipit-source-id: 0836b4f50f73ba01c97491a719660f02e36f20ad
2018-11-26 14:55:04 -08:00
bb7fb7e45f remove CAFFE2_API from IdWrapper (#14044)
Summary:
it doesn't really make sense on a template class. Also it breaks if
you try to build in debug on Windows, so this will save someone some
frustration in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14044

Differential Revision: D13202960

Pulled By: anderspapitto

fbshipit-source-id: 617d78366993d5ecc2ba1f23bb90010f10df41f3
2018-11-26 14:08:56 -08:00
735cd06536 FeedTensor returns a Tensor (#14196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641

The FeedTensor function used to take a pointer to Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.

Reviewed By: dzhulgakov

Differential Revision: D13091163

fbshipit-source-id: 9abf2fd320baca76e050530c500dd29f8e2d0211
2018-11-26 13:05:44 -08:00
b13f91dbd9 Allow graph fuser to move chunks past multiple nodes. (#14055)
Summary:
Fixes #12290. Also speeds up JIT LSTM forward pass from 8.8ms to 7.8ms; previously, each JIT lstm cell used 2 fused kernels. Now, it only uses one fused kernel (which is how many kernels cudnn uses).

Explanation:

Let f, g, h be fusible ops.
```
x = f(v, w)
z = g(x, y)
a, b = chunk(z)
c = h(a, b)
```
becomes (before this PR):
```
x = f(v, w)
x', y' = broadcast_tensors([x, y])
ax, bx = chunk(x')
ay, by = chunk(y')
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
The graph fuser then puts g, g, and h into one FusionGroup and is unable
to move `x = f(v, w)` into the FusionGroup.

This PR lets the graph fuser move `x = f(v, w)` into the FusionGroup.
It does this by abstracting the broadcast_tensors + multiple chunk nodes
into one intermediate `prim::BroadcastingChunk[chunks, dim]` node.

A `BroadcastingChunk[chunks, dim](*inputs)` node is equivalent to:
- broadcasting all of *inputs
- chunk-ing each broadcasted input into `chunks` chunks along dim `dim`.

Abstracting the broadcasting chunk behavior away, it is now a lot easier
for the graph fuser to move (broadcast + chunk) past an operation. After
this PR, the above graph becomes:
```
x = f(v, w)
ax, bx, ay, by = BroadcastingChunk(x, y)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
Now, to move `x = f(v, w)` after the BroadcastingChunk, one just needs
to add f's operands to the BroadcastingChunk:
```
ay, by, av, bv, aw, bw = BroadcastingChunk(y, v, w)
ax = f(av, aw)
bx = f(bv, bw)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```

cc apaszke mruberry zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14055

Differential Revision: D13159259

Pulled By: zou3519

fbshipit-source-id: 134e9e645c950384d9be6a06a883a10e17a73d7d
2018-11-26 12:31:49 -08:00
8cc5d54b66 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b4d74bf58b5536a0de654dfe73d41b5e1126eec6
2018-11-26 12:21:09 -08:00
0d1f382e39 Removing Caffe2-specific conda infra
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11961

Differential Revision: D10045909

Pulled By: pjh5

fbshipit-source-id: e9c12124897ee586aeb8b6654b31e4b81687199a
2018-11-26 12:18:17 -08:00
2fa3c8327c fix tensor advanced indexing with assignment (#14311)
Summary:
Fix a mishandling of `foo[a] = b` when `a` was a tensor. We were assigning to a copy of `foo`, not a view of it.
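
A minimal eager-mode illustration of the semantics the fix restores in the compiled path:

```python
import torch

foo = torch.zeros(4)
a = torch.tensor([0, 2])
foo[a] = 1.0           # must write through to foo itself, not to a copy
print(foo)             # tensor([1., 0., 1., 0.])
```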
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14311

Differential Revision: D13196109

Pulled By: suo

fbshipit-source-id: c929401fda7c4a27622d3fe2b11278b08a7f17f1
2018-11-26 12:10:48 -08:00
80ba65e2f5 remove unnecessary zero_point argument from constructors (#14323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14323

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/24

As title says.

Reviewed By: dskhudia

Differential Revision: D13167073

fbshipit-source-id: 6d6c526fd6e29a14e97f71a0881f28ada8703107
2018-11-26 11:48:17 -08:00
0651b594d8 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 06e234f1a0217a268712832f21cb06b7109538a6
2018-11-26 11:27:01 -08:00
a10a993872 Fix -Wreturn-std-move (#14113)
Summary:
On clang-7 (internal) a warning, `-Wreturn-std-move`, is being emitted and raised to an error via `-Werror` for the code this PR fixes. The reason is that `autograd::make_variable` returns an `autograd::Variable`, so returning it from a function that returns `at::Tensor` prevents the compiler from eliding the return value (RVO). So let's explicitly convert the `autograd::Variable` to an `at::Tensor` before returning it.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14113

Differential Revision: D13105638

Pulled By: goldsborough

fbshipit-source-id: 6e1dc31c6512e105ab2a389d18807422ee29283c
2018-11-26 11:15:59 -08:00
90ed2f5aca minimize code compiled with avx2 and header includes from them (#14313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14313

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/22

This diff is an attempt to minimize code compiled with avx2.

Reviewed By: dskhudia

Differential Revision: D13166591

fbshipit-source-id: 2be241141f6d7478b86a422953791e237ff10268
2018-11-26 11:09:21 -08:00
fa73037233 Add proper from_blob overloads (#13982)
Summary:
An overload of `torch::from_blob` that allows passing strides was missing.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13982

Differential Revision: D13108089

Pulled By: goldsborough

fbshipit-source-id: b87594ec0bf55b35d106b4438bc18b2ce9fc8f71
2018-11-26 10:14:51 -08:00
b30c803662 allow concatenating "hybrid" (sparse/dense) tensors along their dense dimensions (#13761)
Summary:
Follow-up to #13577

The idea is to take each values tensor, concatenate it with zeros before and after itself (along the dimension corresponding to the one we're catting the tensors along), to get a tensor corresponding to the values for that tensor in the result. Then we concatenate all of those together to get the final values tensor. (Hopefully, this will be more clear from the example in the comments).

The indices are more straightforward: since we aren't concatenating along a sparse dimension, they don't change at all, so all we need to do are concatenate the indices from the different tensors together.
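
A small sketch of the newly allowed operation on hybrid tensors (one sparse dimension, one dense dimension):

```python
import torch

i = torch.tensor([[0, 1]])                    # indices for 2 nonzero slices
v1 = torch.randn(2, 3)                        # values: nnz=2, dense dim of 3
v2 = torch.randn(2, 3)
s1 = torch.sparse_coo_tensor(i, v1, size=(4, 3))
s2 = torch.sparse_coo_tensor(i, v2, size=(4, 3))

out = torch.cat([s1, s2], dim=1)              # dim 1 is a dense dimension
print(out.shape)                              # torch.Size([4, 6])
```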
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13761

Differential Revision: D13160343

Pulled By: umanwizard

fbshipit-source-id: 13d7adecd369e0eebdf5bce3d90a51029b66bd1d
2018-11-26 10:06:49 -08:00
a13fd7ec28 Allow torch.utils.cpp_extension.load to load shared libraries that aren't Python modules (#13941)
Summary:
For custom TorchScript operators, `torch.ops.load_library` must be used and passed the path to the shared library containing the custom ops. Our C++ extensions stuff generally is meant to build a Python module and import it. This PR changes `torch.utils.cpp_extension.load` to have an option to just return the shared library path instead of importing it as a Python module, so you can then pass it to `torch.ops.load_library`. This means folks can re-use `torch.utils.cpp_extension.load` and `torch.utils.cpp_extension.load_inline` to even write their custom ops inline. I think t-vi  and fmassa will appreciate this.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13941

Differential Revision: D13110592

Pulled By: goldsborough

fbshipit-source-id: 37756307dbf80a81d2ed550e67c8743dca01dc20
2018-11-26 09:39:21 -08:00
a60368982b Batch more matrix multiplies (#13456)
Summary:
This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows.

cc mruberry ngimel zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456

Differential Revision: D12920339

Pulled By: zou3519

fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443
2018-11-26 09:20:35 -08:00
1ef949036c Enable native wrappers for the remainder of nn functions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14290

Differential Revision: D13162562

Pulled By: gchanan

fbshipit-source-id: 615e1727988bfeeade48f9b38162333a2e298f7b
2018-11-26 07:58:59 -08:00
60e7d04961 Add Recency Weighted into SparseLookup (#14291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14291

Add RecencyWeighted into SparseLookup.

Reviewed By: Wakeupbuddy

Differential Revision: D13147738

fbshipit-source-id: de5dc3aaee8ce7d41c6d30d2ff47e9786a7fa4da
2018-11-24 02:43:31 -08:00
6e1e2032d3 quote NUMPY_INCLUDE_DIR (#14341)
Summary:
when NUMPY_INCLUDE_DIR contains a space character (e.g. "C:\Program Files (x86)\Microsoft Visual Studio\..."), cmake cannot receive the correct path name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14341

Differential Revision: D13188408

Pulled By: soumith

fbshipit-source-id: b62127d90e53da94fe6af5d3bdd2ea4fd6546210
2018-11-23 21:34:01 -08:00
33d091f432 shape analysis fix (#14325)
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325

Differential Revision: D13183296

Pulled By: suo

fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
2018-11-23 11:24:24 -08:00
8e3240d022 Some minor fixes for Windows build script (#14218)
Summary:
1. Fix execution failure when some of the paths are not defined
2. Users can now optionally override install dir by setting `CMAKE_INSTALL_PREFIX`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14218

Differential Revision: D13180350

Pulled By: soumith

fbshipit-source-id: 8c9680d1285dbf08b49380af1ebfa43ede99babc
2018-11-23 08:17:16 -08:00
7557a993ab Allow dataloader to accept a custom memory pinning function (#14171)
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.

The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171

Differential Revision: D13166669

Pulled By: soumith

fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
2018-11-23 08:12:43 -08:00
c36156eded Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253)
Summary:
This issue was noticed, and a fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.
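
A sketch of the requested usage, assuming the keyword lands as proposed:

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def segment(x):
    return F.dropout(x, p=0.5, training=True)

x = torch.randn(8, 8, requires_grad=True)
# preserve_rng_state=True stashes and restores the RNG state, so the
# recomputed forward during backward() sees the same dropout mask.
y = checkpoint(segment, x, preserve_rng_state=True)
y.sum().backward()
```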

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit:  It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag perhaps)?

**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
2018-11-23 08:09:43 -08:00
1e05f4be73 Updating submodules
Reviewed By: yns88

fbshipit-source-id: e92b0c24a56b588dcf30542692cb4bdc2d474825
2018-11-22 22:04:37 -08:00
d55b25a633 Remove individual "using c10:xxx" statements (#13168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13168

We now have a "using namespace c10" in the at and caffe2 namespaces, so we don't need the individual ones anymore

Reviewed By: ezyang

Differential Revision: D11669870

fbshipit-source-id: fc2bb1008e533906914188da4b6eb30e7db6acc1
2018-11-22 11:57:10 -08:00
f79fb58744 Make sure we bind input/output of Onnxifi op positionally (#14214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14214

This is to pick up the residual task of T36325466 to make sure that the input/output binding of the c2 Onnxifi op is positional.

Reviewed By: dzhulgakov

Differential Revision: D13134470

fbshipit-source-id: d1b916dade65c79133b86507cd54ea5166fa6810
2018-11-22 00:31:01 -08:00
7fc34a4122 Convert gumbel_softmax, lp pooling weak functions and modules (#14232)
Summary:
1. Support `Optional[BroadcastingList1[int]]` like type annotation to accept a int or a list[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
2018-11-21 23:44:24 -08:00
08b77d3844 Use ADL to find toString (#14021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14021

I'm planning to move at::Scalar to c10, and there's an at::toString(Scalar) defined.
Unfortunately, we call it by specifying at::toString() instead of relying on ADL.
This diff changes that to prepare the actual move.

Reviewed By: ezyang

Differential Revision: D13015239

fbshipit-source-id: f2a09f43a96bc5ef20ec2c4c88f7790fd5a04870
2018-11-21 23:08:52 -08:00
0e93a03a3a Fix include paths for intrusive_ptr (#13692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13692

This now lives in c10/util, not ATen/core anymore.

Reviewed By: ezyang

Differential Revision: D12937091

fbshipit-source-id: ea2d420a15e7941a38d0b4c75e20ca18437c73f8
2018-11-21 23:08:50 -08:00
4160c13cd2 Move intrusive_ptr to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13691

Reviewed By: ezyang

Differential Revision: D12937090

fbshipit-source-id: fe9d21d5f7ea4e78e7e38ac60db13814a9971ed9
2018-11-21 23:08:49 -08:00
e91c8e2f2d ignore generated caffe2 docs and virtualenvs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14309

Reviewed By: soumith

Differential Revision: D13166626

Pulled By: JoelMarcey

fbshipit-source-id: 4f11228d8b5da85cec222bf11282722a7319581b
2018-11-21 22:30:34 -08:00
3918e226fd Updating submodules
Reviewed By: yns88

fbshipit-source-id: 20976d595e68a08d746d8806fd0205d810656366
2018-11-21 22:02:07 -08:00
fb8c3d62fe removing quantization utility functions moved to fbgemm (#14301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14301

This diff removes quantization utility functions copied to fbgemm

Reviewed By: Maratyszcza

Differential Revision: D13159299

fbshipit-source-id: a7f3cd2af0aa241a8578d532a70a157da70d9289
2018-11-21 21:38:23 -08:00
8c4910b095 Cuda version comparison with CUDA_VERSION_STRING (#14302)
Summary:
CUDA headers include the CUDA version in the form major.minor. But when we do find_package(cuda), the CUDA_VERSION variable includes the patch number as well, which fails the following condition.

`
if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
`

**For example:**
I have cuda 10.0 installed. My nvcc output looks like this
`Cuda compilation tools, release 10.0, **V10.0.130**
`

If I compile my application with caffe2. It gives me following error:

```
CMake Error at /usr/share/cmake/Caffe2/public/cuda.cmake:59 (message):
  FindCUDA says CUDA version is (usually determined by nvcc), but the CUDA
  headers say the version is 10.0.  This often occurs when you set both
  CUDA_HOME and CUDA_NVCC_EXECUTABLE to non-standard locations, without also
  setting PATH to point to the correct nvcc.  Perhaps, try re-running this
  command again with PATH=/usr/local/cuda/bin:$PATH.  See above log messages
  for more diagnostics, and see
  https://github.com/pytorch/pytorch/issues/8092 for more details.
```

**In this case, it failed because**
cuda_version_from_header = 10.0
CUDA_VERSION = 10.0.130 (Came from NVCC)

`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
`

**Fix:**
We should compare the header version using the **major.minor format**, which is given by CUDA_VERSION_STRING
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14302

Differential Revision: D13166485

Pulled By: soumith

fbshipit-source-id: 1b74e756a76c4cc5aa09978f5850f763ed5469b6
2018-11-21 21:02:28 -08:00
992e2750fd Updating submodules
Reviewed By: yns88

fbshipit-source-id: ee60b4dddf688608ef80043b1dc336d120a045d0
2018-11-21 21:02:26 -08:00
341b48529e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 366c29d09bec53459e2a4890c7fe8d10f45ff5c3
2018-11-21 20:31:53 -08:00
b26f82b0ec Robust NCCL barrier improvement to cover all devices combinations (#14271)
Summary:
This covers the very narrow edge case where we run the same NCCL process group with multiple GPU combinations instead of only the last GPU combination. We always keep track of what GPUs have been used previously in the NCCL process group, and barrier() itself will synchronize on each GPU's NCCL stream.

Test covered as well. Tested on 8-GPU machine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14271

Differential Revision: D13164993

Pulled By: teng-li

fbshipit-source-id: 81e04352740ea50b5e943369e74cfcba40bb61c1
2018-11-21 18:23:55 -08:00
b149456645 alias analysis (#14018)
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info of the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.

2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".

3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.

- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018

Differential Revision: D13144906

Pulled By: suo

fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a
2018-11-21 17:48:46 -08:00
d55ba77a5d Remove extra include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14206

Reviewed By: dzhulgakov

Differential Revision: D13131318

fbshipit-source-id: 559b55b8d98cdf6b7d1d3e31237c5473edc5e462
2018-11-21 17:21:44 -08:00
85d3fccee7 Removed redundant allreduce options in DDP (#14208)
Summary:
This somehow is not cleaned up after the C++ migration. Unused and can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14208

Differential Revision: D13132492

Pulled By: teng-li

fbshipit-source-id: 0f05b6368174664ebb2560c037347c8eb45f7c38
2018-11-21 16:56:46 -08:00
d9cdcc9a3b Add list inequality operator (#14129)
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
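
A small scripted sketch of the new comparison (presumably lowered to the new list overload):

```python
import torch
from typing import List

@torch.jit.script
def lists_differ(a, b):
    # type: (List[int], List[int]) -> bool
    return a != b

print(lists_differ([1, 2], [1, 3]))  # True
```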
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f
2018-11-21 16:32:58 -08:00
34db39d87a Add onnxifi support to SparseLengthsWeightedSum (#14210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14210

We had left out `SparseLengthsWeightedSum`, as the benchmark was not testing it due to an fp16 filler issue. It was flushed out by unit tests, hence we add the support here.

Reviewed By: bddppq

Differential Revision: D13132320

fbshipit-source-id: b21c30c185c9e1fbf3980641bc3cdc39e85af2e1
2018-11-21 15:47:24 -08:00
60963c2ecb Add "axis" and "axis_w" arguments in FC to support customized axis to reduce dim. (#12971)
Summary:
Add "axis" and "axis_w" arguments in FC to support a customized axis along which to reduce dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971

Reviewed By: bddppq

Differential Revision: D12850675

Pulled By: yinghai

fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019
2018-11-21 15:44:50 -08:00
accbcca338 IDEEP fallback for ResizeNearest op (#14212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14212

TSIA

Reviewed By: yinghai

Differential Revision: D13134134

fbshipit-source-id: e3c5c9c8756d6e25b213f8dde9d809a44373d7a3
2018-11-21 13:44:07 -08:00
2cacb39a21 Fix ONNX_ATEN mode (#14239)
Summary:
Fix ONNX_ATEN mode by adding it to the validateBlock method.
Before this PR, validateBlock would throw an exception when using this mode.

I will add related test cases for ONNX_ATEN mode in a different PR once this is merged, since we don't have any currently.
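
A sketch of exporting in this mode through the usual entry point (the model and output path are illustrative):

```python
import torch
from torch.onnx import OperatorExportTypes

model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)
# With the fix, validateBlock accepts graphs produced in ONNX_ATEN mode,
# where ops are emitted as ATen ops.
torch.onnx.export(model, x, "model.onnx",
                  operator_export_type=OperatorExportTypes.ONNX_ATEN)
```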
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14239

Differential Revision: D13145443

Pulled By: zrphercule

fbshipit-source-id: 60e7942aa126acfe67bdb428ef231ac3066234b1
2018-11-21 13:15:23 -08:00
fe068d9032 Bump gloo (#14281)
Summary:
Includes more robust error handling and timeout support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14281

Differential Revision: D13158232

Pulled By: pietern

fbshipit-source-id: e80432799a020576d5abdcd9a21d66b629479caf
2018-11-21 11:27:42 -08:00
31ba34b73c fix comment on dnnlowp op arguments (#14265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14265

Fix comment

Reviewed By: hx89

Differential Revision: D13152106

fbshipit-source-id: fbe98906963cbd5cb20a583a737a792fbc38292e
2018-11-21 09:39:57 -08:00
6ce9907d51 native NN wrappers, including with buffers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14256

Differential Revision: D13148783

Pulled By: gchanan

fbshipit-source-id: 4b6179033cf1df26061b6731eaaa4e008692e592
2018-11-21 09:08:00 -08:00
91c0b7159a Remove header generated at configuration time (#14244)
Summary:
The build was picking up the empty stub header instead of the generated
one. Because of the large number of include paths we end up passing to
the compiler, it is brittle to have both an empty stub file and a
generated file and expect the compiler to pick up the right one.

With the recent change to compile everything from a single CMake run we
can now use native CMake facilities to propagate macros that indicate
backend support. The stanzas target_compile_definitions with the
INTERFACE flag ensure that these macros are set only for downstream
consumers of the c10d target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14244

Reviewed By: teng-li

Differential Revision: D13144293

Pulled By: pietern

fbshipit-source-id: f49324220db689c68c126b159f4f00a8b9bc1252
2018-11-21 08:45:08 -08:00
788d2e87bd Address jittering issues in python_print (#14064)
Summary:
export - print a method with python_print
import - import a method with import_method

We want to ensure:

    export(g) == export(import(export(g)))

That is, after exporting/importing once, the graph will stay exactly
the same. This is less strict than g == import(export(g)), which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.

This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
  that they always parse in the same way
* when creating loop-carried dependencies, sort them
  by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
  in loops that are dead both after the loop and inside the body of the
  loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
  are probably rare in user code, and we need a way to communicate
  that we do not care about a variable name when re-parsing the graph.
  Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
  and family.
* Allow re-treeing to work as long as the only thing in its way is a
  constant node. These do not have side effects but are sometimes
  inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used
 (or, if they are inlined, ensure they get assigned a CONSTANT.cX number
  in a consistent order). Clean up tuples (this is done in the compiler,
  but not in the tracer, leading to some tuple indexing jitter if not
  done).
* use strtod_l, not std::stod which can throw exceptions

Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
  cmake files. Threading it into setup.py allows us to turn on
  debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
  ~6 seconds to total build time but tests printing for every graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064

Differential Revision: D13094637

Pulled By: zdevito

fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d
2018-11-21 06:38:29 -08:00
af82396f7f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 27838fb2dad82c78906faf3cc2d124557c30e88f
2018-11-21 06:38:28 -08:00
166ee86b46 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 3c17e12a579245a84e9a56b1d8a1641232150675
2018-11-21 00:27:50 -08:00
7a654617eb Add tensor table in ModelDef and use it for jit script serialization and deserialization (#13861)
Summary:
As we discussed, the tensors in the torch script will be associated with the tensor data in the serialized file. So let's add a table of tensors (actually a repeated TensorProto field) in the ModelDef. TensorProto.name will be the id.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13861

Reviewed By: dzhulgakov

Differential Revision: D13036940

Pulled By: zrphercule

fbshipit-source-id: ecb91b062ac4bc26af2a8d6d12c91d5614efd559
2018-11-20 23:37:50 -08:00
17432a1051 c10d Automatically retry on EINTR (#14180)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/14170

Actually I probably shouldn't retry all `SYSCHECK` calls. I'll leave it to the reviewers to decide.
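
The retry pattern in plain Python, for illustration (the C++ SYSCHECK wrapper does the moral equivalent):

```python
import errno
import os

def read_retrying_eintr(fd, n):
    # Retry the syscall if it was interrupted by a signal (EINTR).
    while True:
        try:
            return os.read(fd, n)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
```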
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14180

Reviewed By: pietern

Differential Revision: D13144741

Pulled By: SsnL

fbshipit-source-id: d73288f76b18cae14b1b43dad4e5e8d010a96d95
2018-11-20 23:31:26 -08:00
bb301a431d Make NCCL backend support barrier op (#14142)
Summary:
This is a feature request from: https://github.com/pytorch/pytorch/issues/13573

As the title says, this PR makes NCCL backend support barrier op.

There are a couple scenarios that need to be addressed:
(1) When an NCCL op has already happened, we need to record what GPU device(s) the previous op ran on and queue the allreduce barrier op on the same GPU device(s)
(2) When there is no NCCL op yet, we will try to use a single GPU and, as a best effort, assign each process its own GPU.

As for the async work, during wait we would like to not just wait for the NCCL kernel to complete, but also block the thread until the current stream and the NCCL stream return.

`test_distributed` should cover the test. I also manually tested both scenarios.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14142

Differential Revision: D13113391

Pulled By: teng-li

fbshipit-source-id: 96c33d4d129e2977e6892d85d0fc449424c35499
2018-11-20 21:12:22 -08:00
1acaafbe70 Fix memory leakage in onnxifi transformer (#14245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14245

tsia

Reviewed By: bddppq, rdzhabarov

Differential Revision: D13144783

fbshipit-source-id: 5e07bb7ab883ba1af68547a26272cd320967b9e3
2018-11-20 18:03:05 -08:00
8f20d40bb7 Allow undefined tensors as constants (#14120)
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library when an `Optional[Tensor]` is statically determined to be `None`:

```python
@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120

Differential Revision: D13124625

Pulled By: driazati

fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
2018-11-20 16:54:27 -08:00
d6bfc53b9e Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR did three things:

1. It exports the BatchNorm functional and module, and rewrites some of the components to stay aligned with the currently supported JIT features
2. In the process of exporting, it adds the necessary compiler support for in-place op augmented assignment
3. It changes the test_jit behavior in add_module_test to utilize a single rng state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
2018-11-20 14:15:06 -08:00
1f871f126f Have PYTORCH_FUSION_DEBUG print C kernel source (#14213)
Summary:
- Move up handling the environment variable from CPU only to all
- Introduce two levels to be enabled with PYTORCH_FUSION_DEBUG=n:
  1: print C source
  2: print CPU assembly, too (previous effect of PYTORCH_FUSION_DEBUG)
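
A quick sketch of setting the variable from inside a script (setting it in the shell before launching works just as well); it must be set before the fuser compiles anything:

```python
import os

os.environ["PYTORCH_FUSION_DEBUG"] = "1"   # "1": C source; "2": CPU assembly too
```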

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14213

Differential Revision: D13135393

Pulled By: soumith

fbshipit-source-id: befa4ebea3b3c97e471393a9f6402b93a6b24031
2018-11-20 12:45:07 -08:00
1224ef9ea1 Delete backwards compatibility StorageImpl.h and TensorImpl.h (#14230)
Summary:
Since they directly include the real ones in core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14230

Differential Revision: D13140323

Pulled By: tugrulates

fbshipit-source-id: d7e3b94e891b2d7fa273d01c0b7edfebdbd7e368
2018-11-20 12:29:24 -08:00
9a281451ed remove unused parameters from caffe2_dnnlowp_utils.cc (#14164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14164

See title

Reviewed By: csummersea

Differential Revision: D13115470

fbshipit-source-id: d754f558cd06e5f4c1cd00315e912cdb7b50731a
2018-11-20 00:56:06 -08:00
3c2462cf24 use pragma once (#14163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14163

Some of the names we were using to guard the header files were too short (e.g. DYNAMIC_HISTOGRAM_H).

Reviewed By: csummersea

Differential Revision: D13115451

fbshipit-source-id: cef8c84c62922616ceea17effff7bdf8d67302a2
2018-11-20 00:56:04 -08:00
4224ce10a8 format python files (#14161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14161

Formatting using Nuclide

Reviewed By: hx89

Differential Revision: D13115348

fbshipit-source-id: 7432ce6072a1822d7287b4ebcfcb6309282e15ac
2018-11-20 00:56:02 -08:00
3c0ce51484 clang-format (#14160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14160

clang-format of C++ files

Reviewed By: hx89

Differential Revision: D13115201

fbshipit-source-id: d2ad65f66209e00578ef90f87f41272de2d24aa9
2018-11-20 00:56:00 -08:00
acd7811e33 Add sigmoid op based on MKL-DNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13097

Differential Revision: D13105366

Pulled By: yinghai

fbshipit-source-id: d156e8fd519baeecf61c25dcd8fa2c2fa7351ef4
2018-11-19 22:56:35 -08:00
c96b72d61f OSS build fix (#14192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14192

We can only use C10_* in OSS. The build is only broken if built with USE_FBGEMM=ON

Reviewed By: jianyuh

Differential Revision: D13121781

fbshipit-source-id: f0ee9a75997766e63e1da8a53de7ddb98296a171
2018-11-19 22:47:17 -08:00
6dacc20073 Make EncodeMethod in jit script serialization return a string (#14167)
Summary:
Nit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14167

Reviewed By: ezyang

Differential Revision: D13116584

Pulled By: dzhulgakov

fbshipit-source-id: c0e7e71a81004031564bd2fc59f393041e1283d5
2018-11-19 22:15:19 -08:00
a036f9a65f Create README.md of caffe2/quantization/server
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14217

Reviewed By: csummersea

Differential Revision: D13135086

Pulled By: jspark1105

fbshipit-source-id: bddf4f1c2dc5ec8ea6ebe9e265956f367e082d52
2018-11-19 21:59:34 -08:00
6dc28e666c CircleCI: fix NCCL install (#14172)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR fixes the issue.

This replaces https://github.com/pytorch/pytorch/pull/14124.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14172

Differential Revision: D13135087

Pulled By: yf225

fbshipit-source-id: 42fff3926734778713d483d74ba0a89e5502dd9e
2018-11-19 21:30:32 -08:00
03a02b6fd5 Fix a bug in test case of onnx::If
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14209

Differential Revision: D13132607

Pulled By: zrphercule

fbshipit-source-id: b7f7ccc6a6cbdeb57a7f88a1971d15dd81e6fc81
2018-11-19 18:46:21 -08:00
b807970aea Tensor type checking and informative error messages for torch.distributed (#14204)
Summary:
This will address https://github.com/pytorch/pytorch/issues/13574

This error message should be more informative to the user for all the non-multi-GPU ops, since we always bind Python to the multi-GPU ops.

test_distributed should cover all cases. Also tested both RuntimeErrors.

```
>>> a = torch.ByteTensor([])
>>> b = [a, a]
>>> dist.all_reduce(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 809, in all_reduce
    _check_single_tensor(tensor, "tensor")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 207, in _check_single_tensor
    "to be a torch.Tensor type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor to be a torch.Tensor type

>>> b = ["b"]
>>> dist.all_gather(b, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 1006, in all_gather
    _check_tensor_list(tensor_list, "tensor_list")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 225, in _check_tensor_list
    "to be a List[torch.Tensor] type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor_list to be a List[torch.Tensor] type
```
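For reference, a minimal sketch of the kind of check behind these messages (the real helpers live in torch/distributed/distributed_c10d.py, as the tracebacks show):
```
import torch

def _check_single_tensor(param, param_name):
    # Reject anything that is not a single torch.Tensor.
    if not isinstance(param, torch.Tensor):
        raise RuntimeError("Invalid function argument. Expecting parameter: {} "
                           "to be a torch.Tensor type".format(param_name))

def _check_tensor_list(param, param_name):
    # Reject anything that is not a list made up solely of torch.Tensors.
    if not isinstance(param, list) or \
            not all(isinstance(p, torch.Tensor) for p in param):
        raise RuntimeError("Invalid function argument. Expecting parameter: {} "
                           "to be a List[torch.Tensor] type".format(param_name))
```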
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14204

Differential Revision: D13131526

Pulled By: teng-li

fbshipit-source-id: bca3d881e41044a013a6b90fa187e722b9dd45f2
2018-11-19 18:30:54 -08:00
7d1db89ef9 Move stream functions from CUDAContext to CUDAStream (#14110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14110

I'm planning to move CUDAStream to c10/cuda, without also moving
CUDAContext, and so it's most convenient if these definitions
are in the actual header file in question.

Reviewed By: smessmer

Differential Revision: D13104693

fbshipit-source-id: 23ce492003091adadaa5ca6a17124213005046c2
2018-11-19 17:05:48 -08:00
50b914aeeb Move CUDAStreamInternals inside detail namespace. (#14109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14109

Previously it was at the top level, because the author was under
the impression that you could only refer to top-level C++ names
from C, but this is not true; you just need to make a stub struct
conditioned on __cplusplus.

Reviewed By: smessmer

Differential Revision: D13104694

fbshipit-source-id: ecb7ae6dcfa4ab4e062aad7a886937dca15fd1b2
2018-11-19 17:05:46 -08:00
e58bbbac18 Delete dependencies from CUDAStream; remove synchronize_with (#13920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920

I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now).  To do this, I need to eliminate those dependencies.

There's a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in general
purpose c10/cuda/CUDAFunctions.h

Reviewed By: smessmer

Differential Revision: D13047468

fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e
2018-11-19 17:05:41 -08:00
a20c7ce848 Fix race in AtomicFetchAdd. (#13479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13479

Increases the lock scope so that it covers the Output() calls.

These calls potentially allocate the underlying blob/tensor
objects and multiple invocations race each other over the
same output blobs/tensors.

Reviewed By: bwasti

Differential Revision: D12891629

fbshipit-source-id: a6015cfdb08e352521a1f062eb9d94a971cfbdb0
2018-11-19 16:11:58 -08:00
1a29950478 Remove API macros from intrusive_ptr (#14137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14137

This is a templated header-only class and shouldn't need export/import macros.

Reviewed By: ezyang

Differential Revision: D13111712

fbshipit-source-id: c8c958e75b090d011d25156af22f37f9ca605196
2018-11-19 15:39:20 -08:00
1c2ed4eb23 Tensor construction: combine Resize+mutable_data - 1/4 (#13942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13942

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13054770

fbshipit-source-id: a9e86e5dfcb4f7cebf5243e1d359fad064561bed
2018-11-19 15:33:50 -08:00
8aa5174106 Tensor construction: combine Resize+mutable_data - 3/4 (#13944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054836

fbshipit-source-id: 5de07a156687f1ee607d0450410881d9176a87a7
2018-11-19 15:28:13 -08:00
f34c848f52 Store the optimize flag in module (#14166)
Summary:
When saving/loading a script module, we now store the optimize flag in the module instead of encoding it in each method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14166

Reviewed By: ezyang

Differential Revision: D13117577

Pulled By: dzhulgakov

fbshipit-source-id: dc322948bda0ac5809d8ef9a345497ebb8f33a61
2018-11-19 14:34:05 -08:00
7fd1ea6ab7 Cleanup caffe2 hipify exclude patterns (#14198)
Summary:
depthwise_3x3_conv_op.cu does not exist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14198

Differential Revision: D13127479

Pulled By: bddppq

fbshipit-source-id: ec6bd434055a49ea405c4b399bde8c074114f955
2018-11-19 14:27:56 -08:00
b6edd7bbb4 Support 'python_module' of 'nn' in native functions. (#14126)
Summary:
Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126

Reviewed By: ezyang

Differential Revision: D13109975

Pulled By: gchanan

fbshipit-source-id: 0b29dc8cf222d25db14da7532d8dc096a988a0ec
2018-11-19 14:13:25 -08:00
1e73ab25f5 Use onnx proto_utils to support using protobuf-lite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14150

Differential Revision: D13115586

Pulled By: bddppq

fbshipit-source-id: d6b6935a8deac60f6f58d62a71f6840182a72a51
2018-11-19 13:32:46 -08:00
6b4852213d Use fbgemm revision file added by shipit (#14105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14105

Pull Request resolved: https://github.com/facebook/fbshipit/pull/62

Use fbgemm revision file created by ShipIt for updating fbgemm revision for pytorch. We don't have to manually update submodule now.

Reviewed By: yns88

Differential Revision: D13072074

fbshipit-source-id: bef9eabad50f7140179c370a60bd9ca73067b9b5
2018-11-19 12:12:21 -08:00
b6290531aa Setup sccache for PyTorch ROCm CI (#14153)
Summary:
Discovered a huge build-time difference between the caffe2 ROCm build and the pytorch ROCm build (6 min vs. 30 min); it turns out the sccache setup present in the caffe2 docker images is missing from the pytorch build script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14153

Differential Revision: D13115097

Pulled By: bddppq

fbshipit-source-id: 88414f164b980f0e667c8e138479b4a75ab7692e
2018-11-19 11:31:55 -08:00
e387d945c2 allow empty index for scatter_* methods (#14077)
Summary:
Fixes #2027
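A minimal usage sketch of the new behavior (shapes are illustrative): an empty index makes `scatter_` a no-op instead of an error.
```
import torch

x = torch.zeros(3, 5)
index = torch.empty(3, 0, dtype=torch.long)  # an empty index tensor
src = torch.empty(3, 0)
x.scatter_(1, index, src)   # selects no elements, so x is left unchanged
print(x.sum().item())       # 0.0
```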
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077

Differential Revision: D13095788

Pulled By: ailzhang

fbshipit-source-id: ad2c8bbf83d36e07940782b9206fbdcde8905fd3
2018-11-19 09:50:21 -08:00
751b5ea941 use at::Device throughout JIT (#14181)
Summary:
zdevito soumith

Sorry about the previous PR; I had some git issues. This is exactly the same code as the previous PR, updated w.r.t. pytorch/master.

fixes #13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181

Differential Revision: D13117688

Pulled By: soumith

fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f
2018-11-19 09:21:57 -08:00
fc61f1a1d1 Support named return arguments in native_functions. (#14100)
Summary:
Note there was a hacky way of doing this before by specifying "return:" lists manually; this makes the
return names part of the function declaration itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14100

Differential Revision: D13101810

Pulled By: gchanan

fbshipit-source-id: 1c80574cd4e8263764fc65126427b122fe36df35
2018-11-19 08:27:20 -08:00
ce85150cb4 Split out CUDAMultiStreamGuard from CUDAGuard (#13912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13912

The implementation and API of CUDAMultiStreamGuard is less mature,
and it cannot be implemented generically (yet) in c10_cuda.  This
might be a reasonable thing to do eventually, but not for now.

Reviewed By: smessmer

Differential Revision: D13046500

fbshipit-source-id: 4ea39ca1344f1ad5ae7c82c98617aa348c327848
2018-11-19 08:20:11 -08:00
48099c23b4 Move AT_CUDA_CHECK to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13910

Reviewed By: smessmer

Differential Revision: D13046201

fbshipit-source-id: 8d360a0e4d6c2edf070d130e600c6b04f0ee0058
2018-11-19 08:20:10 -08:00
928687bb24 Add c10 cuda library. (#13900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13900

Add c10 cuda library.

Right now, this is not used by anything, and only tests if the CUDA
headers are available (and not, e.g., that linking works.)

Extra changes:
- cmake/public/cuda.cmake now is correctly include guarded, so you
  can include it multiple times without trouble.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: smessmer

Differential Revision: D13025313

fbshipit-source-id: fda85b4c35783ffb48ddd6bbb98dbd9154119d86
2018-11-19 08:20:07 -08:00
2681852438 Switch Int8Add operator to QNNPACK (#14089)
Summary:
- Improved single-threaded performance due to optimized low-level micro-kernels
- Improved parallelization (previously was parallelized across images in a batch and pixels only, now within channels as well)
- Slightly different results due to a different implementation of fixed-point arithmetic (no accuracy loss expected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14089

Differential Revision: D13110135

Pulled By: Maratyszcza

fbshipit-source-id: 1f149394af5c16940f79a3fd36e183bba1be2497
2018-11-18 23:57:57 -08:00
92dbd0219f No more -werror for c10d (#14155)
Summary:
As the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14155

Differential Revision: D13115769

Pulled By: teng-li

fbshipit-source-id: 278deba090364544d92fa603621604ce37fa974e
2018-11-18 13:53:41 -08:00
55b25365e9 Add ultra low precision options (#14133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14133

Experiment with ultra low precisions on the Resnext-101 URU trunk model

Reviewed By: jspark1105

Differential Revision: D10108518

fbshipit-source-id: f04d74fbe1c9e75efafcd9845719bdb2efbbfe9c
2018-11-18 12:51:34 -08:00
ef3d7963d8 Adds symbolic diff for THNN Conv2d and aten native BatchNorm (#13888)
Summary:
Adds symbolic diff and tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13888

Differential Revision: D13115548

Pulled By: soumith

fbshipit-source-id: ba75b01a95a5715a7761724dda018168b6188917
2018-11-18 09:22:31 -08:00
07a8a730af Print warning when ROCm memory leaking is detected in pytorch tests (#14151)
Summary:
We keep seeing random failures in CI because of ROCm memory leaks, e.g.:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3102//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3080//console

To make the CI more stable, turn it into a warning instead of a failure.

iotamudelta please help investigate the memory leak
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14151

Differential Revision: D13115096

Pulled By: bddppq

fbshipit-source-id: a13b68274ecba363d9d8436aa6a62ac40a77d78c
2018-11-18 00:11:44 -08:00
a5891e6124 Remove debugging code in test_cholesky_batched (#14156)
Summary:
The debug statements didn't turn up in my tests because I use pytest, which doesn't print output when the tests pass.

Differential Revision: D13115227

Pulled By: soumith

fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b
2018-11-17 22:28:21 -08:00
1bafa6236f Back out "[reland][codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4" (#14154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14154

Original commit changeset: e89c2e692178

Reviewed By: amateurcoffee

Differential Revision: D13115023

fbshipit-source-id: 8f9fb55842ae6c8139d5cd88ec6d0abb0c5cc5e7
2018-11-17 19:51:03 -08:00
12bb4742ad CostInference for 1D conv (#14009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14009

As title

Reviewed By: yinghai

Differential Revision: D13078718

fbshipit-source-id: 081e7b13ad6741c635ef413915b555f10f93bd33
2018-11-17 17:28:52 -08:00
a30ade1139 Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with dedicated batched `tril` and `triu` ops; their absence is also impeding the autograd operations.

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
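A short usage sketch of the batched decomposition, assuming the `torch.cholesky` entry point (shapes are illustrative):
```
import torch

a = torch.randn(4, 3, 3)
spd = a @ a.transpose(-1, -2) + 1e-3 * torch.eye(3)  # a batch of SPD matrices
l = torch.cholesky(spd)               # (4, 3, 3) lower-triangular factors
err = (l @ l.transpose(-1, -2) - spd).abs().max()
print(err.item())                     # tiny reconstruction error
```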
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
2018-11-17 10:49:15 -08:00
390bf1e779 remove unnecessary file from avx2 list (#14012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14012

conv_dnnlowp_op.cc doesn't need avx2 anymore.

Reviewed By: dskhudia

Differential Revision: D13079665

fbshipit-source-id: dbfe8d2213de4969b6334d54de81d51149268cbd
2018-11-17 10:29:25 -08:00
505dedf6ad Change from using enum to int to store data_type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14140

Differential Revision: D13112937

Pulled By: bddppq

fbshipit-source-id: 124d9546bfbd1f9c207a21e40eb3646f7739bd58
2018-11-17 09:24:03 -08:00
4f0434d5ab Revert "CircleCI: fix NCCL install (#14124)" (#14146)
Summary:
This reverts commit a1fa9d8cf9b2b0e7373ec420c2487d4dfd0e587c.

[pytorch_linux_trusty_py2_7_9_build](https://circleci.com/gh/pytorch/pytorch/270206?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console):
```
Nov 17 07:37:27 + sudo apt-get -qq update
Nov 17 07:37:30 W: Ignoring Provides line with DepCompareOp for package gdb-minimal
Nov 17 07:37:30 W: You may want to run apt-get update to correct these problems
Nov 17 07:37:30 + sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
Nov 17 07:37:30 E: Command line option --allow-downgrades is not understood
Nov 17 07:37:30 + cleanup
Nov 17 07:37:30 + retcode=100
Nov 17 07:37:30 + set +x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14146

Differential Revision: D13113912

Pulled By: bddppq

fbshipit-source-id: cd9d371cf72159f03d12a8b56ed5bd2060ebbe59
2018-11-17 00:35:31 -08:00
fade36668a Revert D10428917: [Caffe2] Add cost into profile observer
Differential Revision:
D10428917

Original commit changeset: 7c100e551bdd

fbshipit-source-id: 5164d9ba61cc103eccfdeb91a5cc140cea31a819
2018-11-16 23:30:07 -08:00
a43037fa11 Revert D10439558: Add cost for non-linear ops
Differential Revision:
D10439558

Original commit changeset: 9aeb05bac8b5

fbshipit-source-id: f00977b4f95bdd500d254eb44fb5b0c816506ee4
2018-11-16 23:30:05 -08:00
afc91e4900 Update FXdiv submodule (#14128)
Summary:
Use the most recent version that disables inline assembly.
I suspect inline assembly causes miscompilation on some versions of gcc7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14128

Reviewed By: bddppq

Differential Revision: D13112370

Pulled By: Maratyszcza

fbshipit-source-id: 36cc95dc51390a293b72c18ae982c3a515a11981
2018-11-16 22:45:26 -08:00
6d9a7d0e60 Rename neon2sse.h to NEON_2_SSE.h to match upstream repo
Summary:
- NEON2SSE is a header that implements NEON intrinsics on top of SSE intrinsics
- The upstream repo provides the NEON_2_SSE.h header, but internally it was imported as neon2sse.h
- This patch fixes incompatibilities between the internal and upstream versions

Reviewed By: hlu1

Differential Revision: D13096755

fbshipit-source-id: 65e1df9a2a5e74bd52c9aee9be27469ba938cd8c
2018-11-16 21:41:53 -08:00
351478439f Disable QNNPACK for multi-architecture iOS builds (#14125)
Summary:
QNNPACK contains assembly files, and CMake tries to build them for the wrong architectures in multi-arch builds. This patch has two effects:
- Disables QNNPACK in multi-arch iOS builds
- Specifies a single `IOS_ARCH=arm64` by default (covers most iPhones/iPads on the market)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14125

Differential Revision: D13112366

Pulled By: Maratyszcza

fbshipit-source-id: b369083045b440e41d506667a92e41139c11a971
2018-11-16 21:18:01 -08:00
d56b2258f4 Register caffe2 layer norm with c10 dispatcher (#13693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13693

We can't directly call the caffe2::Operator class from c10 yet because that class isn't deprotobuffed yet.
Instead, we factor out the kernel into a reusable static method and call it from the caffe2::Operator and
also register it with c10.

Reviewed By: ezyang

Differential Revision: D12912242

fbshipit-source-id: c57502f14cea7a8be281f9787b175bb6e402d00c
2018-11-16 20:17:47 -08:00
c905a81c92 Add c10/core/ to cmake build (#14111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14111

It was already in TARGETs, but we forgot it in cmake.

Reviewed By: ezyang

Differential Revision: D13105166

fbshipit-source-id: f09549e98ebca751339b5ada1150e00cc4cd9540
2018-11-16 20:17:45 -08:00
bb404e7a32 Update atol scale in dnnlowp test (#14135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14135

Update the atol scale of the dnnlowp test. The flaky test error could not be reproduced locally even after setting the same seed value, but according to the comments in check_quantized_results_close(), atol_scale should be 1/1.9 = 0.526315789473684, which is larger than the current value of 0.51. So increase the atol_scale to 0.53.

Reviewed By: jspark1105

Differential Revision: D13108415

fbshipit-source-id: 1e8840659fdf0092f51b439cf499858795f9706a
2018-11-16 19:18:55 -08:00
c784f847de fix sparse_adagrad param_size overflow error (#14049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14049

param_size should be passed as int64_t

Reviewed By: hyuen

Differential Revision: D13090511

fbshipit-source-id: 7892d315d7c82c7d7ca103fb36d30cdf1fe24785
2018-11-16 18:53:32 -08:00
cbc94894fb Add cost for non-linear ops (#13327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13327

Add a cost inference function to non-linear ops. Since the actual flop count of a non-linear operator depends on the implementation, we use the number of non-linear operations as a proxy for the analytical flops of non-linear operators.
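A tiny illustration of the proxy (a hypothetical helper, not the actual cost inference API):
```
def nonlinear_flops(input_shape):
    # One non-linear operation per element processed.
    n = 1
    for d in input_shape:
        n *= d
    return n

print(nonlinear_flops((17, 1, 2048)))  # 34816
```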

Reviewed By: jspark1105

Differential Revision: D10439558

fbshipit-source-id: 9aeb05bac8b5c7ae5d351ebf365e0a81cf4fc227
2018-11-16 18:53:30 -08:00
86dc3ab252 Add cost into profile observer (#12793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12793

Add analytical cost into profile observer. It includes the op level cost information for each op run and net level aggregated cost information for each op type.

It outputs the following information:
1. analytical flops
2. analytical bytes_read
3. analytical bytes_written

Example output at op level:
```I1017 14:58:14.245978 3686541 profile_observer_gpu.cc:26] --------- Starting operator FC op#24 ---------
I1017 14:58:14.246049 3686541 profile_observer_gpu.cc:33] Input 0: Tensor model1/embedded_encoder_inputs of type float. Dims: (17,1,256,):
I1017 14:58:14.246109 3686541 profile_observer_gpu.cc:33] Input 1: Tensor model1/encoder/layer0/fw/milstm/i2h_w of type float. Dims: (2048,256,):
I1017 14:58:14.246176 3686541 profile_observer_gpu.cc:33] Input 2: Tensor model1/encoder/layer0/fw/milstm/i2h_b of type float. Dims: (2048,):
I1017 14:58:14.246217 3686541 profile_observer_gpu.cc:44] Argument 0: name: "use_cudnn" i: 1
I1017 14:58:14.246271 3686541 profile_observer_gpu.cc:44] Argument 1: name: "cudnn_exhaustive_search" i: 0
I1017 14:58:14.246338 3686541 profile_observer_gpu.cc:44] Argument 2: name: "order" s: "NHWC"
I1017 14:58:14.246372 3686541 profile_observer_gpu.cc:44] Argument 3: name: "axis" i: 2
I1017 14:58:14.246418 3686541 profile_observer_gpu.cc:44] Argument 4: name: "quantization_scheme" i: 1
I1017 14:58:14.246470 3686541 profile_observer_gpu.cc:53] Output 0: Tensor model1/encoder/layer0/fw/milstm/i2h of type float. Dims: (17,1,2048,):
I1017 14:58:14.246596 3686541 profile_observer_gpu.cc:61] Cost (flops, bytes_read, bytes_written):
I1017 14:58:14.246649 3686541 profile_observer_gpu.cc:62]        17860608 2122752 139264
I1017 14:58:14.246677 3686541 profile_observer_gpu.cc:64] --------- Finished operator FC in 0.764221 ms ---------
```
Example output at net level:
```
I1017 11:13:44.675585 3146691 profile_observer_gpu.cc:165] ================ Detailed stats for net model0/encoder/layer0/bw/milstm ================
I1017 11:13:44.675662 3146691 profile_observer_gpu.cc:167] Cost (flops, bytes_read, bytes_written) per operator type:
I1017 11:13:44.675706 3146691 profile_observer_gpu.cc:169]        20992000 42045440 81920 FC
I1017 11:13:44.675745 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Mul
I1017 11:13:44.675824 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Sum
I1017 11:13:44.675878 3146691 profile_observer_gpu.cc:169]               0 0 0 ElementwiseLinear
I1017 11:13:44.675909 3146691 profile_observer_gpu.cc:169]               0 0 0 LSTMUnit
I1017 11:13:44.675958 3146691 profile_observer_gpu.cc:169]               0 0 0 rnn_internal_apply_link
```

Reviewed By: mdschatz

Differential Revision: D10428917

fbshipit-source-id: 7c100e551bdd3ac8d7c09be12c72d70a2d67cae1
2018-11-16 18:53:28 -08:00
a1fa9d8cf9 CircleCI: fix NCCL install (#14124)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR is trying to figure out why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14124

Reviewed By: teng-li

Differential Revision: D13112483

Pulled By: yf225

fbshipit-source-id: 5f65997586648805cf52217a261389625b5535e1
2018-11-16 18:53:26 -08:00
eeb3e67eeb Fixed MPI build with higher version of GCC (#14122)
Summary:
This appeared once I enabled -Werror in the c10d build. Good that we caught and fixed it.

Should fix https://github.com/pytorch/pytorch/issues/14078 and https://github.com/pytorch/pytorch/issues/13962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14122

Differential Revision: D13110678

Pulled By: teng-li

fbshipit-source-id: f4c19e16976d65debbd33ed59e17ddbaa19f765a
2018-11-16 18:53:24 -08:00
778e23606b multiprocessing.spawn python version check (#14039)
Summary:
This will be super helpful to the user
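A hedged sketch of the kind of guard this adds (the exact message and placement are assumptions; the signature follows torch.multiprocessing.spawn):
```
import sys

def spawn(fn, args=(), nprocs=1):
    # Fail early with a clear message instead of a confusing Python 2 error.
    if sys.version_info[0] < 3:
        raise RuntimeError("torch.multiprocessing.spawn requires Python 3")
    # ... launch nprocs worker processes running fn ...
```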
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14039

Differential Revision: D13089200

Pulled By: teng-li

fbshipit-source-id: 29e7507bd8fe5a0c58a85c52f976bfca282b4c1b
2018-11-16 18:53:23 -08:00
ce6192a21f Don't python bind _thnn_ functions. (#14101)
Summary:
This is needed for moving nn functions to native functions, but since some functions are already named
this way, I'm going to stop binding pre-emptively so we can check if there are any current dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14101

Differential Revision: D13102219

Pulled By: gchanan

fbshipit-source-id: 6bbcca33a03ab1bf648f1b73cadfe84339fa3050
2018-11-16 17:18:08 -08:00
55e1b1ec3e Fix docs/cpp/requirements.txt (#14121)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14121

Differential Revision: D13108063

Pulled By: goldsborough

fbshipit-source-id: 35cf65ba776e8826c5cab7ae6d3a2d446f87e7cc
2018-11-16 14:56:30 -08:00
8610ff1072 Allow cooperative structured objects to be passed modules in tracing (#13961)
Summary:
Before this patch, the JIT does not allow a Module's forward to take
structured objects.
This patch allows cooperative objects to do so (a minimal sketch follows the list).
Cooperative means:
- It has a method self._jit_unwrap() that returns (a list/tuple of)
  tensors. These are then used in _iter_tensors.
- It has a method self._jit_wrap(flattened_input) that takes the
  flattened_input (a list/tuple, potentially containing more than it needs)
  and returns itself (updated) and the unconsumed flattened_inputs.
  This is then used in the _unflatten mechanism.
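A minimal sketch of such a cooperative object (the class and attribute names are illustrative, not the actual maskrcnn-benchmark types):
```
class BoxListLike(object):
    # Hypothetical cooperative container following the protocol above.
    def __init__(self, boxes):
        self.boxes = boxes  # a Tensor payload

    def _jit_unwrap(self):
        # Hand the wrapped tensors to _iter_tensors.
        return [self.boxes]

    def _jit_wrap(self, flattened_input):
        # Consume what we need from the flattened inputs and return
        # (updated self, unconsumed remainder) for the _unflatten mechanism.
        self.boxes = flattened_input[0]
        return self, flattened_input[1:]
```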

This is all it takes to permit maskrcnn-benchmark to use
its structured BoxList/ImageList types and trace it without calling
the .forward directly.
I'll push a model working with this patch in
https://github.com/facebookresearch/maskrcnn-benchmark/pull/138

I must admit I haven't fully checked whether there are ONNX changes needed before it, too, can profit, but I would be hopeful that anything currently usable remains so.

fmassa zdevito

So the main downside that I'm aware of is that people will later want to use more elaborate mechanisms, but I think this could be done by just amending what wrap/unwrap are returning / consuming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13961

Differential Revision: D13103927

Pulled By: soumith

fbshipit-source-id: 2cbc724cc4b53197388b662f75d9e601a495c087
2018-11-16 14:02:13 -08:00
fb6535ec70 Add SharedDataset (#13800)
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.

jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800

Differential Revision: D13075610

Pulled By: goldsborough

fbshipit-source-id: 4ffdfd7959d49b042c0e254110085f62a0bfeb6c
2018-11-16 13:07:10 -08:00
96e5d23bad remove dynamic initialization warning (#13913) (#13967)
Summary:
removed assignment in default constructor.
removed static shared memory and used dynamic shared memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13967

Differential Revision: D13089996

Pulled By: soumith

fbshipit-source-id: 2a218b909c849bed39636b45a02d10ebc279a0b0
2018-11-16 13:04:22 -08:00
5b1b8682a3 Missing .decode() after check_output in cpp_extensions (#13935)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13935

Differential Revision: D13090852

Pulled By: goldsborough

fbshipit-source-id: 47da269d074fd1e7220e90580692d6ee489ec78b
2018-11-16 12:16:29 -08:00
8e91da4cb3 Windows shared build (#13550)
Summary:
Hi guys,

I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.

What is disappointing is that the c10/experimental ops don't build with this Visual Studio generator; I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.

After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550

Reviewed By: ezyang

Differential Revision: D13042597

Pulled By: orionr

fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
2018-11-16 12:16:28 -08:00
2c21de2007 Make JOIN_TIMEOUT longer for ppc64le (#14107)
Summary:
This should resolve the ppc64le issue of getting FAIL: test_proper_exit (__main__.TestDataLoader), which only happens when the CI build machine is very busy and the test fails with a timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14107

Differential Revision: D13103859

Pulled By: soumith

fbshipit-source-id: 268be80b59840853c5025f3211af272f68608fe5
2018-11-16 12:12:58 -08:00
c192788188 Log error from the net's run (#14035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14035

Log the error message in case the net's run fails

Reviewed By: andrewwdye

Differential Revision: D13085431

fbshipit-source-id: d79f76782410cd3a5bd2d8d7f5fb1e535d821051
2018-11-16 12:06:50 -08:00
0d7a986da1 Change hip filename extension to .hip (#14036)
Summary:
xw285cornell

- To give hip files a unique filename extension, we change hip files from _hip.cc to .hip (it's the only blessed option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Use the host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which is unnecessary.
- Change the hipify script to not replace "gpu" with "hip" in the filenames of the generated hipified files. Previously we did this because hcc has a bug when linking files that share a filename; we now use the host linker, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036

Reviewed By: xw285cornell

Differential Revision: D13091813

Pulled By: bddppq

fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
2018-11-16 11:55:59 -08:00
30018fcd0b Enable Caffe2 ROCm test on centos (#14090)
Summary:
xw285cornell petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14090

Differential Revision: D13096874

Pulled By: bddppq

fbshipit-source-id: b471c6e4db95cd51567745a2f758d58bba7eafad
2018-11-16 11:51:58 -08:00
5a53861d3a Enable Caffe2 test on centos (#14091)
Summary:
Turns out we don't have any centos test CI job
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14091

Differential Revision: D13104722

Pulled By: bddppq

fbshipit-source-id: 22fe92ad4b7f2c391eea16b8b95658fa1ee605e2
2018-11-16 11:51:56 -08:00
1256cbaa69 Relax limits for gradients in test_jit's checkGraph (#14094)
Summary:
- This should help TestJit.test_lstm_fusion_concat_cuda
  to be less flaky. (Checked on manual_seed 0..99)
  Fixes: #14026
- Revert the renaming of test_fused_abs that was introduced
  to game the order of tests to avoid the flakiness above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14094

Differential Revision: D13100174

Pulled By: soumith

fbshipit-source-id: 91bb63b07a960a81dddfc0bf25c67696c0f6c46d
2018-11-16 11:43:52 -08:00
2983998bb3 add torch-python target (#12742)
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742

Reviewed By: soumith

Differential Revision: D13089691

Pulled By: anderspapitto

fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
2018-11-16 11:43:48 -08:00
cb86ae304e alias annotation parsing #2 (#14053)
Summary:
hopefully this one doesn't break master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14053

Differential Revision: D13093406

Pulled By: suo

fbshipit-source-id: 8fed44f1a3d463748726cb14acac2ea53dedf29b
2018-11-16 11:39:25 -08:00
77c2f4d0d7 Make THPDtype_New error instead of truncate (#14103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14103

Addressing T34828781, we change THPDtype_New so that it throws a RuntimeError if the length of name is greater than buffer size (DTYPE_NAME_LEN) - instead of truncating the string to fit the buffer.

Reviewed By: ezyang

Differential Revision: D13094600

fbshipit-source-id: d0dbf8fdfa342630c31f4d8ca7230d5f24a1254a
2018-11-16 11:35:18 -08:00
7c053b7e64 Add filler for SparseLengthsWeightedSum (#13949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13949

This diff adds support to fillers for `SparseLengthsWeight*` ops. It does 3 things:
1. Add the fillers for `SparseLengthsWeight*` ops
2. Add filling heuristics to consider the path of `LengthsRangeFill` -> `Gather` -> `SparseLengthsWeightedSum`, where the length input is shared by `LengthsRangeFill` and `SparseLengthsWeightedSum`. Therefore, we need to carefully bound the value of that length input so that at `Gather`, it does not index out-of-bound for the weight input of `Gather`.
3. Fix and simplify the logic of `math::RandFixedSum`, where we just keep rejecting the generated value if it violates the invariants.

Reviewed By: highker

Differential Revision: D13048216

fbshipit-source-id: bfe402e07e6421b28548047d18b298c148e0ec87
2018-11-16 11:31:05 -08:00
3c7b575a14 Update ATen doc with optional syntax (#14086)
Summary:
Update the readme to reflect the recent optional syntax change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14086

Differential Revision: D13096114

Pulled By: wanchaol

fbshipit-source-id: 713834d4d92021e1c7a31f3a56a00fb7da58c348
2018-11-16 10:03:24 -08:00
562f61a662 Add missing space in stft doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14092

Reviewed By: soumith

Differential Revision: D13100177

Pulled By: SsnL

fbshipit-source-id: 4eeaa3d0c04212516941d8d5a266aafb53bd9672
2018-11-16 09:57:06 -08:00
e4bb56570c Preemptively test for out-of-order length. (#13933)
Summary:
torch.nn.utils.rnn.pack_padded_sequence segfaults if the lengths are not in
decreasing order (#13324).

We were seeing this segfault when the error was thrown; checking pre-emptively
avoids it:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
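A minimal sketch of an input that now fails cleanly (shapes are illustrative):
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seqs = torch.zeros(3, 4, 5)   # (batch, time, features)
lengths = [2, 4, 3]           # not sorted in decreasing order
# With the pre-emptive check this raises an ordinary Python error
# instead of corrupting the heap on the throw path.
pack_padded_sequence(seqs, lengths, batch_first=True)
```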
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933

Differential Revision: D13090389

Pulled By: nairbv

fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
2018-11-16 08:39:05 -08:00
c7a247facf nomnigraph - support subgraph visualization (#13795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13795

Add dot string generation for a single subgraph, plus Python bindings (pretty useful for model exploration in Python).
Restructure the DotGenerator class a bit to make this feature easy to implement.

Reviewed By: bwasti

Differential Revision: D13010512

fbshipit-source-id: 825665438394b7e6968ab6da167b477af82a7b62
2018-11-16 08:19:20 -08:00
d7b95dda51 nomnigraph - easy - expose hasProduce(NodeRef) to python (#14075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14075

Expose hasProduce(NodeRef) to python

Reviewed By: bwasti

Differential Revision: D13092930

fbshipit-source-id: f1ec06e73e0f5f6a16ad0cbb7d2e3e499a861d8e
2018-11-16 08:19:18 -08:00
e7f5fceb99 nomnigraph - easy - expose inducesEdges and addNode to python's NNSubgraph (#14074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14074

Expose inducesEdges and addNode to Python's NNSubgraph. This makes it easy to manually construct an NNSubgraph in Python.

Reviewed By: bwasti

Differential Revision: D13092885

fbshipit-source-id: a94ed0b318162e27e3a4b5a4954eb6d169da7405
2018-11-16 08:19:16 -08:00
7b0f674367 Two small improvements to TorchConfig.cmake (#13849)
Summary:
- Fix the test for TORCH_INSTALL_PREFIX in the environment.
  The previous version didn't actually work.
- Add a guess path to find_package for Caffe2. I'd suspect that
  it's close to the Torch one.

I noticed these while compiling PyTorch custom ops, in particular for the C++ side when you don't want to go through Python.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13849

Differential Revision: D13090186

Pulled By: ezyang

fbshipit-source-id: cfe98900ab8695f008506a8d0b072cfd9c673f8f
2018-11-16 07:41:57 -08:00
1b1cdd944c Keep ModuleList consistent with python list in __setitem__ function. (#13102)
Summary:
The `ModuleList` method `__setitem__` has an implicit risk:
```
In [26]: mlist = nn.ModuleList([nn.ReLU(), nn.Conv2d(10, 10, 3, 1)])

In [27]: mlist
Out[27]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
)

In [28]: mlist[-1] = nn.ReLU()

In [29]: mlist
Out[29]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
  (-1): ReLU()
)

In [30]: mlist[-1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-229d1b6823a0> in <module>()
----> 1 mlist[-1]

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py in __getitem__(self, idx)
    134             return ModuleList(list(self._modules.values())[idx])
    135         else:
--> 136             return self._modules[self._get_abs_string_index(idx)]
    137
    138     def __setitem__(self, idx, module):

KeyError: '2'

```

modified as
```
    def __setitem__(self, idx, module):
        idx = self._get_abs_string_index(idx)
        return setattr(self, str(idx), module)
```
to fix it.

```
In [31]: class NewModuleList(nn.ModuleList):
    ...:     def __setitem__(self, idx, module):
    ...:         idx = self._get_abs_string_index(idx)
    ...:         return setattr(self, str(idx), module)
    ...:

In [32]: mlist = NewModuleList([nn.ReLU(), nn.Conv2d(10, 10, 2, 1)])

In [33]: mlist[-1] = nn.ReLU()

In [34]: mlist
Out[34]:
NewModuleList(
  (0): ReLU()
  (1): ReLU()
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13102

Differential Revision: D13092480

Pulled By: ezyang

fbshipit-source-id: 7ff7688f66e44bbd263a10d2d09db7bb0df4b749
2018-11-16 07:39:26 -08:00
a3f39f1ebb Fix randint docs (#14083)
Summary: Closes #14079

Differential Revision: D13095904

Pulled By: soumith

fbshipit-source-id: e39319c5326bfdf6f401eaddebe94474349901c3
2018-11-16 03:04:02 -08:00
2fe4711eb4 Revert "Remove OptionsGuard from ATen (#13738)" (#14082)
Summary:
This reverts commit 37cb357d8da3427900b8f72f6de7e77b77dcdbae.

Try to see if it unbreaks master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14082

Differential Revision: D13095888

Pulled By: bddppq

fbshipit-source-id: c728f80f233b4d9daaf65f43202d8104651029a9
2018-11-15 23:47:36 -08:00
45fd77d3b7 Adding GLOO_SOCKET_IFNAME env to allow user set gloo device (#14065)
Summary:
Address https://github.com/pytorch/pytorch/issues/14063

This is a lot easier to use and follows the NCCL convention, since NCCL provides the similar NCCL_SOCKET_IFNAME.

We can document this better later.

Tested on my two hosts; it works out of the box.
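A minimal usage sketch (the interface name and addresses are illustrative; run one process per rank):
```
import os
import torch.distributed as dist

# Pin gloo's transport to a specific NIC before creating the process group.
os.environ["GLOO_SOCKET_IFNAME"] = "eth0"
dist.init_process_group(backend="gloo",
                        init_method="tcp://10.0.0.1:23456",
                        rank=0, world_size=2)
```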
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14065

Differential Revision: D13095522

Pulled By: teng-li

fbshipit-source-id: 131dff212626f1aab7e752427f1b684845b909dc
2018-11-15 22:33:56 -08:00
3808e9fad3 Caffe2: Fix for creating entries of external_input in predic_net (#12979)
Summary:
Currently, after performing export, the predict_net proto ends up with two entries
of external_input for the input data blob: the exporter extends external_input once
separately with the input blob, and once more with all the external_input entries
from the proto, in which the input blob is already included.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12979

Differential Revision: D12916349

Pulled By: soumith

fbshipit-source-id: 4d4a1c68c0936f8de3f4e380aea1393fe193cd2d
2018-11-15 22:33:50 -08:00
1e8aeb0bee fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14076

Differential Revision: D13095528

Pulled By: suo

fbshipit-source-id: 78d08719ad5579dc0d6bb9563972df393e4286fe
2018-11-15 22:10:06 -08:00
3a15de9e44 Fix CUDA_tensor_apply1 base case (#14056)
Summary:
I got some build errors when modifying the `bernoulli_tensor_cuda_kernel` in my Generator refactor https://github.com/pytorch/pytorch/pull/13070. It turns out the function signature for `CUDA_tensor_apply1` was a little wrong. This PR fixes it. Following are the code and the error I was getting before this patch:

Code:
```
template<typename scalar_t, typename prob_t>
void bernoulli_tensor_cuda_kernel(
    at::Tensor& ret, const at::Tensor& p,
    std::pair<uint64_t, uint64_t> seeds) {
  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(scalar_t& v1, const prob_t& p1) {
      at::cuda::Philox4_32_10 engine(
                                seeds.first,
                                blockIdx.x * blockDim.x + threadIdx.x,
                                seeds.second);
      auto x = at::cuda::standard_uniform_distribution(engine);
      assert(0 <= p1 && p1 <= 1);
      v1 = static_cast<scalar_t>(x <= p1);
    }
  );
}
```

Error:
```
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAApplyUtils.cuh(236): error: no suitable conversion function from "const lambda [](uint8_t &)->void" to "int" exists
Nov 15 23:43:03           detected during:
Nov 15 23:43:03             instantiation of "void at::cuda::<unnamed>::ApplyOp1<Op, scalar, IndexType, ADims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar, IndexType> &, const Op &, int, IndexType, Offsets...) [with Op=lambda [](uint8_t &)->void, scalar=uint8_t, IndexType=unsigned int, ADims=1, remaining_steps=1, Offsets=<>]"
Nov 15 23:43:03 (282): here
Nov 15 23:43:03             instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply1<Op,scalar,IndexType,ADims,step>(at::cuda::detail::TensorInfo<scalar, IndexType>, IndexType, Op) [with Op=lambda [](uint8_t &)->void, scalar=uint8_t, IndexType=unsigned int, ADims=1, step=1]"
Nov 15 23:43:03 (735): here
Nov 15 23:43:03             instantiation of "__nv_bool at::cuda::CUDA_tensor_apply1<scalar,step,Op>(at::Tensor, Op, at::cuda::TensorArgType) [with scalar=uint8_t, step=1, Op=lambda [](uint8_t &)->void]"
Nov 15 23:43:03 (774): here
Nov 15 23:43:03             instantiation of "__nv_bool at::cuda::CUDA_tensor_apply1<scalar,Op>(at::Tensor, Op, at::cuda::TensorArgType) [with scalar=uint8_t, Op=lambda [](uint8_t &)->void]"
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Distributions.cu(118): here
Nov 15 23:43:03             instantiation of "void <unnamed>::bernoulli_scalar_cuda_kernel<scalar_t>(at::Tensor &, double, std::pair<uint64_t, uint64_t>) [with scalar_t=uint8_t]"
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Distributions.cu(227): here
Nov 15 23:43:03
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14056

Differential Revision: D13095362

Pulled By: soumith

fbshipit-source-id: 6416bc91616ec76036479062a66517557a14d1b9
2018-11-15 21:33:07 -08:00
037d6b697b Add ResizeNearest DNNLOWP op (#13940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13940

As in title

Reviewed By: jspark1105

Differential Revision: D13054325

fbshipit-source-id: 81af5f095a1aca92d4b5e1fe0e71ae2f21b43922
2018-11-15 21:03:01 -08:00
f66cb02016 Turn fbgemm off by default for pytorch (#14048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14048

Setting USE_FBGEMM to OFF by default until we figure out properly separating avx2 code. See [this issue](https://github.com/pytorch/pytorch/issues/13993).  Pytorch can still be compiled with fbgemm by using USE_FBGEMM=ON.

Reviewed By: jspark1105

Differential Revision: D13090454

fbshipit-source-id: 6e0e92612e4362a306e376df3dc33e8edeb066e9
2018-11-15 18:42:16 -08:00
f17b2fdf1b Fixed THD DistributedDataParallel not picklable (#14051)
Summary:
This fixed https://github.com/pytorch/pytorch/issues/12261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14051

Differential Revision: D13091703

Pulled By: teng-li

fbshipit-source-id: 16eb85a259c981f3cacd2fbaecc0edbae292e358
2018-11-15 18:10:47 -08:00
37cb357d8d Remove OptionsGuard from ATen (#13738)
Summary:
Deletes the `OptionsGuard` from ATen. This works towards the goal of reworking `DefaultTensorOptions`. `OptionsGuard` is troublesome because it relies on mutating thread local state. This PR fixes those code locations and then deletes the `OptionsGuard`.

ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13738

Differential Revision: D13000962

Pulled By: goldsborough

fbshipit-source-id: c8143ee75070c2280f5fd1d9af86f8ce14279b72
2018-11-15 17:37:27 -08:00
8f4dc192b6 Fix DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured (#14038)
Summary:
I think this will be it. For one, the previous test was broken: it returned the thread id instead of the sample index (which is the thing whose ordering is enforced). Just turning up the number of threads to 10 from 4 made this very obvious. I also think there was a race condition, which may or may not have surfaced, in that nothing stopped one worker from getting multiple batches, which would break the whole ordering logic. I've added a barrier struct such that workers wait for all workers to be inside the `get_batch` function before actually doing something.

Fixes https://github.com/pytorch/pytorch/issues/14002

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14038

Differential Revision: D13088132

Pulled By: goldsborough

fbshipit-source-id: 4bded63756c6a49502ee07ef8709a03073e7e05f
2018-11-15 17:30:41 -08:00
f930c4307c Clean up executor's execution flags (#13869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13869

Remove unused flags and consolidate them into one struct

Reviewed By: yinghai

Differential Revision: D13032207

fbshipit-source-id: 2cef093589036238732099e3851a97e739b5fd55
2018-11-15 17:11:51 -08:00
874a8a321b Fix out of order member fields initializaitons (#14015)
Summary:
xw285cornell

Unfortunately it's not easy to add -Werror=reorder flag since there are out of order initializations in thrust headers as well, and the rocm cmake macro hip_include_directories doesn't offer a way to include headers as external headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14015

Reviewed By: soumith

Differential Revision: D13081104

Pulled By: bddppq

fbshipit-source-id: 2540421cb29cf556c79f2d86c460bde6ea5a182e
2018-11-15 17:11:50 -08:00
31d41a983a Revert D13088038: [pytorch][PR] [jit] extend alias annotations
Differential Revision:
D13088038

Original commit changeset: 49dc5d0e9cd4

fbshipit-source-id: b77e4607f3cbd9c202c522a436f90e9a98acd4b4
2018-11-15 16:55:11 -08:00
6d378d3740 Updating C++ documentation to PyTorch theme. (#13791)
Summary:
Updates C++ documentation to the PyTorch Sphinx theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13791

Reviewed By: soumith

Differential Revision: D13013908

Pulled By: brianjo

fbshipit-source-id: 253a91c6784ad72aa1c37426cd4a945061a60fec
2018-11-15 16:45:52 -08:00
0d29846d5e Convert more weak functions (#14003)
Summary:
Same deal as #13707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14003

Differential Revision: D13076403

Pulled By: driazati

fbshipit-source-id: eb3cb3b2c31caf1de591b613bdc4c9a6ed4e1767
2018-11-15 16:45:50 -08:00
c5afad5579 Fix skip logic in caffe_translator_test.py (#13627)
Summary:
Avoid false failure by checking for the presence of the test data in setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13627

Differential Revision: D13090324

Pulled By: ezyang

fbshipit-source-id: e85571943d168c0007212d7b1a5b99ffa0c39235
2018-11-15 16:45:49 -08:00
0e93500841 Remove async_polling (#13825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13825

async_polling was an intermediate step towards async_scheduling and is not used

Reviewed By: yinghai

Differential Revision: D13019059

fbshipit-source-id: eee6ba53e7f476ddb481afba3bf1768303864d32
2018-11-15 16:23:15 -08:00
0573169e23 Import a method from an python_print string (#13959)
Summary:
* Add hooks to get a callback whenever a valid graph is produced in the compiler or through tracing. These hooks can be used to pretty_print and then reparse every graph our tests produce to check that the serialization function works correctly. Currently this is guarded by an environment variable since there are a few remaining failures.
* Fix printing bugs: True and False rather than 1 and 0, print 0. for floating point zero
* Change behavior of NoneType. It is now no longer a subtype of Optional but instead implicitly converts to it, returning a prim::Node with an Optional[T] type for some specific T. This allows functions like `_unwrap_optional` to correctly match against a None while still deriving the right type.
* Fix a bug where empty blocks did not correctly emit "pass" in printer.
* Fix a bug where prim::Undefine sometimes cannot be printed as None because it is being used in a schema-less op. This should be fixable once Optional[T] always uses the same None object.
* Other minor printing bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13959

Reviewed By: jamesr66a

Differential Revision: D13073519

Pulled By: zdevito

fbshipit-source-id: 4167a6b614f2e87b4d21823275a26be5ba4fc3dd
2018-11-15 16:11:37 -08:00
84d464f8f9 Revert "Upgrade mkldnn bridge to reduce overhead of bridge itself (#1… (#14040)
Summary:
…2164)"

This reverts commit 4b7c6150d848d134d1fe850e777dc68321d35465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14040

Differential Revision: D13089531

Pulled By: yinghai

fbshipit-source-id: 2114b36111dab6f179c02921bbc9bd382ef461bf
2018-11-15 15:34:15 -08:00
90b0c4f43d Tensor construction: combine Resize+mutable_data - 2/4 (#13943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13943

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13852

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054815

fbshipit-source-id: e89c2e69217880980187f2befb844c277e51c1e0
2018-11-15 15:34:14 -08:00
136f5c9fe1 Replaced using declaration with explicit constructors 3/3 (#13875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13875

This replaces a using declaration with an explicit constructor

Reviewed By: mnovakovic

Differential Revision: D13033260

fbshipit-source-id: ce4cc5667ee66abdeebd1e49466c3cf3a65ffb96
2018-11-15 14:52:47 -08:00
3fbb753512 Revert D12873145: [pt1][tensor][refactor] FeedTensor returns a Tensor
Differential Revision:
D12873145

Original commit changeset: 653735c20d61

fbshipit-source-id: aa6e40a6a24c6f90acbe87b32b3be0020e2584f8
2018-11-15 14:52:46 -08:00
d91c686c33 extend alias annotations (#13632)
Summary:
Grab bag of additions to alias annotations that were useful when writing the alias analysis pass. Not very organized since these were mostly split off from that PR.
- Switch alias sets to actual sets, since we will want to union them.
- Correctly parse alias set unions `a|b`, and correctly parse wildcards
- Move writes into `AliasInfo`, which cleans up some code that was passing a `writes` vector everywhere and simplifies tracking aliased writes during analysis.
- Change Tensor list extraction ops to return wildcard tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13632

Differential Revision: D13088038

Pulled By: suo

fbshipit-source-id: 49dc5d0e9cd4895427fea3a87b0ec325bd5fe437
2018-11-15 14:23:40 -08:00
c7e0db140e use fabs instead of absf in fuser code for aten::abs (#13985)
Summary:
absf didn't work for CUDA

Fixes: #13971
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13985

Differential Revision: D13084601

Pulled By: soumith

fbshipit-source-id: 0027ee719ae2b6a2bfce9c26f21db9c5e6159686
2018-11-15 13:23:59 -08:00
c3578b561c Skip all builtin functions when importing names from _C._VariableFunctions to torch (#13884)
Summary:
We don't want builtin functions of `_C._VariableFunctions` to replace those of `torch`.
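A sketch of the idea (not the exact torch/__init__.py code): skip dunder names when copying from `_C._VariableFunctions` so they can't clobber the target module's own attributes.
```
from torch import _C

ns = {}
for name in dir(_C._VariableFunctions):
    if name.startswith('__'):
        continue  # keep builtins like __doc__ out of the target namespace
    ns[name] = getattr(_C._VariableFunctions, name)
```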
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13884

Reviewed By: ezyang

Differential Revision: D13044686

Pulled By: yf225

fbshipit-source-id: 23657d47a4e2fd8ee41103cd6a13c639ce107f67
2018-11-15 13:23:57 -08:00
4b7c6150d8 Upgrade mkldnn bridge to reduce overhead of bridge itself (#12164)
Summary:
Upgrade mkldnn bridge to reduce overhead of bridge itself
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12164

Reviewed By: yinghai

Differential Revision: D10159149

Pulled By: wesolwsk

fbshipit-source-id: 5ede1130c00a2cd3afe301dcb94bcb89e01bc5a2
2018-11-15 12:54:06 -08:00
3de0fd846f Fix converter to accept const NetDef&
Summary: convertToNNModule didn't accept `const Netdef&`.  fixed this

Reviewed By: duc0

Differential Revision: D13057450

fbshipit-source-id: dc6fa2c86077a56b955f15c369b941a2d32de911
2018-11-15 12:18:11 -08:00
5639332a28 fix the deeptext issue (#14005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14005

Partial initialization of tensors is no longer supported; we need to fix multiple places.

Reviewed By: hl475

Differential Revision: D13078206

fbshipit-source-id: a1be2bd2a9f573db54e1366a0d7a17cc2e0db0c9
2018-11-15 12:13:45 -08:00
b8de8f6261 Refactor tensor construction in onnxifi_op
Summary: att

Reviewed By: ezyang

Differential Revision: D13028624

fbshipit-source-id: efd8dee5d59f26830a15bb17211eee373f6c8dee
2018-11-15 11:23:21 -08:00
464c0c2204 Use realpath for loaded libraries (#13936)
Summary:
I noticed `CDLL` needs an absolute path (when calling `torch.ops.load_library`)
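The gist of the change as a hedged sketch (the library path is hypothetical):
```
import os.path
import ctypes

path = os.path.realpath("build/libcustom_ops.so")  # resolve to an absolute path
lib = ctypes.CDLL(path)  # CDLL gets the resolved real path
```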

zdevito soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13936

Differential Revision: D13075605

Pulled By: goldsborough

fbshipit-source-id: 297c490cfa3bfaf540b95a9c2644d9153abe4c32
2018-11-15 11:23:20 -08:00
17b2d2d373 fix TensorPrinter when tensor have 0 size. (#13986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13986

If total_count == 0, it crashes on:

  values_stream << tensor_data[total_count - 1];

Reviewed By: jerryzh168

Differential Revision: D13066438

fbshipit-source-id: b7a2d681ca0cf5b68d78872c94fac6de9c5de2dc
2018-11-15 07:51:13 -08:00
4574ea3bec Make RNN operator handle exceptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13997

Reviewed By: dzhulgakov, bddppq

Differential Revision: D13072518

Pulled By: ilia-cher

fbshipit-source-id: c4fd897038b6dca41db652b9e063fc12d98f6d07
2018-11-15 00:48:22 -08:00
6d094224b9 Fix optional import/export, export multi-margin-loss (#13877)
Summary:
This PR does two things:

1. It fixes optional import/export to include any type, including tensor types (previously we only supported base types); this is essential to unblock optional tensor type annotations in our test logic.
2. It exports the multi_margin_loss functional to serve as an example of the optional undefined-tensor use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13877

Differential Revision: D13076090

Pulled By: wanchaol

fbshipit-source-id: c9597295efc8cf4b6462f99a93709aae8dcc0df8
2018-11-15 00:45:22 -08:00
ddbd87e310 Build with -Werror (#13998)
Summary:
Also fixed a warning

As a thought while trying to solve #12854
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13998

Reviewed By: pietern

Differential Revision: D13078615

Pulled By: teng-li

fbshipit-source-id: eb25c429d7dd28b42e4e95740a690d5794a0c716
2018-11-14 22:45:30 -08:00
5390ab1d52 Dont crash on 1d convolution (#13999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13999

Temporary mitigation for SEV3 https://our.intern.facebook.com/intern/sevmanager/view/s/168910/

Reviewed By: yinghai

Differential Revision: D13075307

fbshipit-source-id: 4df2bcc37b91900653443f7766d5bb080ca3f5a9
2018-11-14 22:38:00 -08:00
eb024cd1d0 don't throw in matchTypeVariables (#13989)
Summary:
Avoid throwing on match errors. In general, it's not good to throw when failure is expected.

But the real reason I'm doing this is that it makes it annoying to set a breakpoint on exceptions in my debugger 😛
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13989

Differential Revision: D13069980

Pulled By: suo

fbshipit-source-id: 636d4371f8a5be45c935198b73cdea06275b1e9e
2018-11-14 21:45:19 -08:00
20e395a130 Fixed uninitialized warning (#14001)
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/12014
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14001

Differential Revision: D13078583

Pulled By: teng-li

fbshipit-source-id: 6c8d663da81bc3e564f0643926d67260df828dd8
2018-11-14 21:37:11 -08:00
e3bb6ff334 Move c10 dispatcher prototype to c10/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13690

Reviewed By: dzhulgakov

Differential Revision: D12912235

fbshipit-source-id: 974b85790c23335be8130a50aa4692e3ddcd2bf9
2018-11-14 18:04:36 -08:00
4b0fc5200b Fix include paths for typeid.h (#13689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13689

Now that typeid.h lives in c10/util, the include paths should reflect that.

Reviewed By: ezyang

Differential Revision: D12912237

fbshipit-source-id: e54225f049f690de77cb6d5f417994b211a6e1fb
2018-11-14 18:04:09 -08:00
72da09bb4d Canonicalize THD includes with .. in them
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13980

Reviewed By: jerryzh168

Differential Revision: D13062706

fbshipit-source-id: 100e10d1bae7efc3e13f029708c2c1dd053ce074
2018-11-14 17:43:56 -08:00
7ea9c674bc migrate subgraph slicing to use moveBefore/moveAfter (#13862)
Summary:
Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability.

The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR.

Future steps:
- Get rid of code duplication (see above)
- Use DynamicDAG to back the `moveBefore/After` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862

Differential Revision: D13072871

Pulled By: suo

fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0
2018-11-14 17:33:36 -08:00
2356c8d542 device inference for Adam (#13990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13990

to make sure ITER blob lives on CPU.

Reviewed By: xianjiec

Differential Revision: D13056070

fbshipit-source-id: 148edbf745e50e886da3eb99d4e485d11c1924e2
2018-11-14 17:21:08 -08:00
fed8d8975a Various improvements to hipify_python.py (#13973)
Summary:
- Speed up hipify_python.py by blacklisting useless (and quite large)
  directory trees that it would otherwise recurse into

- Pass around relative paths instead of absolute paths.  This makes it
  easier to do filename matches based on the root of the tree.

- Redo the streaming output to contain more useful information

- Make it handle c10/cuda correctly, rewrite c10::cuda to
  c10::hip, and the header name from CUDAMathCompat.h to
  CUDAHIPCompat.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13973

Differential Revision: D13062374

Pulled By: ezyang

fbshipit-source-id: f0858dd18c94d449ff5dbadc22534c695dc0f8fb
2018-11-14 17:11:24 -08:00
02152c515e Ensure nn Losses check scalar vs non-scalar values.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13860

Reviewed By: ezyang

Differential Revision: D13029364

Pulled By: gchanan

fbshipit-source-id: 20f1330fa181e52aea1f879dc655a9a6f62b5f53
2018-11-14 16:46:27 -08:00
6811e32f03 Support exporting Gather and BatchGather to ONNX (#13987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13987

Gather and BatchGather are also used in sparse network.

Reviewed By: bddppq, houseroad

Differential Revision: D13067290

fbshipit-source-id: e09572a5c4544768f9e1af48166f7c8d78127e63
2018-11-14 15:40:17 -08:00
7daa829bce Implement unsqueeze for sparse vectors (this also makes stack work out of the box)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13760

Differential Revision: D13065342

Pulled By: umanwizard

fbshipit-source-id: a5e2e80f87ffbbfdf8759b1b593ef34d290ae907
2018-11-14 15:23:05 -08:00
ff4f4a0a35 Retry test on "Address already in use" error (#13911)
Summary:
This fixes #13907.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13911

Differential Revision: D13046256

Pulled By: pietern

fbshipit-source-id: bab70cd73ef868e23d4857b06e72830ad29ddb4f
2018-11-14 15:23:03 -08:00
61a0df5af0 Canonicalize THC/THCTensorMasked.cuh include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13977

Reviewed By: jerryzh168

Differential Revision: D13062564

fbshipit-source-id: 77d42585198cd75bc8a2625787604552e5369787
2018-11-14 14:56:30 -08:00
01d606e048 Canonicalize TH/THRandom.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13975

Reviewed By: jerryzh168

Differential Revision: D13062526

fbshipit-source-id: 510e0ff5ce68c20c2f46bae71efa8e4355c6ce05
2018-11-14 14:56:27 -08:00
9e1655bb22 Canonicalize THCUNN/linear_upsampling.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13979

Reviewed By: jerryzh168

Differential Revision: D13062649

fbshipit-source-id: 28b2cbe97613b485ab11bf35be60ca6ee668bbef
2018-11-14 13:50:30 -08:00
af6d1ec52c Canonicalize THCUNN/common.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13978

Reviewed By: jerryzh168

Differential Revision: D13062631

fbshipit-source-id: 2b1b13c28ee8be603b0cdca46c7ac7f86317c39f
2018-11-14 13:30:27 -08:00
a7d43702d4 Canonicalize THCGenerate*.h includes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13976

Reviewed By: jerryzh168

Differential Revision: D13062604

fbshipit-source-id: 48b7e2a2bdf97c55820036db9a4ff18a1f4dbce2
2018-11-14 13:30:25 -08:00
f446c67e2f submodule update to fix compilation warnings (#13925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13925

Fixes compilation warnings; already fixed in the fbgemm repo, so just updating the submodule

Reviewed By: jianyuh

Differential Revision: D13048100

fbshipit-source-id: 568f0f90a5499b6f2cab525b2379299d1565bbae
2018-11-14 13:27:32 -08:00
587f769a99 Fix missing symbol linker error when using libtorch generated on windows : (#13672)
Summary:
Libtorch is missing some symbols when generated on Windows, causing linker errors when using it.

It seems like there were some issues in the past with enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS to export all symbols during the build.
(See the links below:
    - Enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS: https://github.com/pytorch/pytorch/pull/3617
    - Disabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS: https://github.com/pytorch/pytorch/issues/9092 and https://github.com/pytorch/pytorch/pull/9693)

So enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS is not an option, but some symbols are still missing for Libtorch to work.
We added some functions to TORCH_API in this PR, but we might be missing some.
(We also tried adding the whole Method struct (struct TORCH_API Method { ... }) instead of adding the functions separately, but the build fails with a "one or more multiply defined symbols found" error.)

Do you have any recommendations on how to detect functions that should/shouldn't be in TORCH_API, so the build is successful and the generated Libtorch has all the required exported symbols?

I also attached torch_exports_missing.txt, which contains the symbols that are exported with the CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS flag enabled but not in the current Libtorch version
(generated by running "dumpbin /EXPORTS torch.dll" on both torch.dll libraries and diffing the outputs).
So any symbol that could be missing from Libtorch should be in this list, but the list has more than 8000 symbols, and I am not sure which ones need to be exported and added to TORCH_API.

This PR currently exports the missing symbols for torch::jit::script::Method that appear in the attached list (with the exception of defaultSchemaFor and emit_call_to, which cause a "multiply defined symbols" error).

[torch_exports_missing.txt](https://github.com/pytorch/pytorch/files/2558466/torch_exports_missing.txt)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13672

Differential Revision: D12959348

Pulled By: soumith

fbshipit-source-id: ef7e85b047b3937dc6aa01ba67e4e01f8eae4eca
2018-11-14 12:00:36 -08:00
0478d32cb8 Move AlignOf, SmallVector and ArrayRef to c10.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13916

Reviewed By: smessmer

Differential Revision: D13046722

fbshipit-source-id: 1583d3170d60e22f0a535cd1fd56bdf928186f5d
2018-11-14 11:13:16 -08:00
4983397c02 Better documentation and warning (#13946)
Summary:
This is to address https://github.com/pytorch/pytorch/issues/12603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13946

Differential Revision: D13055254

Pulled By: teng-li

fbshipit-source-id: 20a206ebd3456eac9dc50584664c4bca3ee955d1
2018-11-14 10:41:46 -08:00
143ba72264 Move cosine_similarity to ATen (#12199)
Summary:
I'm traveling right now and don't have access to a good computer to compile and test this myself. Will see the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199

Differential Revision: D13062326

Pulled By: nairbv

fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
2018-11-14 10:41:44 -08:00
53c3a92a50 consistent rounding (#9)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13960

The vectorized code was rounding to even in halfway cases with _mm256_round_ps + (_MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC) (see more details in https://software.intel.com/en-us/node/523819), but we were still using std::round in a couple of places which does rounding away from zero in halfway cases.
With this diff, we use std::nearbyint in all scalar code (except a few cases where we don't care exact rounding mode and uses rint which is the fastest in general) to be more consistent. nearbyint is the same as what the vectorized code does only when the current rounding mode is FE_TONEAREST but in practice this is OK because we almost always use the default rounding mode FE_TONEAREST.
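
A quick stand-alone illustration of the two halfway-case conventions (numpy as a stand-in; not the FBGEMM code itself):
```python
import numpy as np

xs = np.array([0.5, 1.5, 2.5, -0.5, -1.5])
# std::round semantics: halfway cases round away from zero
away = np.sign(xs) * np.floor(np.abs(xs) + 0.5)
# std::nearbyint under the default FE_TONEAREST mode (and the AVX2 path):
# halfway cases round to even
to_even = np.rint(xs)
print(away)     # [ 1.  2.  3. -1. -2.]
print(to_even)  # [ 0.  2.  2. -0. -2.]
```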

This is inspired by Marat's diff for mobile quantization.

Reviewed By: dskhudia

Differential Revision: D13017719

fbshipit-source-id: 6b8f99db7ea2e233aa2e3bd2adf622e03ed6258e
2018-11-14 10:21:42 -08:00
96663edca6 Remove the hip ignore; it conflicts with real in-tree HIP development. (#13972)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13972

Differential Revision: D13062253

Pulled By: ezyang

fbshipit-source-id: 4442b194bb08e4f718dff844743d23fd3a6dc8e9
2018-11-14 10:03:19 -08:00
35a24a9a94 Example with edge case 0 for torch.sign (#13771)
Summary:
The behavior of the edge case 0 is not self-evident for the `torch.sign` function (I personally expected a result of 1):
```python
>>> a = torch.tensor([0.7, -1.2, 0., 2.3])
>>> a
tensor([ 0.7000, -1.2000,  0.0000,  2.3000])
>>> torch.sign(a)
tensor([ 1., -1.,  0.,  1.])
```
This is not currently documented; I think it is worth giving a simple example showing this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13771

Differential Revision: D13044520

Pulled By: ailzhang

fbshipit-source-id: c3011ccbdf1c13348f6c7242b06a9aa52ebc9204
2018-11-14 09:16:09 -08:00
dead6632b3 bug fix for 1D conv in NHWC layout (#13813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13813

Title says it all.

Reviewed By: hx89

Differential Revision: D13017652

fbshipit-source-id: e3cea6c7dee2878119d154bb9f3efbc329d7c0d5
2018-11-14 09:16:07 -08:00
4341dd2753 Move most scalar checks from nn.yaml into THNN/THCUNN code. (#13906)
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.

Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906

Reviewed By: ezyang

Differential Revision: D13044507

Pulled By: gchanan

fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
2018-11-14 07:58:35 -08:00
46c0e2c268 Clean up caffe2/tools/build_pytorch_libs.{sh,bat} (#13954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13954

- Remove extra include directories from BASIC_C_FLAGS.  We suspect that
  in some rare cases on Windows, this can cause us to get confused about
  which header to include.  Make this agree with build_pytorch_libs.sh
  Ditto with BASIC_CUDA_FLAGS
- Delete CWRAP_FILES from both places; it's unused in sh, and it's
  dead in CMAKE
- Delete NO_NNPACK in Windows, replace with USE_NNPACK (I'm not sure
  if this actually does anything on Windows lol)
- Delete a bunch of defunct cmake arguments from the build (NOT
  build_caffe2) target.

Reviewed By: soumith

Differential Revision: D13056152

fbshipit-source-id: efcc06c65a9f3606666196f3fe5db268844d44d9
2018-11-14 07:42:11 -08:00
a440629f14 Remove defunct build.sh/THConfig.cmake (#13953)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13953

Differential Revision: D13056128

Pulled By: ezyang

fbshipit-source-id: 9fd17f4fe000ac06144b04be996ef6849de2bafa
2018-11-14 07:42:09 -08:00
fbabe5bf62 Rename c10::detail to c10::impl (#13838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13838

According to Sebastian, the detail convention is specifically for header-private
functionality.  That's not what c10/detail is; it's general, library private headers
which may be used in multiple places within PyTorch.  Rename it to impl to avoid
the confusion in nomenclature.

Reviewed By: smessmer

Differential Revision: D13024368

fbshipit-source-id: 050f2632d83a69e3ae53ded88e8f938c5d61f0ef
2018-11-14 07:39:37 -08:00
db5aeafa60 Avoid grabbing DeviceGuard in at::empty when possible (#13785)
Summary:
Changed at::empty to allocate the correct amount of memory instead of
"allocate 0 memory and then resize it to the necessary size".

This leads to a 300 ns speedup for at::empty for a cuda tensor of size (64, 2048).
(1790ns -> 1460ns for at::empty).
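
A rough sketch of how such a number can be measured from Python (machine-dependent; assumes a CUDA device is available):
```python
import timeit
import torch

n = 10000
t = timeit.timeit(lambda: torch.empty(64, 2048, device="cuda"), number=n)
print("{:.0f} ns per call".format(t / n * 1e9))
```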

Also does the following:
Removes DeviceGuards for:
- empty_* functions that end up calling functions that already have a
  DeviceGuard
- t(), which gets called a lot in LSTMs,
- Remove one of the two DeviceGuard that at::empty(...) uses. It only
  needs one for correctness, the other comes from the resize_
  implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13785

Reviewed By: ezyang

Differential Revision: D13004938

Pulled By: zou3519

fbshipit-source-id: f45b7e6abe06c05d1f81cc53e190c7bab6d1c116
2018-11-14 07:39:35 -08:00
1e45e7a404 Speed up fusion compiler tensor allocation (#13914)
Summary:
Previously the fusion compiler would allocate an empty tensor and then
resize it to the correct size. This PR changes the fusion compiler to
allocate a tensor of the correct size the first time around. The
difference between these approaches for a single tensor is around 400ns;
for something like LSTMCell's FusionGroup that emits 8 outputs this is
theoretically a 3us win.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13914

Differential Revision: D13046728

Pulled By: zou3519

fbshipit-source-id: e2f28c0dc2ee5bcfee0efe10610039694691415c
2018-11-14 07:26:27 -08:00
109dd5b412 Move typeid to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13688

Reviewed By: ezyang

Differential Revision: D12912240

fbshipit-source-id: 1632172003682f62cea9b8c52596c3c0d8504b23
2018-11-14 02:58:04 -08:00
97036d3c30 FileStore auto deletes file and FileStore::add bug fix (#13708)
Summary:
This addressed: https://github.com/pytorch/pytorch/issues/11874

and we will have the identical file init_method behavior as the previous THD file init.

Also the FileStore::add bug is pretty annoying.

Two bugs:
(1) Add doesn't append to the end of the file.
(2) Cache doesn't get updated.

Both are fixed and tests are covered.

I examined the /tmp to ensure that all temp files are auto deleted after test_c10d.py
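
For reference, a minimal file-init usage sketch that exercises this store (path hypothetical; world_size=1 so it runs standalone):
```python
import torch.distributed as dist

# The shared file backs the FileStore whose add/append behavior is fixed
# here; it is now deleted automatically when the store is torn down.
dist.init_process_group(backend="gloo",
                        init_method="file:///tmp/c10d_init_file",
                        world_size=1,
                        rank=0)
```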
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13708

Reviewed By: pietern

Differential Revision: D12972810

Pulled By: teng-li

fbshipit-source-id: 917255390aa52845f6b0ad0f283875a7a704da48
2018-11-14 01:34:22 -08:00
e2a7d43dfd Use the torch.proto to store script module (#13736)
Summary:
Operate directly on protobuf in the serializer/deserializer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13736

Reviewed By: dzhulgakov

Differential Revision: D13028487

Pulled By: houseroad

fbshipit-source-id: e578474008874f00f2a22f0a2ffd85f52643881a
2018-11-14 00:22:09 -08:00
2871d3951f More robust ->match behavior (#13952)
Summary:
Allow schema matching against string literals to work even with
white space and other minor differences.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13952

Differential Revision: D13056043

Pulled By: zdevito

fbshipit-source-id: 0b502ce8311587308370285f7062914fce34faf0
2018-11-13 23:40:42 -08:00
346c418fc9 Add caffe2 clang7 build CI job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13928

Differential Revision: D13053770

Pulled By: bddppq

fbshipit-source-id: 8a015d4d8c3fb6a98b86ce7d7d96c13fc4f0d3f5
2018-11-13 23:12:23 -08:00
5151d33287 Unflake the ordering enforcement test (#13919)
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.

Fixes https://github.com/pytorch/pytorch/issues/13634

colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919

Differential Revision: D13051718

Pulled By: goldsborough

fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
2018-11-13 21:05:02 -08:00
f4e502a8c5 Added MIOpen conv transpose op (#13938)
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine

Differential Revision: D13055099

Pulled By: bddppq

fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
2018-11-13 21:01:52 -08:00
5059beb644 Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail (#13902)
Summary:
Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail()

Otherwise crash trace:

```
caffe2/caffe2/operators/hip/top_k_radix_selection_hip.cuh:409:7: error:  '__assert_fail':  no overloaded function has restriction specifiers that are compatible with the ambient context 'gatherTopK'
      assert(writeIndex < outputSliceSize);
      ^
glibc/include/assert.h:88:6: note: expanded from macro 'assert'
   : __assert_fail (#expr, __FILE__, __LINE__, __ASSERT_FUNCTION))
     ^
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13902

Reviewed By: bddppq

Differential Revision: D13042820

Pulled By: xw285cornell

fbshipit-source-id: 5117f6946db8109ae35e644e7423c8456e65e61f
2018-11-13 20:55:50 -08:00
0bedaf9cf6 Update setup.py to support Nvidia TX2 (#13939)
Summary:
add platform.machine() == 'aarch64' for supporting Nvidia TX2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13939

Differential Revision: D13055834

Pulled By: soumith

fbshipit-source-id: 0fadc87adf9e6b796978ce743e824eb98b006856
2018-11-13 20:10:35 -08:00
79ec5de3fc Add some more files to gitignore. (#13924)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13924

Differential Revision: D13047983

Pulled By: ezyang

fbshipit-source-id: bb2a8aa747d0c8195084c650006518df2a00daab
2018-11-13 19:02:57 -08:00
c3680e2b19 Fix sum() on fp16 (#13926)
Summary:
The size of the shared and global memory buffers were incorrect for float16.
They were sized based on float16 elements, but the buffers store intermediate
float32 values.
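
A hedged repro sketch (assumes a CUDA device; sizes illustrative):
```python
import torch

# fp16 inputs are reduced via float32 intermediates, so the shared/global
# buffers must be sized for 4-byte elements, not 2-byte ones.
x = torch.ones(4096, device="cuda", dtype=torch.float16)
print(x.sum())  # expect 4096.0
```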

Fixes #13909
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13926

Differential Revision: D13048334

Pulled By: colesbury

fbshipit-source-id: 5a07df53f1152d5920258e91ed3f1e1de89b29e1
2018-11-13 16:50:36 -08:00
3002cb2ad0 Revert D13007266: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4
Differential Revision:
D13007266

Original commit changeset: a9f0427a11db

fbshipit-source-id: c23bb511bb26108405b7e8622377fc18573d4311
2018-11-13 16:44:33 -08:00
76d8979afe Revert D13007287: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 3/4
Differential Revision:
D13007287

Original commit changeset: c89a24458e04

fbshipit-source-id: 74d3fe310f1f551e2f52c6e3d9a744a47767b4b1
2018-11-13 16:41:53 -08:00
fbd50bbfb9 Revert D13007246: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 1/4
Differential Revision:
D13007246

Original commit changeset: 230de42a3843

fbshipit-source-id: 40ce266826f00d320f7215169188ef4ead232660
2018-11-13 16:41:52 -08:00
30676bdcd3 Finish up TODOs in python printer (#13879)
Summary:
* Correctly adds annotate when needed for lists
* Parser/Emitter handles octal escapes so we do not fail for some strings.
* more complete keyword list in pretty printer
* floating point numbers are always printed with a decimal to ensure
  we never mistake them in parsing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13879

Differential Revision: D13037860

Pulled By: zdevito

fbshipit-source-id: f09ab174fc33402a429b21a5bfaf72e15c802cad
2018-11-13 16:39:46 -08:00
8311bbee7f Fix Windows build and test in CI (#11716)
Summary:
This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC.

ebetica ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716

Reviewed By: orionr

Differential Revision: D13038253

Pulled By: goldsborough

fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8
2018-11-13 16:35:54 -08:00
f649d8b3a9 add floordiv and bitwise ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13873

Reviewed By: driazati, wanchaol

Differential Revision: D13033709

Pulled By: eellison

fbshipit-source-id: df7edee0f790038fb2a806d20640ad25c70b50eb
2018-11-13 16:32:22 -08:00
7c1fe17288 fix UnpackSegments cuda op (#13917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13917

There is a bug in UnpackSegments cuda op when setting "max_length".

  "buck test mode/opt //caffe2/caffe2/python/operator_test:pack_ops_test -- test_pack_with_max_length_ops"
 fails on trunk.

This diff fixed this bug.

Reviewed By: xianjiec

Differential Revision: D13045106

fbshipit-source-id: 4d640d61405bb86326dc33c81145824060cf987e
2018-11-13 15:38:58 -08:00
cd49afce64 Allow attaching additional net info when supplying the benchmark net (#13820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13820

We would like to provide an option to show additional info of the net to be benchmarked.

Reviewed By: highker, rdzhabarov

Differential Revision: D13018219

fbshipit-source-id: d3ec69901bdae58117a482ddd2c327b0f8cf7cb6
2018-11-13 15:08:25 -08:00
23e19ebfa7 add non-exponential emphasis loss to Lambdarank
Summary: Currently Lambdarank applies exponential emphasis on relevance, i.e., g=2^rel, when calculating DCG; this diff adds an option that supports g=rel in the loss function.
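
A small stand-alone illustration of the two gain options (not the fbcode operator itself):
```python
import math

def dcg(rels, exponential=True):
    # g = 2^rel (exponential emphasis) vs. g = rel (the new option)
    gain = (lambda r: 2.0 ** r) if exponential else (lambda r: float(r))
    return sum(gain(r) / math.log2(i + 2) for i, r in enumerate(rels))

print(dcg([3, 2, 0]))                     # exponential emphasis
print(dcg([3, 2, 0], exponential=False))  # linear gain
```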

Reviewed By: itomatik

Differential Revision: D9891514

fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
2018-11-13 14:54:04 -08:00
dfa4767754 Update nccl submodule to latest (#13921)
Summary:
This should include fix to the issue: https://github.com/NVIDIA/nccl/issues/153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13921

Differential Revision: D13048999

Pulled By: teng-li

fbshipit-source-id: a83f3bbb004f4a4137d187a010c7ec6b48f27eeb
2018-11-13 14:22:39 -08:00
c46dd5163f Temporarily disable part of test_spectral_norm (#13908)
Summary:
See #13818 for suggestions about a long-term fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13908

Differential Revision: D13047262

Pulled By: colesbury

fbshipit-source-id: 0f29bd5b659bb97826381abbc305fb8a25b131ed
2018-11-13 14:19:16 -08:00
5163a28917 Convert more weak functions (#13707)
Summary:
Convert some more functions to match up with features added. Some
conversions were unsuccessful but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707

Differential Revision: D13030210

Pulled By: driazati

fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
2018-11-13 13:50:57 -08:00
53bc5fb043 Support nn.Sequential in script (#13889)
Summary:
This PR makes weak modules in `nn.Sequential` get properly compiled
when used
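
A minimal sketch of the pattern this enables (module choices illustrative), using the script API of this era:
```python
import torch
import torch.nn as nn

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()
        # weak modules inside Sequential are now compiled when called from script
        self.seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU())

    @torch.jit.script_method
    def forward(self, x):
        return self.seq(x)

print(Model()(torch.randn(2, 4)))
```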
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13889

Differential Revision: D13039559

Pulled By: driazati

fbshipit-source-id: d3266305f0e206b2a19b63230ac2ab8f02faa603
2018-11-13 13:48:58 -08:00
5cfccd76e6 Jit load error msg (#13894)
Summary:
When loading a non-existent / non-openable file, the current error message is
```
Expected to read 8 bytes but got %llu bytes0
```

This
- fixes two ASSERTM formatting calls (including the above),
- throws a more specific error message if the ifstream constructor sets `.fail`.

Here is someone apparently confused by the current message: https://github.com/facebookresearch/maskrcnn-benchmark/pull/138#issuecomment-437848307
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13894

Differential Revision: D13043228

Pulled By: soumith

fbshipit-source-id: b348b482c66d5e420874ae6e101b834106b89e82
2018-11-13 12:33:31 -08:00
283062f574 Tensor construction: combine Resize+mutable_data - 2/4 (#13852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13852

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007266

fbshipit-source-id: a9f0427a11dbe084a30837aa32da67c9302cbc6c
2018-11-13 12:28:35 -08:00
e030ee8197 Tensor construction: combine Resize+mutable_data - 3/4 (#13854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007287

fbshipit-source-id: c89a24458e0428485402b3eb23519a92804d768e
2018-11-13 12:28:33 -08:00
9d36c37bdb Tensor construction: combine Resize+mutable_data - 1/4 (#13853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13853

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007246

fbshipit-source-id: 230de42a3843d71599e812d5511f52f3af47f59b
2018-11-13 12:26:02 -08:00
96a01f82d1 Remove unnecessary include (#13878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13878

This removes a dependency to another header to simplify moving this header to c10.
Also fix some include paths to prepare that move

Reviewed By: ezyang

Differential Revision: D13036478

fbshipit-source-id: cbddb5281498256fddcbebce61aa606c51b7b8d7
2018-11-13 12:18:28 -08:00
60a85857dd s/CAFFE_ENFORCE_WITH_CALLER/AT_ASSERTM/ (#13829)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC sinkingsugar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13829

Differential Revision: D13019452

Pulled By: ezyang

fbshipit-source-id: cf8b58b25a484720d9a612df6dd591c91af6f45a
2018-11-13 11:24:51 -08:00
561bc09026 Remove CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for accuracy (#13844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13844

In S163230, we've found CuDNN 7 upgrade causes accuracy drop in training convolution network such as ResNeXt-101 (~0% accuracy), and video R(2+1)D (65 --> 63%).

We've fixed this in Caffe2 D9601217, and we should do the same to ATen as well.

Reviewed By: ezyang

Differential Revision: D13025486

fbshipit-source-id: 04f4f0d9af6287b0400ca1842fb2cdac1f8cdb70
2018-11-13 11:17:16 -08:00
0d2762e876 Minor fix to reenable nvtx sequence numbers for the forward methods of custom (Python) autograd functions (#13876)
Summary:
Some of our arch people (mkolod, Aditya Agrawal, kevinstephano) notified me that the sequence number annotations weren't showing up for forward methods of custom autograd functions, which was breaking their nvprof dump parsing.  Two one-line fixes in the appropriate code paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13876

Differential Revision: D13042381

Pulled By: ezyang

fbshipit-source-id: a114118f5c07ad4ba482e7a4892d08805b23c65b
2018-11-13 11:10:32 -08:00
266bb8bf30 FeedTensor returns a Tensor (#13641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641

The FeedTensor function used to take a pointer to a Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.

Reviewed By: ezyang

Differential Revision: D12873145

fbshipit-source-id: 653735c20d611ff6ac9e380d8b3c721cb396a28f
2018-11-13 10:50:32 -08:00
98b450deb9 Clean optional undefined tensor syntax in ATen yaml files and codegen (#13871)
Summary:
Previously, multiple undefined tensor syntaxes existed in the ATen definition files; this PR makes them all follow the same "?" syntax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13871

Differential Revision: D13033486

Pulled By: wanchaol

fbshipit-source-id: 7673bc22d08cd6975503deb51fba47ada6bc5156
2018-11-13 10:37:42 -08:00
Jie
bbc7412615 (#13765)
Summary:
Fix CUDA native batch norm for small feature planes.
  1. fixed a divergent WARP_SHFL_XOR call in the warp reduction, which caused a hang with CUDA_ARCH > 7.0
  2. split Normalization.cu into two files for code reuse, in preparation for sync BN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13765

Differential Revision: D13043331

Pulled By: soumith

fbshipit-source-id: bf8565bff6ba782475ad0e4be37ea53c8052eadf
2018-11-13 10:14:37 -08:00
8559fcf791 Unpin Sphinx. (#13831)
Summary:
Sphinx 1.8.2 is released, per https://github.com/sphinx-doc/sphinx/issues/5419

Fixes #11618

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13831

Differential Revision: D13020339

Pulled By: ezyang

fbshipit-source-id: 4c7f3aff172efd3aca54ef48ac9052989cce5e4c
2018-11-13 09:45:12 -08:00
f6e4fc071a Fix a bug that causes nvcc to emit an unknown option error (#13904)
Summary:
Using `"-Xcompiler -fPIC"` causes nvcc to emit the following:

    nvcc fatal   : Unknown option 'Xcompiler -fPIC'

As per fixes lower down in the file (see also issue #7126 on GitHub),
the fix is to replace it with `"-Xcompiler" "-fPIC"`. This one was
apparently missed when the original fix was applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13904

Differential Revision: D13043189

Pulled By: soumith

fbshipit-source-id: 6dc6d325671e4d08cd8e6242ffc93b3bd1f65351
2018-11-13 09:41:44 -08:00
f112aa746a Fix document about torch.get_default_dtype() (#13890)
Summary:
Minor fix.
```
torch.get_default_dtype() → :class:`torch.dtype`
```
→
```
torch.get_default_dtype() → torch.dtype
```
:class: is not rendered in https://pytorch.org/docs/stable/torch.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13890

Differential Revision: D13040704

Pulled By: colesbury

fbshipit-source-id: 5fadb01ad365042d5df2bac058f4ae89b281d3b7
2018-11-13 09:25:32 -08:00
a83a1544b1 Move device_guard from _th_ functions to the wrapper. (#13842)
Summary:
This is what we would want to check in anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13842

Differential Revision: D13025463

Pulled By: gchanan

fbshipit-source-id: d1ff9b10f4adc811bbd3db15b440ed00c16c82d1
2018-11-13 08:03:36 -08:00
e43fb1d26d Fix cuda out of memory test (#13864)
Summary:
torch.randn(big_number_here, dtype=torch.int8) is wrong because randn
isn't implemented for torch.int8. I've changed it to use torch.empty
instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13864

Differential Revision: D13032130

Pulled By: zou3519

fbshipit-source-id: d157b651b47b8bd736f3895cc242f07de4c1ea12
2018-11-13 07:30:30 -08:00
7f002008f1 remove ShouldFp32FallbackToNCHW (#13814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13814

D10333829 implemented 3D conv in NHWC in fp32 ops so int8 ops don't need special handling anymore.

Reviewed By: hx89

Differential Revision: D13017666

fbshipit-source-id: 41df449f5e21c4c7134cc5c480e559f8c247069b
2018-11-13 00:52:41 -08:00
a7eee0a1e9 Add Reshape if there is add_axis when exporting C2 concat (#13798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13798

The semantics of C2 and ONNX Concat are a bit different. C2 Concat accepts an "add_axis" arg and will raise the dim if it is set. This is equivalent to attaching a Reshape after a plain Concat in ONNX.
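
A shape-level illustration of the equivalence (numpy stand-in; two (N, C) inputs assumed):
```python
import numpy as np

a = np.arange(12).reshape(4, 3)
b = -a
# C2 Concat with add_axis=1 stacks along a new axis: shape (4, 2, 3)
c2_style = np.stack([a, b], axis=1)
# ONNX has no add_axis, so export as Concat -> (4, 6) plus a Reshape
onnx_style = np.concatenate([a, b], axis=1).reshape(4, 2, 3)
assert (c2_style == onnx_style).all()
```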

Reviewed By: rdzhabarov

Differential Revision: D13012867

fbshipit-source-id: da23e555bae709fd2a373b04dcb9db4e984ae315
2018-11-12 22:27:49 -08:00
a17c0118a5 fix stability in bce with pos_weight formula (#13863)
Summary:
Fixes #13773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13863

Differential Revision: D13031803

Pulled By: ailzhang

fbshipit-source-id: 6c9e044f0450eebf4555bbc02c125713d9378e2f
2018-11-12 22:04:24 -08:00
0bfbdcac89 fix bug in D13017777
Summary:
Mistakenly created an infinite recursive call.

(Note: this ignores all push blocking failures!)

Reviewed By: jianyuh

Differential Revision: D13038053

fbshipit-source-id: 8b760cb73b5369647d8ef651b8c196ac3f7af04d
2018-11-12 21:57:31 -08:00
ce48958606 enable more unit tests (#13166)
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166

Differential Revision: D12814759

Pulled By: bddppq

fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
2018-11-12 18:49:52 -08:00
cec3455a8b Add gitignore item for YCM config
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13805

Reviewed By: yinghai

Differential Revision: D13031332

Pulled By: bddppq

fbshipit-source-id: 279b7bb8879e49eef8abed51dc30b4b7ea0a2fa9
2018-11-12 16:58:56 -08:00
1600649792 Fix for nightly builds (#13779)
Summary:
Being tested on nightlies manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13779

Reviewed By: yinghai

Differential Revision: D13001930

Pulled By: pjh5

fbshipit-source-id: 954eaabe052914b7b23c74e922666bf9dbfb630a
2018-11-12 16:38:14 -08:00
b052fe6c2f Upgrade DLPack
Summary: Needed to use TVM

Reviewed By: ajtulloch

Differential Revision: D12994038

fbshipit-source-id: f0b6c48a43a87fac37fcef73b78026d8384cd022
2018-11-12 15:59:46 -08:00
8480fe0105 Fix up creation of unique data nodes
Summary:
There was a bug in the uniqueness check that only made the first run unique.

Reviewed By: duc0

Differential Revision: D13013504

fbshipit-source-id: ecf7526d0fafd7968f1301734123f93968efef46
2018-11-12 15:37:08 -08:00
03c0f4fbe7 Use RNG mutex for randperm on CPU (#13832)
Summary:
When we added `randperm_cpu` and `THTensor_(randperm)` we forgot to lock the `THGenerator` mutex before calling `THRandom_random`, which causes the segfault mentioned in https://github.com/facebookresearch/maskrcnn-benchmark/pull/93#issuecomment-435479043. This PR fixes the bug.
Closes https://github.com/pytorch/pytorch/issues/1868.
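
A sketch of the kind of concurrent use the mutex now protects (thread count arbitrary):
```python
import threading
import torch

# Before the fix, unsynchronized access to the shared CPU generator state
# from multiple threads could segfault; the mutex now serializes it.
threads = [threading.Thread(target=torch.randperm, args=(10000,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```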
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13832

Differential Revision: D13025453

Pulled By: yf225

fbshipit-source-id: 6e363a35c72b4862412eaea6516a154126634c9d
2018-11-12 15:27:41 -08:00
fc79f70f9a CircleCI: Add Linux CUDA 10 build (#13858)
Summary:
Moving CUDA 10 build to CircleCI so that we have one less job running on Jenkins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13858

Differential Revision: D13031916

Pulled By: yf225

fbshipit-source-id: 57aa54941d7f529e7094c8d037b836ec2fb6191c
2018-11-12 15:07:34 -08:00
8de9564c12 Fix gcc-7 build in caffe2/caffe2/quantization/server/activation_distribution_observer.cc (#13799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13799

Fix broken operator=

Reviewed By: jspark1105

Differential Revision: D13014333

fbshipit-source-id: 6075906ecf0735bd9a74d57108036a33e1575df8
2018-11-12 14:52:51 -08:00
f1a2bc4eae Corrected python lib path on windows to be consistent with Linux (#13848)
Summary:
The python lib path on Windows was set to an incorrect path. This fixes it to be consistent with Linux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13848

Differential Revision: D13030945

Pulled By: soumith

fbshipit-source-id: 7fb9013ffe66cff98018aea25fdb5cda03cbceb1
2018-11-12 14:39:55 -08:00
53a3c46950 Switch to packaged Thrust on Ubuntu, enable CentOS 7.5 as a CI target (#12899)
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)

2) CentOS 7.5 docker (ROCm 279)

* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.

PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280

For attention: bddppq ezyang

Docker rebuild for Ubuntu is not urgent (getting rid of the Thrust checkout and package install is mainly cosmetic). If a docker image for CentOS 7.5 is wanted, a build is necessary. I tested the PyTorch build in the CentOS docker myself. PyTorch unit tests mostly work; however, a test in test_jit causes a Python recursion error that seems to be due to the python2 on CentOS, as we have never seen this on Ubuntu - hence please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899

Differential Revision: D13029424

Pulled By: bddppq

fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
2018-11-12 14:39:54 -08:00
1caa341c68 Add torch.multiprocessing.spawn docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13846

Differential Revision: D13029595

Pulled By: pietern

fbshipit-source-id: b733b00f7070c18535c31801f20e6e717eec7748
2018-11-12 14:39:52 -08:00
1a0cb08918 allow Node::isAfter to work across blocks (#13855)
Summary:
Extend `isAfter` to work for nodes in different blocks. This is useful if we want to ask a question like "are any of the uses of value `v` after this node", since uses may be inside inner blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13855

Differential Revision: D13030528

Pulled By: suo

fbshipit-source-id: f681405396f3ec68eec1a2cb92e40873921a4b78
2018-11-12 14:39:50 -08:00
75bf877534 Preventing error where ninja build files are overwritten when invoking clean and build together (#13698)
Summary:
Prevents an error where ninja build files are overwritten when invoking clean and build together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13698

Differential Revision: D13030905

Pulled By: soumith

fbshipit-source-id: 234576ac92e0aa8c2d2409958d3cf85eb29ed1f3
2018-11-12 14:39:48 -08:00
686e83223f add ops between float & int, and change list equality output to be a boolean
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13793

Reviewed By: wanchaol

Differential Revision: D13010872

Pulled By: eellison

fbshipit-source-id: 2c8248f30b51eab1a87290711f99b7ceb6df2009
2018-11-12 14:39:47 -08:00
e3839dfc35 Add matplotlib to docs/requirements.txt (#13828)
Summary:
Used in docs/source/scripts/build_activation_images.py.

Don't know if we need a specific version. I installed the latest version (3.0.2) and that works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13828

Differential Revision: D13030294

Pulled By: pietern

fbshipit-source-id: b4e7b381182036645924453a1e2abb719090bbc4
2018-11-12 13:43:07 -08:00
5bf14c23b7 Bump Caffe2 docker images to version 230
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13857

Differential Revision: D13029637

Pulled By: bddppq

fbshipit-source-id: 73c4a0f3d39257a2312b36c9dd55dc001067d9c4
2018-11-12 13:26:23 -08:00
309cc76469 BaseType:: -> this-> (#13817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13817

gcc7 doesn't like BaseType::func<..>() . Should use this->func<...>()

Reviewed By: hx89

Differential Revision: D13017777

fbshipit-source-id: 0cf68d459b44379b1c103cf74382857db9a91bef
2018-11-12 12:51:12 -08:00
6093f29409 Update coverage info (#13788)
Summary:
Right now we don't have coverage info on how many pytorch operators can be exported to onnx. This PR adds torch.nn operators to the coverage; functional modules will be added later as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13788

Differential Revision: D13010448

Pulled By: zrphercule

fbshipit-source-id: 19349cabaeff42fda3620bb494f7ec4360d96b76
2018-11-12 12:39:12 -08:00
d8f35c42be nomnigraph - easy - support blob renaming (#13845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13845

Support renaming a blob in nomnigraph

Reviewed By: itomatik

Differential Revision: D13026762

fbshipit-source-id: fc8cecb4562a6c618ce5c8e2ff79a2a282a8ff09
2018-11-12 12:32:10 -08:00
0c375571f5 Support OptionalType export and type match (#13647)
Summary:
* Adds `OptionalType` support for import/export
    * Optionals get exported along with their contained type, i.e. 'Optional[int]'
* Allows concrete types and `None` to be passed to an op that takes an optional
* Converts `softmax`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647

Differential Revision: D12954672

Pulled By: driazati

fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
2018-11-12 12:15:25 -08:00
bf00008aa1 Use SmallVector for TensorImpl sizes and strides. (#13649)
Summary:
This removes dynamic allocations for sizes/strides for tensors with <= 5
dims. This should cover the most common tensor use cases; we use a lot
of 4D tensors in images (N, C, H, W) and LSTMs use tensors with 3 or fewer dims.

Benchmarking results can be found here:
https://gist.github.com/zou3519/ce4182722ae7e2a228bc8b57ae60b0e9
The quick summary is that this PR:
- makes aten LSTM's forward pass ~1ms faster and improves JIT lstm perf
  as well
- Tensor as_strided is now 200ns faster for dimensions <= 5
- at::empty performance is 200ns slower for dimensions > 5. For dims <= 5,
  there is no noticeable perf change.
- Variable ops are 200-500ns faster because Variables never used their
  sizes/strides fields in the first place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13649

Differential Revision: D12950409

Pulled By: zou3519

fbshipit-source-id: 0bd87ec9f712ddc0d533a347d781e3a91a954b90
2018-11-12 10:40:32 -08:00
aef9e76283 Get pretty printer ready for use as a serialization format (#13616)
Summary:
Get pretty printer ready for use as a serialization format

This PR adds a bunch of functionality to the pretty printer (now called python_printer to reflect
the fact that it will be used to output valid python source). The idea is to get the printer
ready for use as serialization format.  This PR does not have tests beyond what the pretty
printer already had. PRs stacked on this one will do round-trip export/import to test this functionality more robustly.

Notes:
* PythonPrinter is an evolution of the original pretty printer. However, much of it has changed so it is best just to
  read it as a new implementation. Trying to correlate it to the original implementation is probably not much help.
* The printer tries to get reasonably close to how the original function was likely written, such as
  writing expressions rather than making intermediates when possible. We may decide to turn this off
  for the actual serialization, but it is useful for pretty printing.
* tensor field access was changed so that prim::device and family have schema
* fixed a bug in the compiler where setUniqueName gets called even when a value already has one.
  this sometimes assigned really poor names to graph inputs
* Graph::insert gains an optional range argument to make range-preserving inserts easier.
* prim:: ops that can have schema now have schema. This is because when we parse them back in,
  we will need the schema to correctly set their output types.
* there is code in the python printer to complain if you try to add a prim op and do not update the printer.
* BuiltinModule is generalized to take an operator namespace and a version number for work in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13616

Reviewed By: goldsborough

Differential Revision: D13008252

Pulled By: zdevito

fbshipit-source-id: 32b33bc6410d6ca1c6f02bd6e050f8d5eea32083
2018-11-12 10:21:30 -08:00
b7a7ab364b Improve mm / addmm error message with sparse tensors (#13796)
Summary:
and write derivatives in terms of native functions.

This is the same as https://github.com/pytorch/pytorch/pull/13648 but has a fix for the canonicalize op jit pass to propagate shape information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13796

Reviewed By: ezyang

Differential Revision: D13012281

Pulled By: gchanan

fbshipit-source-id: 88d0d91e72b5967c51ff865350fcbdd7ffed92ef
2018-11-12 07:16:47 -08:00
8752214fb7 Apply weight-decay before momentum in the SGD optimizer. (#13801)
Summary:
While trying to understand why two implementations of the same model, one in Python, one using the C++ API (via some [ocaml wrappers](https://github.com/LaurentMazare/ocaml-torch)), did not perform equally well, I noticed that the Python and C++ implementations of SGD differ slightly on weight decay.

- In the [Python version](https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py#L91-L93) weight decay is applied *before* momentum (and so momentum applies to the weight decay).
- In the C++ implementation the weight decay is applied *after* momentum.

In the couple of computer-vision models I have looked at, the Python version performs a little better, so this PR tweaks the C++ implementation to apply weight decay *before* momentum. The difference is possibly caused by having more regularization - maybe increasing the weight decay while keeping the current code would yield the same improvement; however, a nice advantage of this change is that it puts the C++ and Python versions in line. After this change my Python and C++/ocaml models performed similarly when using the same weight-decay parameter.
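
A minimal Python sketch of the ordering in question, mirroring the Python optimizer's update (names simplified; not the actual C++ code):
```python
import torch

def sgd_step(param, grad, buf, lr=0.1, momentum=0.9, weight_decay=1e-4):
    d_p = grad + weight_decay * param   # weight decay applied BEFORE momentum...
    buf.mul_(momentum).add_(d_p)        # ...so the momentum buffer accumulates it
    param.sub_(lr * buf)

p, b = torch.randn(3), torch.zeros(3)
sgd_step(p, torch.randn(3), b)
```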

Maybe there was some real reason to have weight decay after momentum in the C++ version but I haven't found any.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13801

Differential Revision: D13020709

Pulled By: soumith

fbshipit-source-id: 7c2ac245577dd04bc3728aec4af0477120a60f13
2018-11-11 23:54:50 -08:00
7e8572be2d Change method-only _th_ prefix Declarations to functions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13754

Reviewed By: ezyang

Differential Revision: D12988489

Pulled By: gchanan

fbshipit-source-id: b62bb9288f67d72320925c36283f6ce6cbf95d20
2018-11-11 15:47:06 -08:00
003f97cefa fc layer accept axis argument (#13822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13822

as title

Reviewed By: xianjiec

Differential Revision: D12996338

fbshipit-source-id: 1aa61e71e2d79535325ea7034c82e1cb6bf3a9f6
2018-11-11 13:44:57 -08:00
e35418b3be New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342

This PR introduces a few new concepts:

- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
  provide a generic interface for interfacing with device and stream state,
  without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
  and dynamically dispatched device guard implementations.  Dynamic
  dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
  from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
  but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
  devices.
- Optional variants of all the aforementioned guards, which are a no-op if
  no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
  a device on every guard.

There are some subtle semantic changes, which have been thoroughly documented
in the class definition.

BC-breaking changes:

- Move constructor/assignment have been removed from all device guard
  implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
  'reset_device', because if you switch devices/device types, the stream/device on the
  previous device is unset.  This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams.  Use CUDAStreamGuard
  or CUDAMultiStreamGuard as appropriate for your use case.

Reviewed By: dzhulgakov

Differential Revision: D12849620

fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
2018-11-11 12:11:10 -08:00
4b86a215ca moving simd adagrad code to perfkernels (#13549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13549

caffe2/perfkernels has a nice framework for switching between implementations optimized for different instruction sets at runtime.
This is good preparation for implementing avx512 adagrad kernels.

Reviewed By: hyuen

Differential Revision: D12882872

fbshipit-source-id: a8f0419f6a9fd4e9b864c454dad0a80db267190c
2018-11-11 00:20:39 -08:00
d97ac82bf5 Back out "Revert D12967258: Support more data types in ONNXIFI transform" (#13812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13812

Original commit changeset: 2cf95bdc5ed8

Looks like in iOS, `uint64_t` is not the same as `size_t`. :( Fixed it here.

Reviewed By: houseroad

Differential Revision: D13017390

fbshipit-source-id: d33854ce341225aba372fb945c3704edc14f9411
2018-11-10 20:00:34 -08:00
786f9ba6ea Remove potential infinite loop from test_c10d.py (#13816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13816

If common.find_free_port() returns the same port over and over again,
and the TCPStore fails to bind to it over and over again, this
function has the potential to loop forever. If we can't find a free
port after 10 tries, we are safe to assume something is wrong...

Differential Revision: D13017700

fbshipit-source-id: 2139a0ea0f30ce08b5571f80ae0551f1fa7ba4a2
2018-11-10 17:58:13 -08:00
c3603301d7 Fix race condition in TCPStoreDaemon initialization (#13815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13815

If the TCPStoreDaemon was constructed and destructed shortly after, it
was possible for the controlPipeFd_ to get initialized by the
background thread after the stop() function was already called. Then,
the destructor hangs on waiting for the thread to terminate, when the
termination signal (closing the write side of the control pipe) will
never happen.

Differential Revision: D13017697

fbshipit-source-id: 9528286fbfc773237990f1a666605d27bac2c0e5
2018-11-10 17:54:21 -08:00
4c3b76c402 Add std::string to the getTypePtr for JIT inference of custom op types (#13683)
Summary:
This allows custom ops to take string parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13683

Differential Revision: D13017010

Pulled By: soumith

fbshipit-source-id: 7c40aca7f57ba3f8812d34bc55828ff362c69bd2
2018-11-10 12:58:53 -08:00
7c02f285dc Revert D12967258: Support more data types in ONNXIFI transform
Differential Revision:
D12967258

Original commit changeset: 688076e6f504

fbshipit-source-id: 2cf95bdc5ed8f1e13646bc5cf8139bdc516861d7
2018-11-10 12:34:31 -08:00
5923d76f96 Support more data types in ONNXIFI transform (#13745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13745

We need to support types besides `int64` and `float`.

Reviewed By: bddppq, rdzhabarov

Differential Revision: D12967258

fbshipit-source-id: 688076e6f504b2bf24bba89714df87a678c5638a
2018-11-10 10:41:01 -08:00
c85463fc74 Allow Gather to handle empty data (#13781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13781

allow Gather Op to handle empty data.

Reviewed By: intermilan

Differential Revision: D13001267

fbshipit-source-id: 633c8471b637c56be8f6574f9bf9430785073977
2018-11-10 10:00:47 -08:00
4f622c26b9 fix ffs intrinsic for long long (ROCm 290) (#13804)
Summary:
* Switch to __ffsll in Embedding which is the correct intrinsic here.
* Fix WARP_BALLOT and ffsll in LookupTable as well.

Fix comes from iotamudelta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13804

Differential Revision: D13016184

Pulled By: bddppq

fbshipit-source-id: 2287a78ee9e592630336a073ad1e55a90e1f946d
2018-11-10 02:02:43 -08:00
d02781a2ef Make InterpreterStateImpl an intrusive_ptr_target (#13784)
Summary:
InterpreterStateImpl can continue its lifecycle by incrementing the ref
count itself. This patch also removes the InterpreterState::clone()
interface, which conflicts with intrusive_ptr_target's prohibition on copying.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13784

Differential Revision: D13015451

Pulled By: highker

fbshipit-source-id: a05f1ea6549d52ec693ccffefaa4d520b2474b8c
2018-11-09 23:39:18 -08:00
079e86a915 schematize some prim ops (#13790)
Summary:
We're relying on the default function schema (which contains no argument information) in places where we don't need to. This is bad because alias analysis will be very conservative when it doesn't have schema information present.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13790

Differential Revision: D13009185

Pulled By: suo

fbshipit-source-id: 023516937bd3dcae8a969185a89c55f38d691ba5
2018-11-09 15:50:29 -08:00
e552c04d53 Add proper comment for dispatch_to (#13783)
Summary:
Add proper comment to the fix in https://github.com/pytorch/pytorch/pull/13700
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13783

Differential Revision: D13009956

Pulled By: wanchaol

fbshipit-source-id: 34f5259204dab12f4159ab191e7b08e2f5226292
2018-11-09 15:48:15 -08:00
7b2fb012a8 Make potrs batched (#13453)
Summary:
- This is a straightforward PR, building up on the batch inverse PR, except for one change:
  - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.

Billing of changes:
- Add batching for `potrs`
- Add relevant tests
- Modify doc string

Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453

Reviewed By: soumith

Differential Revision: D12942039

Pulled By: zou3519

fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
2018-11-09 15:16:26 -08:00
e3e6ca1102 operator serialized test coverage summary document (#13703)
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.

Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703

Reviewed By: dzhulgakov

Differential Revision: D12970810

Pulled By: ajyu

fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
2018-11-09 15:04:08 -08:00
014ea1e1f8 Improve CUDA out-of-memory error message (#13751)
Summary:
```
The new error message now looks like (from Python):

  RuntimeError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 11.93 GiB total capacity; 4.00 GiB already allocated; 7.33 GiB free; 179.00 KiB cached)

Summary of terms:

  "total capacity": total global memory on GPU
  "already allocated": memory allocated by the program using the
                       caching allocator
  "free": free memory as reported by the CUDA API
  "cached": memory held by the allocator but not used by the program

  The "allocated" amount  does not include memory allocated outside
  of the caching allocator, such as memory allocated by other programs
  or memory held by the driver.

  The sum of "allocated" + "free" + "cached" may be less than the
  total capacity due to memory held by the driver and usage by other
  programs.

  Note that at this point cuda_malloc_retry has already returned all
  possible "cached" memory to the driver. The only remaining "cached"
  memory is split from a larger block that is partially in-use.
```

This also fixes an issue where an out-of-memory error could cause an unrelated subsequent CUDA kernel launch to fail because `cudaGetLastError()` was not cleared.
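
For context, a minimal sketch of probing the quantities named in the message (assumes a CUDA build; `memory_cached()` was renamed `memory_reserved()` in later releases):
```
import torch

if torch.cuda.is_available():
    x = torch.empty(1024, 1024, device="cuda")
    print(torch.cuda.memory_allocated())  # "already allocated" by the caching allocator
    print(torch.cuda.memory_cached())     # "cached": held by the allocator, not in use
    try:
        torch.empty(1 << 40, device="cuda")  # deliberately far too large
    except RuntimeError as e:
        print(e)  # "CUDA out of memory. Tried to allocate ..."
```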
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13751

Differential Revision: D13007177

Pulled By: colesbury

fbshipit-source-id: ea7121461b3f2a34646102959b45bde19f2fabab
2018-11-09 14:33:28 -08:00
ae7c6bcfcf Make c10 buildable by itself. (#13742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13742

Along the way, I switch us to globbing directories by hand,
so we don't actually pick up generated cpp files in c10/build
(if you're doing the normal idiom for a cmake build).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dzhulgakov

Differential Revision: D12988039

fbshipit-source-id: 08b7ec50cfef82b767b4ca9972e5ba65bc45bcbb
2018-11-09 13:40:39 -08:00
09369fa9d7 Fix clang_tidy.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13776

Differential Revision: D13002845

Pulled By: goldsborough

fbshipit-source-id: 7b019a032680796cbb04f733b31749ef7c6abe54
2018-11-09 11:46:50 -08:00
79ceecec8e Optional undefined tensor support (#13650)
Summary:
This PR is part of the task to unblock standard library export.
* We treat None differently from Tensor and other types: when passing None as a Tensor, it's an undefined tensor rather than the None IValue.
* Refine the type system so that we have a correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor); Dynamic should be at the top of the inheritance hierarchy.
* It also tries to export bilinear as an example of undefined tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650

Differential Revision: D12967026

Pulled By: wanchaol

fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
2018-11-09 11:29:57 -08:00
607094c4bf fix null-pointer-use in reshape_op.h
Summary:
```
UndefinedBehaviorSanitizer: null-pointer-use ../fbcode/third-party-buck/gcc-5-glibc-2.23/build/libgcc/include/c++/5.5.0/bits/stl_vector.h:794:16
```
Here we take the address of the first element in the empty vector. Fix the error by guarding against empty source.

Reviewed By: pixelb

Differential Revision: D12989957

fbshipit-source-id: ac5ec366385df835b546bd1756e30cd762f13a7a
2018-11-09 10:07:04 -08:00
107e067654 Move IdWrapper to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13687

Reviewed By: ezyang

Differential Revision: D12912238

fbshipit-source-id: f7a37de52cd3b3c45b3b0e9eeb29dff624fa0258
2018-11-09 10:02:45 -08:00
332a7db35e Use MNIST dataset in C++ integration test (#13737)
Summary:
We have an MNIST reader in the C++ data API, so we can get rid of the custom one currently implemented in the integration tests.

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13737

Differential Revision: D12990936

Pulled By: goldsborough

fbshipit-source-id: 125a1910ec91d53dbf121570fc9eec6ccfba0477
2018-11-09 09:55:02 -08:00
a63ef1d605 Suggest git submodule update --init --recursive (#13769)
Summary:
We now have submodules that have submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13769

Reviewed By: soumith

Differential Revision: D13000203

Pulled By: SsnL

fbshipit-source-id: 63c0c19c6c9d25ae3bf255a2421a82ca68278866
2018-11-09 08:41:44 -08:00
a1b2f1710d Remove _th_is_contiguous, make is_set_to a function, not a method.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13725

Differential Revision: D12980246

Pulled By: gchanan

fbshipit-source-id: e5c5742a67e5a25062df736e28b44c133a635ca8
2018-11-09 07:02:38 -08:00
10a1534c43 Remove _th methods that also have a function. (#13721)
Summary:
There's no reason we need these as the native function wrapper calls into the function anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13721

Differential Revision: D12977449

Pulled By: gchanan

fbshipit-source-id: 54701ebe2f0bb2b55484cb437501c626e6471347
2018-11-09 06:57:20 -08:00
9ffabcfcaa Use nested variant of getValueTrace to allow more flexible tracing script modules (#13597)
Summary:
When tracing scripted functions, we used to only allow Tensor arguments.
This enables tracing script modules with List[Tensor] or Tuple[Tensor, Tensor] arguments (passing
tuples).

Fixes: #13566
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13597

Differential Revision: D12990464

Pulled By: soumith

fbshipit-source-id: fdce3afcb1e09f3c26d6ce834c01bf18d261f47c
2018-11-09 06:24:02 -08:00
dca3c2c60f Save and execute futures in a task queue (#13212)
Summary:
Upon calling wait(), save the forked thread and the current thread to a
task queue. An idle thread (currently there is only a single one) should
pick a ready task and run until there is nothing left in the task queue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13212

Differential Revision: D12884522

Pulled By: highker

fbshipit-source-id: b3942a0ee63c148e05f5f41bdc73007fa3c3368e
2018-11-09 01:46:35 -08:00
4484f67b47 Revert D10203439: [pytorch][PR] Fix batch norm multiplier init
Differential Revision:
D10203439

Original commit changeset: 999cc134a45e

fbshipit-source-id: 7871e384063db2f3788169338e9c965d5f8ac351
2018-11-09 00:37:05 -08:00
26751ce300 Fix the improper use of windows-native slashes (#13220)
Summary:
Trying to fix #12510.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13220

Differential Revision: D12994483

Pulled By: soumith

fbshipit-source-id: adbaf7e7a0a7cd1fc3ec947ddb209b55a9cda2a6
2018-11-08 21:09:44 -08:00
44fb23a2f5 Add ability to annotate jit types inside function (#13752)
Summary:
This adds torch.jit.annotate for annotating the type of an intermediate.
This is Py2/3 compatible, e.g.:

```
import torch
from torch.jit import annotate
from typing import List

@torch.jit.script
def foo():
  a = annotate(List[int], [])
```

This is needed to output valid python programs from our IR. It removes
the need for the empty list constructors.

A future patch can add support to the C++ parser and Python 3,
via desugaring:

```
a : int = b
a = annotate(int, b)
```

But this functionality is not required for serialization, so it is not added in this patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13752

Differential Revision: D12989885

Pulled By: zdevito

fbshipit-source-id: 161573a7352094543dc0d33a892f2a3b9103d847
2018-11-08 20:25:00 -08:00
5ae3b44255 Added HIP top_k operator (#13747)
Summary:
This PR contains changes for:
1. Adding HIP top_k operator in Caffe2
2. Added HIP equivalent definitions of GPUDefs and GPUScanUtils
3. Removing the top_k operator test from ROCm test ignore list
4. Bug fixes in related code in THC/THCAsmUtils.cuh

Differential Revision: D12986451

Pulled By: bddppq

fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504
2018-11-08 20:14:53 -08:00
32b3fe8ce6 CircleCI: enable OSX jobs again (#13731)
Summary:
CircleCI now offers 60x OSX concurrency, which is 2x what we currently have in Jenkins. This should help alleviate the OSX CI wait time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13731

Differential Revision: D12993737

Pulled By: yf225

fbshipit-source-id: f475ad9a1d031eda95b7cacdaf52f31fbb2f4f93
2018-11-08 20:09:05 -08:00
2ee4ef5290 Change all namespace fbgemm2 in the new fbgemm2 to namespace fbgemm (#13740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13740

We would like to rename the old fbgemm to “fbgemm0”, and the new fbgemm2 to “fbgemm”:

This DIFF changes all namespace fbgemm2 to namespace fbgemm.

The purpose is to avoid confusion over "fbgemm2" when we release FBGEMM as open source.

Reviewed By: jspark1105

Differential Revision: D12850449

fbshipit-source-id: 08cc47864b157e36fbceddb7a10bf26218c67bd8
2018-11-08 19:59:12 -08:00
55964abb11 Change all namespace fbgemm in the old fbgemm to namespace fbgemm0 (#13701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13701

We would like to rename the old fbgemm to “fbgemm0”, and the new fbgemm2 to “fbgemm”:

This DIFF changes all namespace fbgemm to namespace fbgemm0.

Reviewed By: jspark1105

Differential Revision: D12848727

fbshipit-source-id: 47935e9e2c4714a7ce1bfc3f7e4d6a334130132e
2018-11-08 19:59:10 -08:00
a8e303dc46 change USE_MKLDNN default from ON (from #13303) to OFF for ppc64le (#13759)
Summary:
MKLDNN is not supported on ppc64le, so change USE_MKLDNN to OFF for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13759

Differential Revision: D12993121

Pulled By: soumith

fbshipit-source-id: 539d5cfcff2c03b59fa71e10b52fac333a64c381
2018-11-08 19:33:39 -08:00
dd3f52fbe6 Remove _th_ndimension, which doesn't actually do anything. (#13723)
Summary:
Tensor.ndimension is hardcoded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13723

Reviewed By: ezyang

Differential Revision: D12979461

Pulled By: gchanan

fbshipit-source-id: b95251b74a7b96ebcce2331f847873216968124d
2018-11-08 19:29:59 -08:00
c9be135bb9 Fix batch norm multiplier init (#12325)
Summary:
Fixes #12259
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12325

Differential Revision: D10203439

Pulled By: SsnL

fbshipit-source-id: 999cc134a45e2554313adb7eb93ee98e1f84335f
2018-11-08 19:00:00 -08:00
42001e7c17 Fix clang-tidy for Python2 (#13735)
Summary:
`clang_tidy.py` doesn't run with Python2 right now. Needs a minor fix

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13735

Differential Revision: D12990613

Pulled By: goldsborough

fbshipit-source-id: ad19b229a14188fd048dde198a7f4c3483aeff95
2018-11-08 17:57:08 -08:00
89b54229b1 Make _th_unfold and _th_view into functions, from methods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13724

Reviewed By: ezyang

Differential Revision: D12979865

Pulled By: gchanan

fbshipit-source-id: 92462198f3c51664f7973c142956774d88d831ca
2018-11-08 16:36:55 -08:00
00e752a46e Move cpu copy to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13347

Reviewed By: ezyang

Differential Revision: D12850691

fbshipit-source-id: d72577efb0ccb6df69e33f0c0a94c9f71937ccf8
2018-11-08 15:56:41 -08:00
51f58f0990 Fix typo in CTC loss doc comments. (#13727)
Summary:
`target_lenghts` -> `target_lengths`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13727

Differential Revision: D12981582

Pulled By: zou3519

fbshipit-source-id: e5e02b26cf3030a91494655ff863273333cc4133
2018-11-08 14:50:48 -08:00
bff931a10d implement concatenation of sparse tensors (#13577)
Summary:
With this change applied, `torch.cat` works for sparse tensors.

The algorithm is just to concatenate the values and give the new values the proper indices: the same as their old indices in every dimension except the catted dimension, where each index is shifted by the sum of the sizes of all previous tensors along that dimension.

This is my first time contributing to PyTorch so please feel free to tell me if this approach seems totally wrong.

Coming next: `torch.stack` for sparse tensors.
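
A minimal sketch of the new behavior, using 1-D sparse tensors for brevity:
```
import torch

s1 = torch.sparse_coo_tensor(torch.tensor([[0, 2]]), torch.tensor([10.0, 30.0]), (3,))
s2 = torch.sparse_coo_tensor(torch.tensor([[1]]), torch.tensor([40.0]), (2,))
out = torch.cat([s1, s2], dim=0)  # s2's index 1 is shifted by s1's size, 3
print(out.to_dense())             # tensor([10.,  0., 30.,  0., 40.])
```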
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13577

Differential Revision: D12980948

Pulled By: umanwizard

fbshipit-source-id: 51ebdafee7fcd56d9762dcae9ebe5b4ab8e1dd6b
2018-11-08 14:15:30 -08:00
65ff84b49e Catch error by reference in module.cpp (#13743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13743

"catch by reference, throw by value"

Catching the polymorphic type std::bad_weak_ptr by value was an error earlier.

Reviewed By: goldsborough

Differential Revision: D12982626

fbshipit-source-id: 0ff22c0352acc7a94078ce6d5b2a4e56fee75be5
2018-11-08 13:49:21 -08:00
8a5869a3f7 Move function_schema to aten/core (#13729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13729

final move to expose function_schema to caffe2

Differential Revision: D12981563

fbshipit-source-id: e4f7fa611a2498a96c27dfa8bfd18e10ad781c10
2018-11-08 13:28:37 -08:00
85bde3801b Tracer now records Python variable names (#13441)
Summary:
This is probably slow but it should make the traces more understandable and make debugging easier. Any suggestions for how to make it faster (i.e. make it so we don't have to traverse all of locals() and globals()) would be appreciated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13441

Differential Revision: D12879763

Pulled By: jamesr66a

fbshipit-source-id: b84133dc2ef9ca6cfbfaf2e3f9106784cc42951e
2018-11-08 13:08:42 -08:00
64a910bac7 Remove unnecessary tools/ qualification. (#13706)
Summary:
H/t kalisp for pointing it out

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13706

Differential Revision: D12983983

Pulled By: ezyang

fbshipit-source-id: 6a43cdde142fe64550121b16716f206e7c4d68d6
2018-11-08 12:55:19 -08:00
4fadf571fd handle flat rolling (no dim specified) T36264909 (#13588)
Summary:
Update roll to behave like numpy.roll when the dimension to roll is not specified.
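
For example, a small sketch of the flat-roll behavior:
```
import torch

t = torch.arange(6).reshape(2, 3)
print(torch.roll(t, 1))     # no dim: flatten, roll, restore the shape
# tensor([[5, 0, 1],
#         [2, 3, 4]])
print(torch.roll(t, 1, 0))  # rolling along an explicit dim is unchanged
```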
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588

Differential Revision: D12964295

Pulled By: nairbv

fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1
2018-11-08 12:39:35 -08:00
59d021b63a Fix nn threshold test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13734

Differential Revision: D12983358

Pulled By: driazati

fbshipit-source-id: 6db30b8bbc8e34c6e01f678724dfca9555a86177
2018-11-08 12:31:39 -08:00
0a090fe60a Fix torch.dist for infinity, zero and minus infinity norms (#13713)
Summary: Fixes #13559
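
A quick sketch of the three norms covered by the fix (values chosen for illustration):
```
import torch

a = torch.tensor([1.0, -2.0, 3.0])
b = torch.zeros(3)
print(torch.dist(a, b, float("inf")))   # max |a - b|  -> 3.0
print(torch.dist(a, b, float("-inf")))  # min |a - b|  -> 1.0
print(torch.dist(a, b, 0))              # number of nonzero diffs -> 3.0
```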

Differential Revision: D12981556

Pulled By: zou3519

fbshipit-source-id: 99e86abab3ca045257374a9212ca24e7ca59fe9d
2018-11-08 12:03:07 -08:00
a92ff57a4d update range doc (#13730)
Summary:
Update range documentation to show that we don't support start or increment parameters
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13730

Differential Revision: D12982016

Pulled By: eellison

fbshipit-source-id: cc1462fc1af547ae80c6d3b87999b7528bade8af
2018-11-08 11:40:52 -08:00
869ef71343 AsyncNet: option for time based tracing and trace path (#13440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13440

Time-based tracing is easier to look at when multiple nets are running asynchronously.
This diff also adds an option to change the path to dump trace files.

Reviewed By: aazzolini, ilia-cher

Differential Revision: D12479259

fbshipit-source-id: 94d379634ba7b90c111c92b1136ffa4226b8bb8c
2018-11-08 11:34:34 -08:00
556ff8e7b7 Add builtins for size() and list with defaults (#13639)
Summary:
* `aten::size()` to match `torch.Tensor.size`
* `aten::list_with_default` for semantics of `torch.nn.modules.utils.list_with_default`
* converts `adaptive_avg_pool2d` and `adaptive_avg_pool3d`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13639

Differential Revision: D12954670

Pulled By: driazati

fbshipit-source-id: 68c30af0efc02c60af5fb8c9715b2435cc01a0d9
2018-11-08 11:26:35 -08:00
d01cb70497 build with mkl-dnn by default (#13303)
Summary:
build with mkl-dnn by default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13303

Reviewed By: yinghai

Differential Revision: D12979633

Pulled By: orionr

fbshipit-source-id: 00d23fa27c0d13e82f7e5acb3ebd00ed7ba1d5dc
2018-11-08 11:18:27 -08:00
8581d3ec67 Allow blacklist ops in onnxifi transform
Differential Revision: D12945523

fbshipit-source-id: cf5055652591bd1dd8d4be92b7fd6a40a0764536
2018-11-08 09:59:03 -08:00
fd9aaa6b79 Fix linking errors on Windows (#13100)
Summary:
1. Removes the flag "/FORCE:UNRESOLVED" that shouldn't be used.
2. Fix the code logic for ONNX_BUILD_MAIN_LIBS on Windows
3. Add a patch for protobuf using CMake
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13100

Differential Revision: D12978950

Pulled By: orionr

fbshipit-source-id: db9eb8136acf5712cfb5a24ed228b7934d873331
2018-11-08 09:54:09 -08:00
3e877a70e3 Enable unused-private-field warning (#13450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13450

Pull Request resolved: https://github.com/facebook/react-native/pull/22065

This diff enables -Wunused-private-field clang warning for Android builds and fixes all broken targets.

Reviewed By: gkmhub

Differential Revision: D12881793

fbshipit-source-id: 515555661e137be9e7b20eac9b5bdcb549d6a094
2018-11-08 09:23:11 -08:00
df022f8078 Disable CopyFrom src with uninitialized storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12692

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10392295

fbshipit-source-id: 3a37173b03e76862ec421e0b6d0b0e322b2749b5
2018-11-08 07:45:42 -08:00
4472ad3b2f Move functional _Reduction to its own module (#13401)
Summary:
To support `_Reduction` in the JIT, this PR moves it out to a new file so that it goes through the paths for Python modules in the script compiler, and converts `F.ctc_loss` to weak script

Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401

Differential Revision: D12868501

Pulled By: driazati

fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
2018-11-08 01:04:10 -08:00
de41d1ae0b Enable junk fill for the default CPU allocator (#13377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13377

* Enable junk fill for the default CPU allocator. The first diff only enables this for the tests. A second diff will change the default of zero-fill to false.
* Fix tests to use the 64-bit counters that IterOp and LearningRateOp demand.
* Fix kernels that use uninitialized memory.

Reviewed By: salexspb

Differential Revision: D10866512

fbshipit-source-id: 17860e77e63a203edf46d0da0335608f77884821
2018-11-08 00:02:37 -08:00
21991c05a9 Support assignment to subscripted lhs expr (#13486)
Summary:
Support things like `foo[0] = bar` in script.
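
A minimal sketch of what now compiles (names and shapes are illustrative):
```
import torch

@torch.jit.script
def set_first_row(x, v):
    # type: (Tensor, Tensor) -> Tensor
    x[0] = v  # assignment to a subscripted LHS
    return x

print(set_first_row(torch.zeros(2, 3), torch.ones(3)))
```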
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13486

Differential Revision: D12964550

Pulled By: suo

fbshipit-source-id: 3dda8ffd683d1b045787c65bfa0c7d43b0455658
2018-11-07 23:07:57 -08:00
411d89ca64 Fix the bug in dispatch_to when calling cpu() (#13700)
Summary:
When we added `to` in #13146, we did not emit the cast correctly in one of the dispatch overloads, so when we call .cpu(), the dtype will always be the default float type, which is wrong.

CC jamesr66a eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13700

Differential Revision: D12968699

Pulled By: wanchaol

fbshipit-source-id: c1aaf2bf6a163643ce5360797da61c68271d8bf8
2018-11-07 22:57:35 -08:00
90ea61800f operators/quantized/server -> quantization/server (#13660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13660

Any change in a server-side quantized operator was triggering ios-sanity-check with more than 5 hours of testing time. I suspect this was because the operator code was synced with the xplat directory. This diff moves the server-side quantized operators to caffe2/caffe2/quantization/server to avoid this issue.

Reviewed By: hx89

Differential Revision: D12955420

fbshipit-source-id: b6c824b9de5e2a696f8c748e1b2c77d81d46746b
2018-11-07 22:54:13 -08:00
2448a83d30 Give broadcast_coalesced tensors different version counters (#13594)
Summary:
In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and they don't share any memory locations, so this is not necessary. Furthermore, it can cause problems, e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, the autograd engine will complain.

Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370

edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.
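
A minimal CPU-only sketch of the shared-version-counter failure mode (names and shapes are illustrative):
```
import torch

base = torch.randn(4, requires_grad=True)
flat = base * 1           # non-leaf tensor we can take writable views of
a, b = flat[:2], flat[2:] # two views sharing flat's version counter
out = (a * a).sum()       # backward needs the saved value of `a`
b.add_(1.0)               # bumps the counter shared by `flat`, `a`, and `b`
out.backward()            # RuntimeError: a variable needed for gradient
                          # computation has been modified in-place
```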
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594

Differential Revision: D12967311

Pulled By: SsnL

fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
2018-11-07 21:49:35 -08:00
5dd153b1c2 speed up torch.sparse_mask() cpu kernel (#13290)
Summary:
- `sparse_mask(D, S)` is useful to implement backward for `sparse_addmm()`
- the previous `sparse_mask(D, S)` CPU kernel was not parallelized
- this PR speeds up the CPU kernel for two separate cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: simply reuse the CUDA kernel implementation
- performance:

`D.dim == S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)

>>> %timeit D.sparse_mask(S)

======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

`D.dim > S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
%timeit D.sparse_mask(S)

======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290

Differential Revision: D12878336

Pulled By: weiyangfb

fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37
2018-11-07 20:02:17 -08:00
6bfce16873 fix flip() shape bug in CPU (#13344)
Summary:
- a workaround for #13292; a complete fix requires investigating the root cause when using advanced indexing
- this PR brings the `flip()` CUDA implementation to the CPU kernel
- with this change:
```
>>> t = torch.randn(1, 3, 4, 5)
>>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
```
- performance:
```
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13344

Differential Revision: D12968003

Pulled By: weiyangfb

fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d
2018-11-07 19:53:49 -08:00
1616587540 Redo jit/type and utils/functional to ATen/core (#13455)
Summary:
This is a redo of the previous move which broke OS X and Windows tests -- RTTI seemed to be broken
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13455

Differential Revision: D12883775

Pulled By: bwasti

fbshipit-source-id: 2b6c65e8150e6f89624c6ee99c389335c6fb4bb8
2018-11-07 18:11:29 -08:00
87b47ff850 Remove .data() use in C++ frontend (#13675)
Summary:
Removes the last uses of `.data()` in implementation code of the C++ frontend.

CC yf225

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13675

Differential Revision: D12966061

Pulled By: goldsborough

fbshipit-source-id: fbc0c83c3ba56598ff853bc7b1ddf9005fdd9c41
2018-11-07 17:30:29 -08:00
eb88098e11 Kill c10d/private/CUDAUtils.hpp (#13681)
Summary:
Use AT_CUDA_CHECK instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13681

Differential Revision: D12966607

Pulled By: teng-li

fbshipit-source-id: da0431f588969791a19519368edb909b9c3dc5ab
2018-11-07 17:09:08 -08:00
c8bb665b5d Fix a bug in tuple assignment (#13656)
Summary:
Previously, we did not distinguish between `a = b` (simple assignment),
and `a, = b` (tuple destructuring of a singleton tuple).

The second case would fail in the string frontend, and would not unpack
in the python frontend. This patch fixes both issues and also cleans up
the error reporting for unexpected expressions on the LHS.

Will likely conflict with #13486
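
A tiny sketch distinguishing the two cases (function name is illustrative):
```
import torch

@torch.jit.script
def singleton_unpack(x):
    t = (x,)
    y, = t  # destructures the singleton tuple; distinct from plain `y = t`
    return y

print(singleton_unpack(torch.tensor(1.0)))
```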
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13656

Differential Revision: D12964566

Pulled By: zdevito

fbshipit-source-id: 992b19e5068aef59a78cd23cb0e59a9eeb7755d1
2018-11-07 16:44:22 -08:00
9900a8dd89 Remove outdated css and font files in html docs (#13699)
Summary:
The stylesheet at docs/source/_static/css/pytorch_theme.css is no longer necessary for the html docs build. The new html docs theme styles are located at https://github.com/pytorch/pytorch_sphinx_theme.

The Lato font is also no longer used in the new theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13699

Differential Revision: D12967448

Pulled By: soumith

fbshipit-source-id: 7de205162a61e3acacfd8b499660d328ff3812ec
2018-11-07 16:31:28 -08:00
7978ba45ba Update path in CI script to access ninja (#13646)
Summary:
We weren't running C++ extensions tests in CI.
Also, let's error hard when `ninja` is not available instead of skipping C++ extensions tests.

Fixes https://github.com/pytorch/pytorch/issues/13622

ezyang soumith yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13646

Differential Revision: D12961468

Pulled By: goldsborough

fbshipit-source-id: 917c8a14063dc40e6ab79a0f7d345ae2d3566ba4
2018-11-07 14:31:29 -08:00
bf9b5dffbf ensure flake8 ignores non-conforming python files generated by build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13680

Differential Revision: D12964332

Pulled By: nairbv

fbshipit-source-id: a28358c265fd305f5f8cf893d25d34d6b5929210
2018-11-07 14:27:41 -08:00
d4f9dbfa66 Remove catch check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13677

Differential Revision: D12961992

Pulled By: goldsborough

fbshipit-source-id: 1f0207704d05ac67ed1ec1502bec617c845d9f79
2018-11-07 12:27:15 -08:00
dceec1de30 Distributed Data Parallel documentation for PT1 release (#13657)
Summary:
This should fix https://github.com/pytorch/pytorch/issues/12604

Run `make html` and look through the HTML pages to make sure that everything looks good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13657

Reviewed By: calebho

Differential Revision: D12954250

Pulled By: teng-li

fbshipit-source-id: 40e1925ec0cdce5e6a1d8ba29537937da8ef9194
2018-11-07 12:11:57 -08:00
216c5d0bdc caching packed matrix (#13626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13626

Reuse the packed matrix of weights.

Reviewed By: dskhudia

Differential Revision: D12916630

fbshipit-source-id: f0ec5734f5506134a79d9c0601146488e15c3afe
2018-11-07 12:03:39 -08:00
94fe8faa00 new QNNPACK dwconv support and tests (#13652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13652

new dwconv 3x3 5x5 tests provided

Reviewed By: Maratyszcza

Differential Revision: D12951866

fbshipit-source-id: f853bb7412a724de594ed36c6b2b69ec268d6464
2018-11-07 12:03:35 -08:00
1413dd4bfc Added the finer bucketing option for DDP (#13607)
Summary:
We only need this for backward; for the forward cast, the non-fine-grained bucketing should be better since it's sequential anyway.

Testing should be fully covered by the c10d tests; the bucket size was reduced to make bucketing happen in the c10d test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13607

Differential Revision: D12944515

Pulled By: teng-li

fbshipit-source-id: d982e8dca2874c91d39b30b73a85bfbeb768c508
2018-11-07 12:00:55 -08:00
044d00516c Rename DistBackend -> Backend (#11830)
Summary:
Also add docs for get_backend, Backend, and reduce_op

fixes #11803

cc pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11830

Differential Revision: D9927991

Pulled By: SsnL

fbshipit-source-id: a2ffb70826241ba84264f36f2cb173e00b19af48
2018-11-07 11:58:12 -08:00
afc7dbd586 Hipify caffe2/utils/math_gpu.cu (#13521)
Summary:
This PR adds caffe2/utils/math_gpu.cu to pyHipify

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13521

Differential Revision: D12954843

Pulled By: bddppq

fbshipit-source-id: a2bf367da07e49cb7807ba6876b42d0733fc8205
2018-11-07 11:34:15 -08:00
0f59dcb317 Remove partially initialized Tensor + CopyFrom (#13629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13629

Previously we had a Tensor with an initialized storage (and therefore a known device_type), and
we would call CopyFrom on it to initialize the sizes and data.

We want to eliminate partially initialized Tensor by replacing the pattern of calling CopyFrom with a partially initialized Tensor with either splitting that to undefined Tensor + initialization API(1)(3) or combine all the initialization in the same step(2).

1. member variable initialization + CopyFrom
Previously we had a tensor initialized with a device_type and then used CopyFrom to populate the content; now we remove the partial initialization by making the original member variable an undefined Tensor and using ReinitializeFrom to copy from another Tensor.

2. Output + CopyFrom
Previously, we first got a tensor with a device_type and then called CopyFrom from another Tensor.
We changed this by combining the two operations into OperatorBase::OutputTensor.

3. Output + custom functions
Example can be found in TransformGPU function.
In this case we move the part that initializes the tensor outside of the function, and do that explicitly outside so that we could reuse the Output functions to make a fully initialized Tensor.

Note that to keep the original semantics, both of the APIs have a caching effect based on device_type, which means we only create a Tensor object when the device_type does not match or the Tensor is undefined; otherwise, we reuse the original Tensor object.

Reviewed By: dzhulgakov

Differential Revision: D12848855

fbshipit-source-id: 37bb4ddc1698ebea533b73006eeb1218faa8ddf8
2018-11-07 11:31:03 -08:00
6c8ac50753 Fix exception catching to catch c10::Error properly (#13665)
Summary:
In particular, this was breaking the logic for the cudnn algorithm selection to fall back to a less memory-hungry algorithm when the selected one OOMs while creating the workspace.
c10::Error is a subclass of `std::exception`, not `std::runtime_error`.

I removed `runtime_error` in all such places in our code and replaced it with `const exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665

Differential Revision: D12958396

Pulled By: soumith

fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
2018-11-07 11:22:48 -08:00
674e23bbab Fixed a small error in docstrings for ConvTranspose3d (#13668)
Summary:
In the example for ConvTranspose3d, the docstring had "Conv3d" instead of "ConvTranspose3d" in one instance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13668

Differential Revision: D12958372

Pulled By: soumith

fbshipit-source-id: 5ec901e20b90f4eed2bf04c5b417183ec2096447
2018-11-07 11:22:46 -08:00
2fe9e3a207 Remove catch from caffe2/.gitmodules
Summary: Step 3 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12959020

fbshipit-source-id: 49347de8b027433d422b653dd854ad76349d0e25
2018-11-07 11:10:09 -08:00
e7652cfb40 Remove caffe2/submodules/catch-rev.txt
Summary: Step 1 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12958997

fbshipit-source-id: ab4b9e103ac83ad490375440722f95247eb1ac7f
2018-11-07 11:10:07 -08:00
ab0c72ab6f Replace cursors with OrderedDict (#13427)
Summary:
This is a precursor diff to Python <-> C++ frontend integration -- I have a follow-up PR coming for that. This PR changes the C++ frontend module interface to replace the custom "cursor"s I introduced some time ago with `OrderedDict`. I introduced cursors at the time as a convenient way of applying functions and query operations on a module's parameters, buffers and modules, allowing things like `module.parameters().map(my_func)`. However, I noticed that (1) this functionality is easily implementable on top of a regular data structure and (2) more importantly, using OrderedDicts is much, much easier for Python integration. This is especially true given that ScriptModule today also uses OrderedDict. Since C++ frontend modules and ScriptModules will soon share as many implementation details as possible, it is overall the best move to ditch the custom cursor data structure and pervasively use OrderedDict everywhere.

For this I did:

1. Changed the C++ frontend module interface to more closely match the Python one by providing `parameters()`, `named_parameters()` and other methods Python provides. This is very important for the following diff which binds these into Python for inter-op with Python modules.
2. In lieu of the `Cursor::apply()` method I added `nn::Module::apply`. This again is one more unifying step between Python and C++, since Python modules have an apply function too.
3. Deleted all uses of Cursor.
4. Tidied and beefed up the `OrderedDict` class. In particular, I made `OrderedDict::Item` store an `std::pair` under the hood, because that is trivial to bind into Python and saved me a lot of headaches. `key` and `value` become methods instead of fields, which they should have been from the very start anyway because it allows exactly these kinds of changes, as per usual good software engineering principle of encapsulation.
5. Added many tests for the OrderedDict use in `nn::Module`.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13427

Differential Revision: D12894092

Pulled By: goldsborough

fbshipit-source-id: 715770c95a9643753a1db26d7f9da9a78619a15d
2018-11-07 11:10:05 -08:00
b652c2de50 Rename dim(i) -> size(i)
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim(i)->size(i)): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935287

fbshipit-source-id: 700050640c756d7064c8db4fd50fe6a1421a61ef
2018-11-07 11:07:26 -08:00
4326873330 Skip std and var tests in pytorch rocm CI (#13662)
Summary:
https://github.com/pytorch/pytorch/pull/13435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13662

Reviewed By: soumith

Differential Revision: D12958408

Pulled By: bddppq

fbshipit-source-id: 170b59769fbed149c9246b6549c62160e27d2404
2018-11-07 10:10:25 -08:00
9403eddce4 Fix tracing bug for custom ops (#13654)
Summary:
Due to a logic bug, tracing is broken for custom ops. Unfortunately, there also weren't any tests for tracing custom ops.

The fix is a single line change of moving `pop(stack, std::get<Is>(arguments)...);` before `node = getTracedNode<Is...>(schema, arguments);`. Other changes are added tests and improved commenting/formatting.

Fixes https://github.com/pytorch/pytorch/issues/13564

CC fmassa

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13654

Differential Revision: D12952887

Pulled By: goldsborough

fbshipit-source-id: 87d256576f787c58e8d8f5c13a0fecd0ec62a602
2018-11-07 09:22:44 -08:00
edd2e38023 Clean up a couple of items in the C2 test scaffolding (WIP) (#7847)
Summary:
- Py3 compatibility
- utility functions refactoring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7847

Reviewed By: pietern

Differential Revision: D9355096

Pulled By: huitseeker

fbshipit-source-id: 8e78faa937488c5299714f78075d7cadb1b2490c
2018-11-07 09:16:13 -08:00
10fdcf748a swap with empty vector to force deallocation (#13625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13625

v.clear() doesn't guarantee deallocation and it was causing memory capacity issues

Reviewed By: jianyuh

Differential Revision: D12941938

fbshipit-source-id: b9c80828b122a44e883b32f43b5d8dfb36065773
2018-11-07 08:33:34 -08:00
398d310bac changes for cumsum/cumprod backward not depending on TH. (#13570)
Summary:
This is a subset of https://github.com/pytorch/pytorch/pull/13467 which is failing with ASAN errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13570

Differential Revision: D12922619

Pulled By: gchanan

fbshipit-source-id: 007470243d8aee719ab9441abf29f06b4c84d59f
2018-11-07 07:45:33 -08:00
a228a95b94 Rename ndim() -> dim() - 1/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935693

fbshipit-source-id: f24f1c10cd5bbb9e63cda0a0da989e6e3766380a
2018-11-07 07:30:11 -08:00
4794da03f8 Rename ndim() -> dim() - 4/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935774

fbshipit-source-id: 2a7cb7da534da73b61f01eb0ff124abf193309ee
2018-11-07 07:30:09 -08:00
57ec8f111f Rename ndim() -> dim() - 6/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935827

fbshipit-source-id: 80ecb034c243dbfd267b9f131cee9d7afd5ef063
2018-11-07 07:27:45 -08:00
e60a7c2c88 codemod tensor.type().is_cuda(), tensor.type().is_sparse() (#13590)
Summary:
Followup to #12841

Changed these to not require type dispatch:
tensor.type().is_cuda() -> tensor.is_cuda()
tensor.type().is_sparse() -> tensor.is_sparse()
isVariable(tensor.type()) -> tensor.is_variable()

This probably does not affect performance
very much in most cases but it is nice to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13590

Reviewed By: ezyang

Differential Revision: D12929301

Pulled By: zou3519

fbshipit-source-id: 8ac5c6200c579dd7a44fb4ee58fc9bb170feb1d7
2018-11-07 07:27:42 -08:00
e70321ed9e Remove unnecessary type dispatches from Variable::Impl ctor (#13630)
Summary:
This should improve the performance of wrapping a tensor in a Variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13630

Reviewed By: ezyang

Differential Revision: D12944960

Pulled By: zou3519

fbshipit-source-id: 89fa78a563e46a747d851a90ffd1b5cf3cd2d0d7
2018-11-07 07:27:40 -08:00
2ae8e46105 Rename ndim() -> dim() - 2/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935727

fbshipit-source-id: a0c306c8f451a671b80db54fef5aa091ed58bfe5
2018-11-07 07:25:20 -08:00
7341ab0a33 Fix range of target examples and JIT test case for CTC loss.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13644

Differential Revision: D12949733

Pulled By: gchanan

fbshipit-source-id: 1c4cacbb6a50d5002165bdd0a7881883db5c8249
2018-11-07 07:04:31 -08:00
a132a7d9ce Add autodiff support for a few additional operators (#13288)
Summary:
Added aten::{avg_pool2d, log_softmax, max_pool2d_with_indices, threshold},
enabled aten::{expand, view}.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13288

Differential Revision: D12954929

Pulled By: soumith

fbshipit-source-id: 6fba58af82cafbc7446705d8c8145cdeaf4954ca
2018-11-06 23:24:12 -08:00
a1ba29a2c0 Change to use json format to store disabled_features in hipify (#13595)
Summary:
Since json is a builtin module in Python (>= 2.6), this means pyhipify
can be invoked without installing any extra dependencies.

petrex iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13595

Differential Revision: D12931045

Pulled By: bddppq

fbshipit-source-id: 31d68fb6e730fd9d11593550ca531423cb0596e9
2018-11-06 22:06:10 -08:00
7d64c9df39 Remove C2GEMMContext (#13443)
Summary:
C2GEMMContext is a remnant of old times when Int8 ops used gemmlowp.
It is no longer needed: the formerly gemmlowp-based ops now use QNNPACK with the pthreadpool interface, and other ops (Int8Add, Int8ChannelShuffle) use the Caffe2 thread pool interface directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13443

Differential Revision: D12887773

Pulled By: Maratyszcza

fbshipit-source-id: bd2732e2c187b399c8a82efebdd244457720256b
2018-11-06 21:50:53 -08:00
dbc467545f Update weak script modules to match fns (#13631)
Summary:
Add weak modules for those that use weak script functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13631

Differential Revision: D12945328

Pulled By: driazati

fbshipit-source-id: 6cb235763bf5ab35c7b32e0f734f08d22418594f
2018-11-06 21:22:52 -08:00
14004cbef6 Native batch norm (#13263)
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to a new block/grid layout and Welford-type mean/variance calculations (the latter for training mode; see the sketch below)
- It splits the forward kernel into two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.

Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
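
For reference, a minimal sketch of the Welford-style single-pass update mentioned above (plain Python for clarity; `samples` is a hypothetical iterable of numbers):
```
def welford_mean_var(samples):
    # numerically stable one-pass accumulation of mean and variance
    mean, m2, n = 0.0, 0.0, 0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    var = m2 / n if n > 0 else float("nan")  # biased (population) variance
    return mean, var
```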
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263

Differential Revision: D12946254

Pulled By: SsnL

fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
2018-11-06 20:05:54 -08:00
392ca1e59f Remove compileFunction (#13640)
Summary:
This finishes a TODO to get torch.jit.script to go through the same
pathway as methods, removing the need for forward_schema and
for compileFunction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13640

Differential Revision: D12949713

Pulled By: zdevito

fbshipit-source-id: 3d1a5f14910d97a68670a3fd416bdbfe457f621d
2018-11-06 19:37:06 -08:00
ce6edbfbd9 Fixed NCCL backend not being built (#13653)
Summary:
A regression caused by the NCCL build refactoring earlier.

CC fmassa

Fixing: https://github.com/facebookresearch/maskrcnn-benchmark/issues/122
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13653

Differential Revision: D12952555

Pulled By: teng-li

fbshipit-source-id: b42e2a88fff83c9ddd58eeb33e933f1f59f51c52
2018-11-06 19:33:49 -08:00
2cd912bcc2 Fix more spectral norm bugs (#13350)
Summary:
Problems with SN and DP after #12671 :
1. in eval mode, `weight_orig` is not getting correct gradient #12737 .

    Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.

2. in training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage. So in `eval` the weight used is wrong.

    Fix: Make `weight` not a buffer anymore and always calculate it as above.

3. #12671 changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward are changed in the 2nd forward.

    Fix: This PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dicts. This is ugly and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue, I make this patch. Even if we have a better interface, a BC mechanism for loading legacy state_dicts still needs to be done.

cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350

Differential Revision: D12931044

Pulled By: SsnL

fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
2018-11-06 19:16:13 -08:00
eb29485ed8 Support custimzed timeout when fetching blob from KVStore (#13582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13582

Worker nodes sometimes witness timeout failures when getting the session_id blob from Zeus, which are due to delays in the master node setting the blob.
This diff adds the flexibility to specify a longer timeout for getting blobs from Zeus.

Reviewed By: pietern

Differential Revision: D12926156

fbshipit-source-id: b1a4d1d9cf7de084785bfa4a8a0cd3cfd095ba5c
2018-11-06 18:54:56 -08:00
bc1de6ae7d CircleCI: disable output buffering to better locate test timeout (#13516)
Summary:
ASAN test timeouts such as https://circleci.com/gh/pytorch/pytorch/165649?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link don't actually show where the timeout happened, because of bash output buffering. This PR turns off the buffering to better surface the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13516

Differential Revision: D12952513

Pulled By: yf225

fbshipit-source-id: 48058c021470e5aa7a2246e1fcd974cfabf5df54
2018-11-06 18:14:26 -08:00
619c2f8b44 small fixes regarding docs of torch tensors (#13635)
Summary:
Removed duplicate doc args block.
Made statements involving 'each element' more precise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13635

Differential Revision: D12946987

Pulled By: soumith

fbshipit-source-id: a17da92f69086b530ff769cf4662ae29843fd188
2018-11-06 17:24:42 -08:00
508f676c50 Rename ndim() -> dim() - 5/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: salexspb

Differential Revision: D12935787

fbshipit-source-id: 303d71d3eb050789af2ab9575e5dcc48f6037086
2018-11-06 16:38:35 -08:00
6cf450744f propagate python op error msg (#13624)
Summary:
Correctly propagate the error msg from a python op to the JIT interpreter. In the interpreter we wrap the exception and re-throw it as a Runtime Exception. Potentially in a future diff we can throw the same type of python exception as was originally thrown.

Fix for https://github.com/pytorch/pytorch/issues/13560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13624

Differential Revision: D12948756

Pulled By: eellison

fbshipit-source-id: 94cdf4c376143c5e40dcb9716aefb3c1e2d957db
2018-11-06 16:28:39 -08:00
feff7be294 Remove RTTI from jit/type.h (#13591)
Summary:
RTTI can't be used on Android, so this is needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13591

Differential Revision: D12914402

Pulled By: bwasti

fbshipit-source-id: be8c8c679bb20c7faaa7e62cd92854cedc19cb3a
2018-11-06 16:19:52 -08:00
18de330e86 CMake integration for int8 server operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13558

Reviewed By: Maratyszcza

Differential Revision: D12945460

Pulled By: dskhudia

fbshipit-source-id: 1a91027b305fd6af77eebd9a4fad092a12f54712
2018-11-06 15:45:15 -08:00
76c1b5cd79 Fix overflow error in stats_put_ops
Summary:
I was hitting this error:

caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'

So, the assignment from int64_t to float loses some precision and because of that we overflow.

Reproduced this issue with this diff D12945013

Reviewed By: mlappelbaum, jdshi-fb

Differential Revision: D12927086

fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
2018-11-06 15:41:51 -08:00
e73943e488 Remove partially initialized Tensor + ShareData (#13522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13522

Currently a Tensor is a shared pointer to the underlying implementation rather than a value; copying
the pointer will share the underlying TensorImpl, so ShareData probably doesn't make sense anymore.

Reviewed By: dzhulgakov

Differential Revision: D12871708

fbshipit-source-id: d3773c66b7ed0bf1c37e886f69f59aec158b216b
2018-11-06 15:23:41 -08:00
Jie
fbe3c3f57f (#13435)
Summary:
Moved torch.var and torch.std to use the THC reduction kernel; this greatly improves performance when computing variance over non-contiguous dimensions.

Resolving #13192
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13435

Differential Revision: D12947137

Pulled By: soumith

fbshipit-source-id: c0a22cb799fa57e8fbed81c7dcb880666f461883
2018-11-06 14:42:26 -08:00
393ad6582d Use torch:: instead of at:: in all C++ APIs (#13523)
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoids bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.

Note that since we're just talking about typedefs, this change does not break any existing code.

Once this lands I will update stuff in `pytorch/tutorials` too.

zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523

Differential Revision: D12942787

Pulled By: goldsborough

fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
2018-11-06 14:32:25 -08:00
be424de869 Add torch.multiprocessing.spawn helper (#13518)
Summary:
This helper addresses a common pattern where one spawns N processes to
work on some common task (e.g. parallel preprocessing or multiple
training loops).

A straightforward approach is to use the multiprocessing API directly
and then consecutively call join on the resulting processes.

This pattern breaks down in the face of errors. If one of the
processes terminates with an exception or via some signal, and it is
not the first process that was launched, the join call on the first
process won't be affected. This helper seeks to solve this by waiting
on termination from any of the spawned processes. When any process
terminates with a non-zero exit status, it terminates the remaining
processes, and raises an exception in the parent process. If the
process terminated with an exception, it is propagated to the parent.
If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is
mentioned in the exception as well.

Requires Python >= 3.4.
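
A minimal usage sketch (the worker function and its arguments are illustrative):
```
import torch.multiprocessing as mp

def worker(rank, message):
    # each spawned process receives its rank as the first argument
    print("worker {}: {}".format(rank, message))

if __name__ == "__main__":
    # blocks until all processes exit; a failure in any worker terminates
    # the others and re-raises the error in the parent
    mp.spawn(worker, args=("hello",), nprocs=4)
```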
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13518

Reviewed By: orionr

Differential Revision: D12929045

Pulled By: pietern

fbshipit-source-id: 00df19fa16a568d1e22f37a2ba65677ab0cce3fd
2018-11-06 14:08:37 -08:00
056f2cd238 ATen/test/basic.cpp: Catch2Gtest (#12142)
Summary:
In #11846, we migrated all Catch tests in ATen/test/ to gtest, except for basic.cpp, due to a GPU bug (valgrind related).
In this PR, we figure out what the bug is and migrate the last piece of ATen Catch tests to gtest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12142

Differential Revision: D12946980

Pulled By: zrphercule

fbshipit-source-id: cf3b21f23ddec3e363ac8ec4bdeb4bc4fe35f83b
2018-11-06 14:00:18 -08:00
06bfabf1f5 add tests to no-gtest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13637

Differential Revision: D12946644

Pulled By: suo

fbshipit-source-id: 161ddab275d5315fc053030d0f4956a4529602b1
2018-11-06 13:46:07 -08:00
137150be88 add unwrap optional operator (#13599)
Summary:
Add a builtin to refine the type of Optional[T] -> T. This is a short-term solution to unblock porting of the standard library.
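
A small sketch of the intended refinement; the `torch.jit._unwrap_optional` spelling is an assumption here, based on how later releases expose this builtin:
```
import torch
from typing import Optional

@torch.jit.script
def double_or_fail(x):
    # type: (Optional[int]) -> int
    y = torch.jit._unwrap_optional(x)  # refines Optional[int] -> int
    return 2 * y

print(double_or_fail(21))  # 42; passing None raises at runtime
```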
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13599

Reviewed By: driazati, wanchaol

Differential Revision: D12943193

Pulled By: eellison

fbshipit-source-id: 31c893a78d813313bbbc1d8212b5c04e403cfb4d
2018-11-06 11:54:56 -08:00
1906305c07 Consolidate argument checkers (#13623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13623

Moves the bulk of shared argument checkers in the gloo backend to Utils.hpp.

Reviewed By: teng-li

Differential Revision: D12934598

fbshipit-source-id: 7b80e67ccc3425f21498c30fbe7837af314f96f2
2018-11-06 11:52:38 -08:00
7ffa864953 Speed up tensor.options() by avoiding type dispatch (#13330)
Summary:
Also speeds up tensor.is_variable(), tensor.layout(), and
tensor.device(). This PR speeds up tensor.options() from 54ns to 17ns,
resulting in a comparable speedup in torch.as_strided performance:
https://gist.github.com/zou3519/7645262a4f89e237405857925bb872c3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13330

Differential Revision: D12847695

Pulled By: zou3519

fbshipit-source-id: 60b303671b0cce7b6140068c7f90c31d512643be
2018-11-06 11:39:28 -08:00
464dc31532 Add README to tools, delete defunct scripts. (#13621)
Summary:
Some extra documentation for other bits too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13621

Differential Revision: D12943416

Pulled By: ezyang

fbshipit-source-id: c922995e420d38c2698ce59c5bf4ffa9eb68da83
2018-11-06 11:20:53 -08:00
6aee5488b5 correct omp dependency for mkl-dnn (#13449)
Summary:
The motivation of this PR is to make mkldnn use the same OpenMP version as the caffe2 framework, while not changing other assumptions within mkldnn.

Previously, MKL_cmake_included was set in caffe2 in order to disable OpenMP detection in mkldnn. But with that change, mkldnn had no chance to adapt to the MKL found by caffe2, so some MKL build flags were not set in mkldnn (for example, USE_MKL, USE_CBLAS, etc.).

In this PR, we set MKLIOMP5LIB for mkldnn according to caffe2, and pass the MKL root path to mkldnn via MKLROOT. Then mkldnn is built as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13449

Differential Revision: D12899504

Pulled By: yinghai

fbshipit-source-id: 22a196bd00b4ef0a11d350a32c049304613edf52
2018-11-06 10:48:09 -08:00
a7ee632dff Various Test and build fixes (#13556)
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed

Differential Revision: D12918456

Pulled By: soumith

fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
2018-11-06 07:13:47 -08:00
9ca9469de6 mm backwards to not depend on TH. (#13575)
Summary:
This is a subset of https://github.com/pytorch/pytorch/pull/13476.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13575

Differential Revision: D12923473

Pulled By: gchanan

fbshipit-source-id: 8787808d2ab377cc535f69c3c63dcd671c72b7db
2018-11-06 06:47:44 -08:00
3c1d593a27 cumsum/cumprod derivatives not depending on TH. (#13579)
Summary:
This is identical to https://github.com/pytorch/pytorch/pull/13467 but doesn't include the tests in common_invocations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13579

Differential Revision: D12925404

Pulled By: gchanan

fbshipit-source-id: 0a52fd26b15c7e0bbdfec03948f3e6c849e65091
2018-11-06 06:42:01 -08:00
95ca66763d Add math functions overloaded over different numeric types for cuda and hip (#13602)
Summary:
petrex ashishfarmer rohithkrn iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13602

Reviewed By: dzhulgakov

Differential Revision: D12935797

Pulled By: bddppq

fbshipit-source-id: a49ec66fb60bfd947c63dd2133d431884df62235
2018-11-06 01:40:31 -08:00
d03c6ba50d Adding Fetching Real number representation
Summary: Adding fetching of the real-number representation for int8 tensors in workspace.py

Reviewed By: harouwu

Differential Revision: D12936556

fbshipit-source-id: f8756a37bce21c93d44d52faf5da9c9bd6473f4a
2018-11-05 23:35:24 -08:00
3c32f897ca Rename ndim() -> dim() - 3/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: dzhulgakov

Differential Revision: D12935748

fbshipit-source-id: fccec04e28ec049789f772e70d691382cb8927e0
2018-11-05 23:21:40 -08:00
Jie
bbacd859ab Updating heuristics for cudnn persistent RNN (#13612)
Summary:
Modifying RNN heuristics to exclude GPUs with sm == 7.5 from using persistent RNN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13612

Differential Revision: D12937455

Pulled By: soumith

fbshipit-source-id: 5cdaea083d55383b85dbe6e5443f1b36e578e4f5
2018-11-05 21:35:44 -08:00
fc6a9a19ea Add torch._C._nn built-in, more weak fns (#13322)
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including inplace variants). This allows for the conversion of more functions to weak script

NB: many `torch.nn.functional` functions will have to be slightly rewritten to avoid early returns (as with `threshold` in this PR; a sketch of that rewrite follows the list below)

Converts these functions to weak script:
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
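
For illustration, a minimal sketch of the early-return rewrite mentioned above (names follow `torch.nn.functional.threshold`; the exact diff may differ):

```python
import torch

# Before: an early return, which the script compiler could not handle.
def threshold(input, threshold, value, inplace=False):
    if inplace:
        return torch._C._nn.threshold_(input, threshold, value)
    return torch._C._nn.threshold(input, threshold, value)

# After: a single exit point, expressible as a weak script function.
def threshold(input, threshold, value, inplace=False):
    if inplace:
        result = torch._C._nn.threshold_(input, threshold, value)
    else:
        result = torch._C._nn.threshold(input, threshold, value)
    return result
```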
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322

Differential Revision: D12852203

Pulled By: driazati

fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
2018-11-05 21:02:18 -08:00
10d67716db bump docker image to 262 (#13581)
Summary:
We updated valgrind version in our recent docker image.
https://github.com/pietern/pytorch-dockerfiles/pull/23
https://github.com/pytorch/ossci-job-dsl/pull/31
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13581

Reviewed By: goldsborough

Differential Revision: D12936485

Pulled By: zrphercule

fbshipit-source-id: 981532394b23e8d8ecfd6b2458ddf03710d5ac67
2018-11-05 20:43:39 -08:00
bad8235a3a Disabling NCCL coalesced bcast test since it hangs in CI (#13606)
Summary:
Functionality test shouldn't be affected since we have both backends testing for the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13606

Differential Revision: D12937185

Pulled By: teng-li

fbshipit-source-id: 03d897b6690f7932654fdb7d11a07016dfffa751
2018-11-05 20:34:15 -08:00
9ef98624b3 Don't allocate empty Storage/StorageImpl for Variable. (#13580)
Summary:
Variable owns a Tensor which already has a Storage/StorageImpl
if necessary. The Variable ctor was unnecessarily allocating *another*
Storage/StorageImpl, which costs around 200ns.

This PR gets rid of that behavior and cuts the `as_variable` time from
670ns to 475ns, reducing Variable overhead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13580

Differential Revision: D12925495

Pulled By: zou3519

fbshipit-source-id: 4f5ec33776baa848d1c318abcf40b57125b3bed7
2018-11-05 19:24:14 -08:00
02d3787a19 Support new upsample in symbolic, caffe2 backend & caffe2 frontend (#13272)
Summary:
We updated the description of upsample_op in onnx: https://github.com/onnx/onnx/pull/1467
Therefore, we need to support the new upsample_op in caffe2-onnx backend as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13272

Reviewed By: houseroad

Differential Revision: D12833656

Pulled By: zrphercule

fbshipit-source-id: 21af5282abaae12d2d044e4018a2b152aff79917
2018-11-05 19:13:57 -08:00
ebaabfbbd5 ReinitializeTensor function for refactoring Tensor as member variable (#13147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13147

We want to refactor
```
class A {

void func() {
  x_.Resize(dims);
  auto* data = x_.mutable_data<T>();
}

Tensor x_(CPU);
};
```

to
```
class A {
void func() {
  ReinitializeTensor(&x_, dims, at::dtype<T>().device(CPU));
  auto* data = x_.mutable_data<T>();
}

Tensor x_; // Undefined Tensor
};
```

This diff adds the ReinitializeTensor function.

Reviewed By: dzhulgakov

Differential Revision: D10861298

fbshipit-source-id: 9f432297d07a4890e29bb68436364e0b2e2545e7
2018-11-05 19:13:55 -08:00
a340dce133 Replaces c10d's CUDAEvent with ATen's (#13464)
Summary:
This PR:

- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of historic THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13464

Reviewed By: teng-li

Differential Revision: D12924291

Pulled By: pietern

fbshipit-source-id: b8ebe3e01e53d74e527ad199cca3aa11915c1fc0
2018-11-05 19:13:52 -08:00
e2272dd312 Remove ATen/README.md in favor of cppdocs/notes/tensor_basics.rst (#13601)
Summary:
Removes aten/README.md (and some other files dating from when aten was its own repo), and moves the not outdated documentation into a note called "Tensor Basics". I updated the text lightly but did not overhaul the content.

CC zdevito

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13601

Differential Revision: D12934480

Pulled By: goldsborough

fbshipit-source-id: 012a4267b4d6f27e4d5d55d6fc66363ddca10b41
2018-11-05 19:13:50 -08:00
af4a228426 Fix erase_number_type pass, negative indices in c2 and some onnx symbolics (#12888)
Summary:
The PR did two things:

1. fix the bug in erase_number_type on node inputs
2. handle negative indices for dim-reduce in caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12888

Reviewed By: houseroad

Differential Revision: D12833486

Pulled By: wanchaol

fbshipit-source-id: c3ceb400d91f0173b73ad95e392b010c3c14db7d
2018-11-05 19:13:49 -08:00
2398a3255e fbgemm submodule update (#13592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13592

submodule update for fbgemm

Reviewed By: jspark1105

Differential Revision: D12929740

fbshipit-source-id: 546e4d7042696ffc5b0ee7cabd236ec944d218e7
2018-11-05 17:39:20 -08:00
b1c57caaf9 Move flat_hash_map to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13527

Reviewed By: ezyang

Differential Revision: D12912239

fbshipit-source-id: bb44d3ff87c4ca94943ec2667acf1e7ce2b3c914
2018-11-05 17:39:18 -08:00
b7c9575c93 Move LeftRight to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13526

Reviewed By: ezyang

Differential Revision: D12912241

fbshipit-source-id: 70525a9b20daa8aae623d0cb4002acecc34b1932
2018-11-05 17:39:16 -08:00
8fafa7b6ac Remove size() from BatchDataset and templatize IndexType (#12960)
Summary:
This PR brings two changes to the recently landed C++ Frontend dataloader:

1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.

See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.

Some additional minor updates and code movements too.

apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960

Differential Revision: D12893342

Pulled By: goldsborough

fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
2018-11-05 17:13:09 -08:00
1969898647 Convert functional dropouts to weak script (#13484)
Summary:
To convert `nn.functional.dropout`
* `_VF` had to be exposed as a Python module, so this PR adds a module class that forwards to `torch._C._VariableFunctions` (sketched below)
* rng state between calls in the tests needed to be made consistent
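
A minimal sketch of such a forwarding module (assumed shape, not the exact code):

```python
import sys
import types

import torch

class VFModule(types.ModuleType):
    """Forwards attribute lookups to torch._C._VariableFunctions."""
    def __init__(self, name):
        super(VFModule, self).__init__(name)
        self.vf = torch._C._VariableFunctions

    def __getattr__(self, attr):
        return getattr(self.vf, attr)

sys.modules['torch._VF'] = VFModule('torch._VF')
```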
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484

Differential Revision: D12929622

Pulled By: driazati

fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
2018-11-05 17:13:07 -08:00
23e3a12d5e Add pass support to script (#13535)
Summary:
This PR adds basic support for `pass` statements
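
A minimal example of what now compiles:

```python
import torch

@torch.jit.script
def identity(x):
    pass  # `pass` is now accepted by the script compiler as a no-op
    return x
```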
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13535

Differential Revision: D12929529

Pulled By: driazati

fbshipit-source-id: 70c7c52630d46e76366c4caa875d6c5419a1e03f
2018-11-05 17:13:06 -08:00
df67d4180a Validate schema with no returns (#13525)
Summary:
If there is no return type then the returns of the schema are not
checked against the returns in the graph, so this PR adds an error if
that case is detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13525

Differential Revision: D12929524

Pulled By: driazati

fbshipit-source-id: da562e979482393098830bbded26729a2499152a
2018-11-05 16:51:55 -08:00
7b9d755d88 Restructure torch/torch.h and extension.h (#13482)
Summary:
This PR restructures the public-facing C++ headers in a backwards compatible way. The problem right now is that the C++ extension header `torch/extension.h` does not include the C++ frontend headers from `torch/torch.h`. However, those C++ frontend headers can be convenient. Further, including the C++ frontend main header `torch/torch.h` in a C++ extension currently raises a warning, because we want to move people away from exclusively including `torch/torch.h` in extensions (which was the correct thing 6 months ago), since that *used* to be the main C++ extension header but is now the main C++ frontend header. In short: extensions should be able to get the C++ frontend functionality of `torch/torch.h` without including that header directly, because direct inclusion is deprecated for extensions.

For clarification: why is `torch/torch.h` deprecated for extensions? Because extensions need to include Python stuff, while the C++ frontend does not want this Python stuff. For now the Python stuff is included in `torch/torch.h` whenever the header is used from a C++ extension (enabled by a macro passed by `cpp_extensions.py`) so as not to break existing users, but this should change in the future.

The overall fix is simple:

1. C++ frontend sub-headers move from `torch/torch.h` into `torch/all.h`.
2. `torch/all.h` is included in:
    1. `torch/torch.h`, as is.
    2. `torch/extensions.h`, to now also give C++ extension users this functionality.

With the next release we can then:
1. Remove the Python includes from `torch/torch.h`
2. Move C++-only sub-headers from `all.h` back into `torch.h`
3. Make `extension.h` include `torch.h` and `Python.h`

This will then break old C++ extensions that include `torch/torch.h`, since the correct header for C++ extensions is `torch/extension.h`.

I've also gone ahead and deprecated `torch::CPU` et al., since those are long overdue for removal.

ezyang soumith apaszke fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13482

Differential Revision: D12924999

Pulled By: goldsborough

fbshipit-source-id: 5bb7bdc005fcb7b525195b769065176514efad8a
2018-11-05 16:46:52 -08:00
1b64c0f8fe Error msg on TCP backend (#13596)
Summary:
Clean it up from my queue:

https://github.com/pytorch/pytorch/issues/12721

```
>>> torch.distributed.init_process_group(backend="tcp")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 275, in init_process_group
    backend = DistBackend(backend)
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 55, in __new__
    raise ValueError("TCP backend has been deprecated. Please use "
ValueError: TCP backend has been deprecated. Please use Gloo or MPI backends for collective operations on CPU tensors.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13596

Differential Revision: D12931196

Pulled By: teng-li

fbshipit-source-id: bb739b107ad7454e2e0a17430087161fedd4c392
2018-11-05 16:40:02 -08:00
74819087de Mixed precision DDP hang fix and fine-grained option for DDP perf (#13496)
Summary:
When going to mixed-precision fp16 training, DDP randomly hangs. Initially, I thought this smelled like a similar NCCL bug I filed a while ago. It turns out it's not. Again, I was seeing that different rank processes had different bucket sizes. How could this even happen?

It turns out that take_tensors generates its list of bucketed tensors in a non-deterministic order, because the key to the map is a pointer. An interesting bug to dig into and fix.

fp16 DDP training should now be fully working.

Also added another fine-grained take_tensors helper that aims to improve the performance of DDP, with a TODO to replace DDP's use of take_tensors with it.

Fixed: https://github.com/pytorch/pytorch/issues/12150
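
A rough Python sketch of the deterministic bucketing idea (the real helper is C++; names and details here are illustrative only):

```python
from collections import OrderedDict

def take_tensors(tensors, size_limit):
    """Bucket tensors by dtype in first-seen order, so every rank
    produces the same bucket order (unlike a pointer-keyed map)."""
    buckets = OrderedDict()  # dtype -> list of groups
    for t in tensors:
        groups = buckets.setdefault(t.dtype, [[]])
        nbytes = t.numel() * t.element_size()
        group_bytes = sum(x.numel() * x.element_size() for x in groups[-1])
        if groups[-1] and group_bytes + nbytes > size_limit:
            groups.append([t])      # start a new group once the limit is hit
        else:
            groups[-1].append(t)
    return [group for groups in buckets.values() for group in groups]
```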
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13496

Differential Revision: D12920985

Pulled By: teng-li

fbshipit-source-id: 26f3edae7be45a80fa7b2410a2e5a1baab212d9c
2018-11-05 16:22:15 -08:00
84cfc28f23 Note on Tensor Creation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13517

Differential Revision: D12914271

Pulled By: goldsborough

fbshipit-source-id: df64fca6652525bc814f6fd3e486c87bff29b5b5
2018-11-05 16:10:58 -08:00
f6ff5d8934 Append parameters when checking graphs for TorchScript Methods (#13553)
Summary:
Also, add an assertion in the GraphExecutor to make sure we don't
access memory out of bounds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13553

Differential Revision: D12924796

Pulled By: soumith

fbshipit-source-id: ea2a134084538484178b8ebad33d6716a8e1d633
2018-11-05 16:07:36 -08:00
f3c197d6fa Add explicit c10:: namespace to converter (#13593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13593

Should fix up master

Reviewed By: orionr

Differential Revision: D12929779

fbshipit-source-id: 23119f5bf1d9f1e37e8ed01bfa2cc40647725390
2018-11-05 14:52:16 -08:00
7faca2a217 Add new style broadcast support in c10d/gloo (#13497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13497

This replaces the existing broadcast implementation with the new style collective call in the gloo backend. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU broadcast implementation.

Reviewed By: teng-li

Differential Revision: D12890013

fbshipit-source-id: 43f346fb2814f421bedc7babf89169703a46bb9c
2018-11-05 13:52:07 -08:00
d2f26a450e Add new style allreduce support in c10d/gloo (#13426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13426

This replaces the existing allreduce implementation with the new style collective call in the gloo backend. This is the first one to include both a CPU and a CUDA path. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU allreduce implementation. This is not much different from the current situation in the case where there is a single input tensor per call (which is the case when called from DistributedDataParallel).

Reviewed By: teng-li

Differential Revision: D12855689

fbshipit-source-id: 574281d762dd29149fa7f634fb71f8f6a9787598
2018-11-05 13:52:05 -08:00
d50dd47ccd Add reduce support in c10d/gloo (#13425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13425

This adds support for the new style reduce collective call in the gloo backend.

Reviewed By: teng-li

Differential Revision: D12869404

fbshipit-source-id: 93c641e6aba3b03c796bda80737547c565cfa571
2018-11-05 13:52:02 -08:00
8f0f97749c Add allgather support in c10d/gloo (#13424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13424

This adds support for the allgather collective call in the gloo backend. The gloo implementation does not support multiple inputs per rank (nor one or more outputs per rank), so we use a temporary flattened buffer and unflatten once the collective finishes.

Reviewed By: teng-li

Differential Revision: D12832009

fbshipit-source-id: 2f5c1934a338589cef1d3192bd92ada135fecd7a
2018-11-05 13:52:01 -08:00
75c2b34c86 Add gather support in c10d/gloo (#13423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13423

This adds support for the gather collective call in the gloo backend. The gloo implementation does not yet support the mode where the root has multiple output tensors (one per rank), so we use a temporary flattened buffer and unflatten on the root once the collective finishes.

Reviewed By: teng-li

Differential Revision: D12811647

fbshipit-source-id: 90fe8af8c390090b7d4ef43aa74f4e3e67ab9d0b
2018-11-05 13:51:59 -08:00
9cfe9418e6 Add scatter support in c10d/gloo (#13422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13422

This adds support for the scatter collective call in the gloo backend. This is the first of the new style collectives that do not expect to be created once and used many times. This commit contains some shortcuts to make this new style work side by side with the existing implementations (such as the std::tuple with nullptr's). These shortcuts are temporary until we have moved over all collectives to this new style.

Reviewed By: teng-li

Differential Revision: D12310219

fbshipit-source-id: 32e68717f819d5980f0e469d297204948351cefc
2018-11-05 13:51:57 -08:00
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 µs vs. 6.9 µs per loop
102400: 141 µs vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 µs vs. 6.7 µs per loop
102400: 141 µs vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
b2127cfa9a Make the inception onnx test more stable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13563

Differential Revision: D12924968

Pulled By: houseroad

fbshipit-source-id: ba43c88aabee749cb1e1307a412eacda4b8870b0
2018-11-05 12:39:00 -08:00
5f514a483c Move Half.{h, cpp} and Half-inl.h to c10 (#13361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13361

att

Reviewed By: Yangqing

Differential Revision: D12853472

fbshipit-source-id: ad3b96cbc6904435553a6c9e58aa158ec77a2961
2018-11-05 12:32:12 -08:00
e06f92785c Move ATen/core/Macros.h to c10/macros/Macros.h
Summary:
EXT=h,cc,cpp,hpp,cxx,cu,cuh
d=caffe2/aten/
codemod -m -d $d --extensions $EXT 'AT_HOST_DEVICE' 'C10_HOST_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_DEVICE' 'C10_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_HOST' 'C10_HOST'
codemod -m -d $d --extensions $EXT 'AT_ANDROID' 'C10_ANDROID'
codemod -m -d $d --extensions $EXT 'AT_IOS' 'C10_IOS'
codemod -m -d $d --extensions $EXT 'AT_MOBILE' 'C10_MOBILE'
codemod -m -d $d --extensions $EXT 'ATen/core/Macros.h' 'c10/macros/Macros.h'
codemod -m -d $d --extensions $EXT 'HIP_HOST_DEVICE' 'C10_HIP_HOST_DEVICE'

Reviewed By: dzhulgakov

Differential Revision: D12851341

fbshipit-source-id: 7d540530ef779e16ddf2b4cdda9dcc85a61410c3
2018-11-05 12:32:11 -08:00
8c182cd89e Add overload of ProcessGroup.allreduce with list of tensors (#13576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13576

TSIA

Reviewed By: SsnL

Differential Revision: D12923457

fbshipit-source-id: 7824490548edbacac3cda81c7500bd1f851c6093
2018-11-05 11:56:49 -08:00
482b1366e6 Remove half_support.* (#13534)
Summary:
These two files are unused. I think at the time I moved the code into an inline extension (https://github.com/pytorch/pytorch/blob/master/test/test_cpp_extensions.py#L288) and forgot to delete the files.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13534

Differential Revision: D12924365

Pulled By: goldsborough

fbshipit-source-id: 050dd7da267008ea58a5dcc8febee7d7e443bc3d
2018-11-05 10:04:21 -08:00
f0ed927b62 Add diag_embed to ATen and torch (#12447)
Summary:
Fixes: #12160
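
A small usage sketch of the new operator:

```python
import torch

x = torch.randn(2, 3)
d = torch.diag_embed(x)   # shape (2, 3, 3); d[i] is the diagonal matrix of x[i]
assert torch.equal(d[0], torch.diag(x[0]))
```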
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12447

Differential Revision: D12916234

Pulled By: SsnL

fbshipit-source-id: 512a04efb0c2e0a54295b857a61be66c3aae13da
2018-11-05 08:55:28 -08:00
07f8b61cc6 Roll operator t32802531 (#13261)
Summary:
Adding a roll operator
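
A usage sketch (keyword names as assumed from the current docs):

```python
import torch

x = torch.arange(1, 6)           # tensor([1, 2, 3, 4, 5])
torch.roll(x, shifts=2)          # tensor([4, 5, 1, 2, 3])

m = torch.arange(6).view(2, 3)
torch.roll(m, shifts=1, dims=1)  # each row rotated right by one
```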
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13261

Differential Revision: D12922575

Pulled By: nairbv

fbshipit-source-id: ff05c075d9c484a615011192b023debf47da4017
2018-11-05 08:33:36 -08:00
e7242cbaf2 Rename dim(i) -> size(i) - 1/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12896712

fbshipit-source-id: 909731691fab7799efbcfc3b5dcc9e531831c2d4
2018-11-05 07:27:04 -08:00
3ea64bd80b fbgemm submodule update (#13562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13562

Submodule update for fbgemm. This version of fbgemm has the same minimum required CMake version as PyTorch. Without this, the OSS build fails.

Reviewed By: jianyuh

Differential Revision: D12920951

fbshipit-source-id: 9ef532e715e3f7612fecc8430736633cf6b17f34
2018-11-05 07:22:34 -08:00
e988dc621b Stop depending on static analysis of tensor types in graph fuser (#13387)
Summary:
Built on top of #13108, so please review only the last commit.

This makes the graph fuser ignore input types (device/scalar type) when considering graphs for fusion, making it much more robust to shape-prop failures. Those properties are now checked at run time, as part of the kernel validation. This should enable graph fusions in `jit_premul` and `jit_multilayer` timelines in our benchmarks.

One regression is that I've disabled fusions of comparison ops (and `type_as`). That's because there's really no good way to ensure that those are really valid, and are a source of bugs (I filed #13384).

cc ngimel mruberry zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13387

Differential Revision: D12888104

Pulled By: zou3519

fbshipit-source-id: c233ea599679c34ac70fb4d8b8497c60aad9e480
2018-11-05 06:32:08 -08:00
505f9b4d63 Add Int8BatchPermutation op in DNNLOWP (#13539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13539

This is used by OCR's FPN model to detect small/dense text. It is just a simple
permutation along the batch dim based on the input indices, and it lets us avoid
the unnecessary quantize/dequantize ops.

Reviewed By: csummersea

Differential Revision: D12894055

fbshipit-source-id: d25639a5ffc2c490a0ee7ef307302eb2953c307e
2018-11-05 01:57:50 -08:00
54e8623d26 3D Conv in NHWC layout (#12733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733

Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need NHWC layout for best performance (note that NHWC layout in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops have a functionality to measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we're doing layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10333829

fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
2018-11-04 21:50:09 -08:00
274f3c0951 add explicit fpga context (#13318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13318

Add a context to describe FPGA.

This removes the need to use an OpenCL engine for FPGA.

The next step is to change the OpenCL implementation to explicitly use the FPGA context.

Reviewed By: soumith

Differential Revision: D12828795

fbshipit-source-id: 0700a83672d117d7aa3d941cd39c2ae627cb6e5f
2018-11-04 21:47:45 -08:00
246d5282b3 fix handling of single input in gradcheck (#13543)
Summary:
Now gradcheck properly accepts a single Tensor as input. It was almost supported already, but not completely.
Should fix the confusion from #13540.
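
A minimal example of the newly accepted form:

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
assert gradcheck(torch.sin, x)     # bare tensor now works as `inputs`
assert gradcheck(torch.sin, (x,))  # the tuple form still works as before
```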
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13543

Differential Revision: D12918526

Pulled By: soumith

fbshipit-source-id: a5bad69af0aea48c146f58df2482cabf91e24a01
2018-11-04 20:28:34 -08:00
fdf34c8da8 Kill more weird constructors on Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13433

Reviewed By: jerryzh168

Differential Revision: D12874599

fbshipit-source-id: 0c262fda72cbc4f3ea80df790cc8e95140bdc7e0
2018-11-04 16:54:49 -08:00
f000101b81 add a few comments on layout after im2col (#12429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12429

Comments to clarify layout after NHWC im2col for group convolution.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10233284

fbshipit-source-id: 996a69f2f932e02c978abaade7571b00741b6ae8
2018-11-04 11:02:58 -08:00
6b578cd388 update fbgemm submodule (#13547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13547

update fbgemm submodule

Reviewed By: jspark1105, jianyuh

Differential Revision: D12917297

fbshipit-source-id: ad9b2c7f119ca159af3826266b59ec26fc54911c
2018-11-04 09:15:17 -08:00
c1ed1b4779 Duplicate bias blobs shared by different conv ops to handle scale correctly (#13538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13538

In architectures such as FPN (https://arxiv.org/abs/1612.03144), a few Conv
ops share the same weight and bias and are run at different scales of
the input. Since 'bias_scale = input_scale * weight_scale', sharing
the same bias blob among multiple Conv ops means that we would need a
different bias scale for each of the ops. To handle this, we simply
duplicate those bias blobs that are used by multiple Conv ops before performing
the int8 rewrite.
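
A worked example of why the shared blob breaks (illustrative numbers):

```python
weight_scale = 0.02
for input_scale in (0.1, 0.05):       # the same conv run at two input scales
    bias_scale = input_scale * weight_scale
    print(bias_scale)                 # 0.002 vs. 0.001
# A single quantized bias blob has one scale, so it cannot serve both ops;
# hence the duplication before the int8 rewrite.
```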

Reviewed By: csummersea

Differential Revision: D12854062

fbshipit-source-id: 42a2951877819339b117f13f01816291a4fa6596
2018-11-04 04:15:28 -08:00
2a6850bf73 remove unnecessary files (#13537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13537

plot_hist.py and dnnlowp_fc_perf_comparison.py were not supposed to be in operators/quantized/server

Reviewed By: hx89

Differential Revision: D12916259

fbshipit-source-id: f5bc0c01a4924cad6f82eff624ba5f79becbea33
2018-11-04 01:01:28 -07:00
8be0efaa8c omit group conv NHWC test for HIP (#13554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554

D10233252 broke ROCM test.
We don't have group conv in NHWC for hip yet and this diff omits related tests.

Reviewed By: hyuen

Differential Revision: D12917880

fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
2018-11-03 21:18:23 -07:00
9e432b593d Include caffe2 proto headers in pytorch package data (#13217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13217

Caffe2 proto headers are not included in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1180). However, they are required for building custom Caffe2 ops living outside PyTorch/Caffe2 repo (e.g. custom Detectron ops).

Reviewed By: pjh5

Differential Revision: D12815881

fbshipit-source-id: 4d1aaa6a69a2193247586e85e4244fbbdb3e8192
2018-11-03 16:19:39 -07:00
149afef5c4 Include lib subdir in caffe2 include dirs path (#13216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13216

Caffe2 headers are placed under `lib/include` in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1201). However, `CAFFE2_INCLUDE_DIRS` path is set to `"${_INSTALL_PREFIX}/include"` which does not exist in package data. This results in issues when trying to build custom Caffe2 ops living outside Caffe2/PyTorch repo (e.g. custom Detectron ops).

Reviewed By: pjh5

Differential Revision: D12815878

fbshipit-source-id: 7cb1b4a729f8242b7437e3f30dace3b9cf044144
2018-11-03 16:19:38 -07:00
d40b23e750 remove unused use_scratch argument from batch_matmul (#11745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11745

use_scratch was introduced in D5834868, but D8944686 refactored GemmStridedBatched; use_scratch is not used anywhere and, as far as I can tell, is not documented.

Reviewed By: BIT-silence

Differential Revision: D9846488

fbshipit-source-id: 915d92aa57bc211888dfb09ad657f7c2b4f4b71c
2018-11-03 15:31:24 -07:00
2bc6a7a260 enable group conv test in NHWC layout in CPU (#12428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428

Group conv in NHWC layout was enabled on CPU after D7547497.
In D7547497, the unit test for group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff also enables it in conv_test.py.

Reviewed By: BIT-silence

Differential Revision: D10233252

fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
2018-11-03 11:58:51 -07:00
2b280c6b74 minor build fixes for incremental builds (#13293)
Summary:
Work around a cmake-ninja bug that doesn't track the dependency between xxx-generated-xxx.cu files and the timestamp of build.ninja (the consequence being that cmake is rerun on the next rebuild).
This was surfaced after analyzing the outputs of `ninja -d explain install`

Now, compared to https://github.com/pytorch/pytorch/pull/11487#issue-214450604 we're seeing:

```
python setup.py rebuild develop    # first time - ~1m 42s
python setup.py rebuild develop    # second time - ~12 s
```

This gets even faster if we replace the default linker with multithreaded linkers like `lld` or `gold`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13293

Differential Revision: D12916346

Pulled By: soumith

fbshipit-source-id: 3817c09a9a687fa2273f90444e5071ce1bb47260
2018-11-03 09:53:04 -07:00
0479517325 Add modernize-* checks to clang-tidy (#13196)
Summary:
Enables almost all `modernize-*` checks in clang-tidy. This warns against things such as:

- Use of `const std::string&` instead of new-style `std::string` + move,
- Using old-style loops instead of range-for loops,
- Use of raw `new`
- Use of `push_back` instead of `emplace_back`
- Use of `virtual` together with `override` (`override` is sufficient)

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13196

Differential Revision: D12891837

Pulled By: goldsborough

fbshipit-source-id: 4d0f782a09eb391ee718d3d66f74c095ee121c09
2018-11-02 20:30:40 -07:00
4bca51e3e7 unify BLAS check between Caffe2 and ATen (#13514)
Summary:
This PR unifies the BLAS check between Caffe2 and ATen. It skips the redundant BLAS check for ATen under certain conditions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13514

Reviewed By: orionr

Differential Revision: D12905272

Pulled By: mingzhe09088

fbshipit-source-id: 05163704f363c97a762ff034f88a67bd32ac01d0
2018-11-02 18:40:10 -07:00
8fc63e523e Resolve lint and infer warning (#13520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13520

Resolve the lint and infer warnings shown in the dnnlowp migration diff.

Reviewed By: dskhudia

Differential Revision: D12905972

fbshipit-source-id: b07400e25b80ea656795b005b91ac1438abe2695
2018-11-02 17:43:49 -07:00
f74fa91b8e Fix EraseListConstruct pass during ONNX export (#13195)
Summary:
There should really be a single place to erase or specially handle prim::ListConstruct during ONNX export; this will make it consistent across different calls. E.g., it will give a correct output graph in the following case:
```python
class Test(torch.nn.Module):
    def forward(self, input):
        return torch.cat([input, torch.zeros(input.size(0), 1).type_as(input)], dim=1)
```
Before this PR, we have the onnx graph as:

```
graph(%0 : Byte(2, 3)) {
  %1 : Long() = onnx::Constant[value={0}](), scope: Test
  %2 : Dynamic = onnx::Shape(%0), scope: Test
  %3 : Long() = onnx::Gather[axis=0](%2, %1), scope: Test
  %4 : Long() = onnx::Constant[value={1}](), scope: Test
  %5 : Dynamic = onnx::Unsqueeze[axes=[0]](%3)
  %6 : Dynamic = onnx::Unsqueeze[axes=[0]](%4)
  %7 : int[] = onnx::Concat[axis=0](%5, %6)
  %8 : Float(2, 1) = onnx::ConstantFill[dtype=1, input_as_shape=1, value=0](%7), scope: Test
  %9 : Byte(2, 1) = onnx::Cast[to=2](%8), scope: Test
  %10 : Byte(2, 4) = onnx::Concat[axis=1](%0, %9), scope: Test
  return (%10);
}

```
This is wrong, since ONNX does not have a concept of `int[]`. Here is the ONNX graph after this PR:
```
graph(%0 : Byte(2, 3)) {
  %1 : Long() = onnx::Constant[value={0}](), scope: Test
  %2 : Dynamic = onnx::Shape(%0), scope: Test
  %3 : Long() = onnx::Gather[axis=0](%2, %1), scope: Test
  %4 : Long() = onnx::Constant[value={1}](), scope: Test
  %5 : Dynamic = onnx::Unsqueeze[axes=[0]](%3)
  %6 : Dynamic = onnx::Unsqueeze[axes=[0]](%4)
  %7 : Dynamic = onnx::Concat[axis=0](%5, %6)
  %8 : Float(2, 1) = onnx::ConstantFill[dtype=1, input_as_shape=1, value=0](%7), scope: Test
  %9 : Byte(2, 1) = onnx::Cast[to=2](%8), scope: Test
  %10 : Byte(2, 4) = onnx::Concat[axis=1](%0, %9), scope: Test
  return (%10);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13195

Differential Revision: D12812541

Pulled By: wanchaol

fbshipit-source-id: db6be8bf0cdc85c426d5cbe09a28c5e5d860eb3e
2018-11-02 15:09:06 -07:00
519570def8 Rename dim(i) -> size(i) - 2/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: salexspb

Differential Revision: D12896721

fbshipit-source-id: deb0290354a1ffd69d080f0f126479844bf04e3c
2018-11-02 14:29:06 -07:00
7b48a7c3f6 Bump gloo (#13513)
Summary:
Included math.h changes needed in #13422 and later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13513

Differential Revision: D12906653

Pulled By: pietern

fbshipit-source-id: 4d4ec7566bf07925b4ce86eb0c63d784cb6b9992
2018-11-02 12:16:17 -07:00
da029ca042 Skip Conv1D tests for MIOPEN (#13512)
Summary:
MIOpen currently only supports 2D convolutions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13512

Differential Revision: D12903307

Pulled By: bddppq

fbshipit-source-id: a8b0f0580a1859f1e0c1518907406abf013c4c8c
2018-11-02 11:38:26 -07:00
34dd831dc2 Revert MKL rowwise moments (#13480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13480

Revert D12845220: the MKL functions use multiple threads, while their single-thread run is slower than the Eigen version.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D12891751

fbshipit-source-id: 2a61727b269a304daeee2af6ff7fee7820cb5344
2018-11-02 11:31:43 -07:00
cc3cecdba0 Fix the bug when compile using nvcc compiler. (#13509)
Summary:
I found a bug in compiling CUDA files when installing the maskrcnn-benchmark lib.

`python setup.py build develop` will throw the error:
```
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 214, in unix_wrap_compile
    original_compile(obj, src, ext, cc_args, cflags, pp_opts)
  File "/usr/lib/python2.7/distutils/unixccompiler.py", line 125, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] +
TypeError: coercing to Unicode: need string or buffer, list found
```

For more information, please see [issue](https://github.com/facebookresearch/maskrcnn-benchmark/issues/99).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13509

Differential Revision: D12902675

Pulled By: soumith

fbshipit-source-id: b9149f5de21ae29f94670cb2bbc93fa368f4e0f7
2018-11-02 11:09:43 -07:00
2827fc7681 Add native wrappers for inplace bitwise operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13490

Differential Revision: D12894826

Pulled By: gchanan

fbshipit-source-id: bd7a0a50e824d92f8ad39e159c1c10318741191d
2018-11-02 11:03:24 -07:00
9f2b2cac37 Fix handling all empty bags in CUDA embedding bag (#13483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11847
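
An illustrative repro for the all-empty-bags case (assumed, based on the issue):

```python
import torch

emb = torch.nn.EmbeddingBag(10, 3, mode='sum').cuda()
input = torch.empty(0, dtype=torch.long, device='cuda')
offsets = torch.zeros(2, dtype=torch.long, device='cuda')
out = emb(input, offsets)  # two empty bags; should return zeros, not crash
```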
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13483

Differential Revision: D12902914

Pulled By: SsnL

fbshipit-source-id: 577a53e815231e988da716b1ee5667e1f36408ca
2018-11-02 10:21:14 -07:00
3d392cc5ec Migrate dnnlowp code to open source directory (#13500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13500

This diff migrate dnnlowp related files and operators from deeplearning/quantization/caffe2 and deeplearning/quantization/dnnlowp to the open source directory.

Reviewed By: jspark1105

Differential Revision: D10842192

fbshipit-source-id: 53d0666d0ae47a01db9c48114345d746b0a4f11f
2018-11-02 09:36:59 -07:00
bcb851a3d6 Write gesv derivatives in terms of native function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13469

Reviewed By: ezyang

Differential Revision: D12889116

Pulled By: gchanan

fbshipit-source-id: 1a25dd6ec3fda5897c5cabbb9a62423b50bfda36
2018-11-02 08:30:24 -07:00
1e1dd88c4a Add Linux ppc64le CPU/GPU CI build status
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13507

Differential Revision: D12902281

Pulled By: soumith

fbshipit-source-id: d2c89dcf08dcbe1e451ae52e85256f658155a0e1
2018-11-02 07:51:40 -07:00
2f82a06826 Fix half_tensor.bernoulli_(double) (#13474)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12431
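
An illustrative repro (assumed from the issue title):

```python
import torch

x = torch.empty(4, dtype=torch.half, device='cuda')
x.bernoulli_(0.25)  # previously broken when p was a Python float/double
```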
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13474

Differential Revision: D12897834

Pulled By: SsnL

fbshipit-source-id: 598250fd7b9f1d2509ec0e5012724d7895a62daf
2018-11-02 07:46:46 -07:00
61a2d47ec6 Special handling for 1D convolutional kernels in cuDNN flavor of conv_op. (#12902)
Summary:
Essentially makes cuDNN treat those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902

Reviewed By: BIT-silence

Differential Revision: D10852862

Pulled By: soumith

fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
2018-11-02 07:08:23 -07:00
86192301b3 Fix a few bugs in format and vararg handling (#13492)
Summary:
There are a couple subtle bugs in the way varargs is implemented:

1. it fails if you pass 0 arguments, because it doesn't handle the case when there are 0 varargs, and because Operator::matches was not updated.
2. it breaks all the named-based lookups on nodes. For instance node->get<int>(attr::value)
   will return a single entry of the varargs if you look it up by name.

Furthermore it complicates some assumptions about the positional arguments (e.g. they use to be
1-to-1 with node inputs but with varargs they are not).

Because varargs are only being used for format, this diff instead
just allows format to take any value as input, regardless of type. It just provides a way to set is_vararg
from the schema but does not restrict the type of the varargs things. This is inline with
the pre-existing behavior for is_vararg so it doesn't require Operator::matches changes.

This also keeps format inline with how print works, and is closer to the python implementation of format. Note that the implementation
of format already worked with arbitrary IValues so restricting to strings was just making it more conservative than needed.

This also fixes the implementation of format to work when there are 0 arguments or text before and after a format string, where it would not print things.
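
A small sketch of the now-supported cases:

```python
import torch

@torch.jit.script
def fmt(x):
    a = "no args".format()       # zero arguments now works
    return "x = {}!".format(x)   # text before and after the placeholder
```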
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13492

Differential Revision: D12896989

Pulled By: zdevito

fbshipit-source-id: 21425bac8edc81709030a7408180494edea0a54b
2018-11-02 00:07:00 -07:00
5fbaf0eaf8 add augmented assignment ops (#13364)
Summary:
This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends).
- To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments.
- Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types (see the sketch below).
- In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators.
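
A minimal sketch of what now compiles to in-place ops:

```python
import torch

@torch.jit.script
def scale_and_shift(x, y):
    x *= 2.0  # emitted as the in-place aten::mul_
    x += y    # emitted as the in-place aten::add_
    return x
```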
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364

Differential Revision: D12899734

Pulled By: suo

fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146
2018-11-02 00:01:07 -07:00
a0e783768f Do not fill in new data in every iteration if the input data only has one entry (#13495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13495

If the user has only one data file to put in, the data is filled in on every iteration,
which actually flushes the caches. The measured latency is then larger than the latency
when the caches are warm. Instead of doing that, we should rely only on the wipe_cache
variable to wipe the caches.

The change is to skip filling in the data if the input only has one size and it is
not the first iteration.

Reviewed By: hl475

Differential Revision: D12897946

fbshipit-source-id: ee54ed09b8ec85fcefe930858420b90d494ad972
2018-11-01 22:06:09 -07:00
57e162da56 Switch mutable lists to new mutable schema (#13406)
Summary:
Goodbye, World! This PR removes the world tokens and associated pass and switches lists over to the new mutability/aliasing annotations.

Should resolve #12780 since we are disabling optimization pending alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13406

Differential Revision: D12886463

Pulled By: suo

fbshipit-source-id: e64e55905aebdcad273b39862df3209f823f5408
2018-11-01 19:41:04 -07:00
6d2b3cc869 Fix pytest, make it work with run_test.py (#13416)
Summary:
Fixes #13326

Also now you can use `run_test.py` with `pytest`. E.g.,
```
python run_test.py -vci distributed -pt
```

Yes it works with `distributed` and `cpp_extension`.

cc zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13416

Differential Revision: D12895622

Pulled By: SsnL

fbshipit-source-id: 2d18106f3a118d642a666bfb1318f41c859c3df7
2018-11-01 19:08:06 -07:00
0fd176fea4 Add operator is, not, is not to script (#13336)
Summary:
As titled, this PR is part of the tasks to unblock exporting the standard library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13336

Differential Revision: D12888912

Pulled By: wanchaol

fbshipit-source-id: 6213a17a75a593ae45999994fd9562f29b7d42df
2018-11-01 16:55:28 -07:00
24839aac59 Link libgloo.a after libc10d.a to resolve remaining symbols (#13462)
Summary:
libcaffe2.so depends on libgloo.a for the ops in caffe2/contrib/gloo.
Symbols in libgloo.a that are not used are ignored and don't end up in
libcaffe2.so. libc10d.a depends on the caffe2 target, which in turn
depends on the gloo target, and it expects all libgloo.a symbols to be
part of libcaffe2.so. Symbols from libgloo.a that are not used in
libcaffe2.so remain undefined in libc10d.a.

To fix this, we link to libgloo.a when linking _C.so, such that any
gloo symbols in libc10d.a are resolved when linking _C.so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13462

Differential Revision: D12892830

Pulled By: pietern

fbshipit-source-id: 7560b3899b62f76081b394498480e513a84cefab
2018-11-01 16:03:33 -07:00
e6b6cc06ee caffe2/core hipify (#13457)
Summary:
Small edits to caffe2/core hipify to make it compile in fbcode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13457

Reviewed By: bddppq

Differential Revision: D12883472

Pulled By: xw285cornell

fbshipit-source-id: 1da231d721311d105892db13ed726240398ba49e
2018-11-01 15:49:56 -07:00
421f3f3e52 add npair builtins (#13473)
Summary:
Add npair builtins to unblock the standard library. As with broadcasting lists, the only occurrences are with ints/floats.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13473

Differential Revision: D12890844

Pulled By: eellison

fbshipit-source-id: c360bb581d0f967cb51b858b6f964c300992d62a
2018-11-01 15:42:52 -07:00
27002e3fd5 Enable a few hicpp (#13189)
Summary:
Enabling three checks from ["High Integrity C++"](https://www.perforce.com/blog/qac/high-integrity-cpp-hicpp)

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13189

Differential Revision: D12859779

Pulled By: goldsborough

fbshipit-source-id: 8ec22370dcf88618dae749a8dae0e82678e68b0e
2018-11-01 15:19:17 -07:00
d843f63f2a optimization on cpu conv3d (#11884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11884

In CPU mode, the current convNd uses Im2ColNdNCHWImpl, a generic implementation that handles convolutional layers with an arbitrary number of dimensions. In video modeling, we use convNd with filter dimension = 3.

The problem with the current convNd is that Im2ColNdNCHWImpl is much slower than the Im2Col used by conv2d for filters with the same FLOPs. For example, a (1, 7, 7) 3d filter takes 5 times longer than a (7, 7) 2d filter at inference time.

This diff extends Im2Col to the 3d case (Im2Col3dNCHWImpl); this optimization for 3d convolution gives 4~5 times faster inference time on CPU for various video models:

{F128300920}

i-am-not-moving-c2-to-c10

Reviewed By: BIT-silence

Differential Revision: D8245940

fbshipit-source-id: 75231d65c9dd56059dfe31701e26021fd1ff2a85
2018-11-01 15:13:26 -07:00
d714ecf879 Rename potrf to cholesky (#12699)
Summary:
This PR performs a renaming of the function `potrf` responsible for the Cholesky
decomposition on positive definite matrices to `cholesky` as NumPy and TF do.

Billing of changes
- make potrf cname for cholesky in Declarations.cwrap
- modify the function names in ATen/core
- modify the function names in Python frontend
- issue warnings when potrf is called to notify users of the change
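
A usage sketch of the renamed function (assumed to match the new API):

```python
import torch

a = torch.randn(3, 3, dtype=torch.double)
a = a @ a.t() + 3 * torch.eye(3, dtype=torch.double)  # make it positive definite
l = torch.cholesky(a)        # new name; lower triangular by default
assert torch.allclose(l @ l.t(), a)
# torch.potrf(a) still works but warns about the rename
```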

Reviewed By: soumith

Differential Revision: D10528361

Pulled By: zou3519

fbshipit-source-id: 19d9bcf8ffb38def698ae5acf30743884dda0d88
2018-11-01 15:10:55 -07:00
26a8bb62ee Re-enabled mm+add tree batching in the JIT (#13228)
Summary:
I've had to generously increase the range of the CreateADSubgraphs pass, because even though it collapses the RNN loop to a single differentiable subgraph and a few other nodes, the range uses the distances in the original graph...

cc zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13228

Differential Revision: D12871316

Pulled By: zou3519

fbshipit-source-id: 32da6f30f7821e4339034f1a4dec41ed0849abfb
2018-11-01 14:50:17 -07:00
81438f1220 Add transpose network pass (#13437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13437

revert
transform the NCHW Convolution operators to NHWC and the tensors around these operators

Reviewed By: bwasti

Differential Revision: D12871789

fbshipit-source-id: 6509a29fa1654424d22904df0d3e60f8cd9c0ec7
2018-11-01 14:27:07 -07:00
a1728602da Convert Arguments to dictionary (#13436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13436

revert
Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.

Reviewed By: bwasti

Differential Revision: D12871811

fbshipit-source-id: 486ad09f3f37723c92a946c486ce3e24a649b4e6
2018-11-01 14:27:05 -07:00
469c6b0539 Replace tmpnam usage (#13289)
Summary:
Fix
```
/torch_shm_manager#compile-manager.cpp.oc089dac2,gcc-5-glibc-2.23-clang/manager.cpp.o:manager.cpp:function main:
warning: the use of `tmpnam' is dangerous, better use `mkstemp`
```

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13289

Differential Revision: D12873282

Pulled By: goldsborough

fbshipit-source-id: fc64b59403d52eb271744378ef4ee8338c79312c
2018-11-01 13:50:43 -07:00
edc6d721e0 fix flake (#13463)
Summary:
fix flake on test/test_jit.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13463

Differential Revision: D12886532

Pulled By: eellison

fbshipit-source-id: 1cd2a736663d5037bb4bdcd1d8ca1f201cf6a1cf
2018-11-01 13:39:39 -07:00
99ce499bfe Revert D12852205: [pytorch][PR] [jit] Add str() builtin
Differential Revision:
D12852205

Original commit changeset: 3e0e9218afdf

fbshipit-source-id: 114b4873504109394fe9d489200d39764ecc638e
2018-11-01 12:48:48 -07:00
e2e560d9c8 Improved the caffe2 to ONNX export (#13429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13429

Made the SSA transformation idempotent. This ensures that if a caffe2 graph is already in SSA form, the names of the ONNX model's inputs/outputs match those of the caffe2 graph.
Avoid evaluating the model by running it if the shapes of all the blobs are present in the value_info map. This speeds up the conversion and decreases its memory usage for medium to large nets.

Reviewed By: abadams

Differential Revision: D12873354

fbshipit-source-id: d695b28e610562afa9a41c2d4da05be212ccb488
2018-11-01 12:40:24 -07:00
54d63c5752 added fbgemm as submodule (#13354) 2018-11-01 15:35:02 -04:00
c2dd0b9fad Put torch/csrc/jit/fuser/config.h in gitignore
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13461

Differential Revision: D12886222

Pulled By: goldsborough

fbshipit-source-id: f7cfb65f671129f46b5eafd75a6b00fa996371ac
2018-11-01 12:27:57 -07:00
de0d85ba98 Remove getTHCudaHostAllocator in favor of getPinnedMemoryAllocator (#13451)
Summary:
```
Both allocate "pinned" memory on the host (CPU). The allocator returned
by at::cuda::getPinnedMemoryAllocator caches allocations, while
getTHCudaHostAllocator would synchronize on frees.
```

This is super minor, but I want to avoid people grabbing getTHCudaHostAllocator by accident. (It's not currently used anywhere).

We still need a better API for allocating pinned memory from both C++ and Python. (See https://github.com/pytorch/pytorch/issues/2206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13451

Differential Revision: D12883037

Pulled By: colesbury

fbshipit-source-id: 5d327e715acc1ded9b19660f84ecd23c8334d1c1
2018-11-01 12:18:29 -07:00
8f2bc1bc56 Add str() builtin (#13278)
Summary:
Allow casting to string from any IValue type
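
A minimal example (assumed usage):

```python
import torch

@torch.jit.script
def as_text(x):
    return str(x)  # str() now accepts any value, not just strings
```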
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13278

Differential Revision: D12852205

Pulled By: driazati

fbshipit-source-id: 3e0e9218afdf27569da3ebf155f25e77e9f12984
2018-11-01 12:01:50 -07:00
70db53661b expose fixed length list argument (#13142)
Summary:
Arguments have an optional fixed-length list field, which allows either a list or a single element that will be broadcast to a fixed length.

This PR exposes that as a denotable argument, mostly to cover the many instances in which this used in the standard library. It appears in the standard library with ints & floats. Since this is not really a pattern we want to promote moving forward, I did not expose this for booleans or tensors.

We could consider making the optional static length part of the list type, instead of the argument, which would make some of this code much nicer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13142

Differential Revision: D12876047

Pulled By: eellison

fbshipit-source-id: e7359d2a878b4627fc2b9ebc090f9849ee524693
2018-11-01 10:34:52 -07:00
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
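
The rename at a call site:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(4, requires_grad=True)
target = torch.randn(4)
loss = F.mse_loss(pred, target, reduction='mean')  # was 'elementwise_mean'
```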
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
a5b627a0bf add assert statements (#13408)
Summary:
Adding assert statements to unblock the standard library.

The same limitations that apply to the existing implementation of Exceptions apply to this as well (no control-flow logic, and we ignore the specific Exception thrown).
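
A minimal example of a script assert (assumed usage):

```python
import torch

@torch.jit.script
def head(x):
    assert x.size(0) > 0, "expected a non-empty tensor"
    return x[0]
```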
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13408

Reviewed By: driazati

Differential Revision: D12876451

Pulled By: eellison

fbshipit-source-id: 767ba5a50ba7c5dd6a857ed4845ac076a81cf305
2018-11-01 10:01:07 -07:00
004fc2f430 Stop unnecessarily setting storage in as_strided. (#13411)
Summary:
As per ezyang's suggestion

Previously, tensor.as_strided would:
- allocate a tensor `result` and a storage
- throw away that storage in favor of the input tensor's storage.

This PR makes tensor.as_strided not allocate a storage just to throw it
away. This speeds up as_strided from 770ns to 344ns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13411

Reviewed By: ezyang

Differential Revision: D12870309

Pulled By: zou3519

fbshipit-source-id: 1415e656f4d1931585c9a6006dcd4670123352d0
2018-11-01 08:32:53 -07:00
c0e24443f7 Revert D10459665: [c10] Redo jit/type and utils/functional to ATen/core
Differential Revision:
D10459665

Original commit changeset: 563dec9987aa

fbshipit-source-id: bea1dac93ebe73c9e09753d641f04f722d80aef7
2018-11-01 07:26:54 -07:00
8444ed951d add sleep time between runs (#12347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12347

add sleep time between net and operator runs, and between each iteration.

Reviewed By: sf-wind

Differential Revision: D10209308

fbshipit-source-id: 9a42b47e1fdc14b42dba6bb3ff048fe8e2934615
2018-11-01 00:25:22 -07:00
86e1009497 Make ATen core HIP compatible (#13343)
Summary:
So caffe2 can include aten core files without hipifying aten

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13343

Reviewed By: xw285cornell

Differential Revision: D12853162

Pulled By: bddppq

fbshipit-source-id: f9402691292180dde110a58ea3b1cedc62aab0ba
2018-10-31 21:08:54 -07:00
10a6a3e404 Redo jit/type and utils/functional to ATen/core (#12862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12862

This is a redo of the previous move in a way that doesn't migrate the namespace -- also will check for the windows cudnn build failure

Reviewed By: Yangqing

Differential Revision: D10459665

fbshipit-source-id: 563dec9987aa979702e6d71072ee2f4b2d969d69
2018-10-31 19:57:43 -07:00
c76fc75292 Implementation copy operator for mkl-dnn (#12820)
Summary:
It is an operator to copy a blob from one ideep device to another.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12820

Reviewed By: ezyang

Differential Revision: D10850956

Pulled By: yinghai

fbshipit-source-id: f25bff6238cefe847eb98277979fa59139bff843
2018-10-31 19:35:53 -07:00
96ab7cbe5c Make gels error message nicer (#13421)
Summary:
cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13421

Differential Revision: D12875237

Pulled By: SsnL

fbshipit-source-id: 889a9820be77bb8055d41e395d7bf55d092b35d7
2018-10-31 19:25:57 -07:00
6fe089c6ea Hierarchical device independent -> device specific architecture (#13108)
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.

The fusion flow is now:

- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails, a fallback is invoked.

This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.
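
To make the hierarchy concrete, here is a deliberately tiny, runnable Python sketch of the register/run/compile/fallback flow described above. The real fuser is C++, and every name below (`register_fusion`, `run_fusion`, `Backend`) is illustrative rather than the actual API:

```python
specs = {}  # device-independent spec cache (thread-safe in the real fuser)

def register_fusion(key, graph):
    # "upfront compilation" would also stash shape-inference info here
    specs[key] = {"graph": graph, "compiled": {}}

def run_fusion(key, inputs, backend):
    spec = specs[key]
    shapes = tuple(len(x) for x in inputs)          # stand-in shape inference
    if not backend.can_fuse():                      # disabled device -> fallback
        return spec["graph"](inputs)
    arg_key = (backend.device, shapes)              # device- and arg-specific
    if arg_key not in spec["compiled"]:
        spec["compiled"][arg_key] = backend.compile(spec["graph"], shapes)
    return spec["compiled"][arg_key](inputs)

class Backend:
    def __init__(self, device):
        self.device = device
    def can_fuse(self):
        return True
    def compile(self, graph, shapes):
        return graph                                # device-specific in reality

register_fusion("add1", lambda xs: [x + 1 for x in xs[0]])
print(run_fusion("add1", [[1, 2, 3]], Backend("cpu")))  # [2, 3, 4]
```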

This PR does not only move code around. It also fixes a couple of bugs and makes some logical/code changes.

Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)

Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code

apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108

Reviewed By: gchanan, fmassa

Differential Revision: D12850608

Pulled By: soumith

fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
2018-10-31 18:13:00 -07:00
2df6d3e3c7 Fix allocator handling in raw_mutable_data (#13349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13349

When we get a Tensor that was created in ATen, it will have an allocator set.
Such tensors, before, crashed when you called raw_mutable_data on them.
This diff fixes that.

Reviewed By: ezyang, teng-li

Differential Revision: D12850833

fbshipit-source-id: 51a5f7030afc4854b439cb3698d0ccd8dd101e2c
2018-10-31 18:04:41 -07:00
a682ce9144 Add back HIP support to async net (#13400)
Summary:
We lost HIP support in the last refactoring (620ece2668)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13400

Differential Revision: D12868211

Pulled By: bddppq

fbshipit-source-id: 72dbfda105b826bee28ddf480e88fca7d63f93d8
2018-10-31 17:52:36 -07:00
eaf141dd64 Enable opencv and lmdb in ROCM CI (#13430)
Summary:
They are needed to run resnet50_trainer when using datasets from https://download.caffe2.ai/databases/resnet_trainer.zip

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13430

Differential Revision: D12876593

Pulled By: bddppq

fbshipit-source-id: 912943d1d84d165ad396c8a99d2b948d933e12f2
2018-10-31 17:50:33 -07:00
2e1b7a6f4f Renaming dim() to size() - 1/3 (#13434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13434

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12867223

fbshipit-source-id: 3e05be1a370ebd1a273bd4c70499d019fd056ac4
2018-10-31 17:43:52 -07:00
edd902594a Renaming meta() to dtype() - 1/2 (#13333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13333

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(meta->dtype): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12845168

fbshipit-source-id: 492091963d2211ea80215200e981965767566135
2018-10-31 17:14:08 -07:00
470bfaa586 int8 sigmoid op (#13298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13298

Int8 sigmoid ops, test provided. Only supports first axis now

Reviewed By: newstzpz

Differential Revision: D12837824

fbshipit-source-id: 2a9f1739813fe7b48f841ae15e0206768e57cd3e
2018-10-31 16:22:45 -07:00
48db74ea03 net_simple_refcount type to help experimentation with dynamic allocation. (#13370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13370

This diff adds a new net type (simple_refcount) that does one thing: for every
intermediate result produced by a net, it keeps a refcount of internal usage,
and once the last consumer has run, the net deletes the blob content to mimic
dynamic allocation. In fact, this would also be the behavior when we go
functional: anything that is not explicitly marked as input or output is left
to the executor for lifetime management.

See the comments in net_simple_refcount.cc for details.
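
As a rough illustration of the idea (not the actual C++ implementation; `Op`, `run_net`, and the blob names are made up for this sketch):

```python
class Op:
    def __init__(self, fn, inputs, output):
        self.fn, self.inputs, self.output = fn, inputs, output
    def run(self, ws):
        ws[self.output] = self.fn(*(ws[b] for b in self.inputs))

def run_net(ops, inputs, external_outputs):
    remaining = {}                            # blob -> pending internal reads
    for op in ops:
        for blob in op.inputs:
            remaining[blob] = remaining.get(blob, 0) + 1
    protected = set(inputs) | set(external_outputs)
    ws = dict(inputs)
    for op in ops:
        op.run(ws)
        for blob in op.inputs:
            remaining[blob] -= 1
            if remaining[blob] == 0 and blob not in protected:
                del ws[blob]                  # mimic dynamic deallocation
    return {k: ws[k] for k in external_outputs}

ops = [Op(lambda x: x + 1, ["in"], "tmp"),
       Op(lambda t: t * 2, ["tmp"], "out")]
print(run_net(ops, {"in": 3}, {"out"}))       # {'out': 8}; 'tmp' was freed
```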

Reviewed By: dzhulgakov

Differential Revision: D12855489

fbshipit-source-id: 594a47a786305d595fd505b6700864dd1d9c72aa
2018-10-31 15:59:16 -07:00
479b8266bf Back out "[pytorch][PR] Support upsample" (#13413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13413

Original commit changeset: d5db200365f1

Reviewed By: houseroad

Differential Revision: D12870356

fbshipit-source-id: be115d2370636786901c822895664ccace2a9bc2
2018-10-31 15:51:41 -07:00
a4778862c7 Docs/cpp misc features and fixes (#12914)
Differential Revision: D10502199

Pulled By: ezyang

fbshipit-source-id: ec7523caf37d2c92a0e7a2981e1badf51b93dd05
2018-10-31 15:22:45 -07:00
7b47262936 Use names instead of indices in format (#13266)
Summary:
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13266

Differential Revision: D12841054

Pulled By: goldsborough

fbshipit-source-id: 7ce9f942367f82484cdae6ece419ed5c0dc1de2c
2018-10-31 15:17:47 -07:00
a376f3a53f Revert "Revert D12858091: [pytorch][PR] restore USE_C10D_NCCL" (#13407)
Summary:
This reverts commit b1fe541de35381e3a31a9e71db2be4b3af59dbcc.

some CI confusion made it look like this diff needed to be reverted; however the actual issue was elsewhere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13407

Differential Revision: D12869650

Pulled By: anderspapitto

fbshipit-source-id: 3a436d41fc8434f9aa79b145f20904c99093eef4
2018-10-31 14:02:25 -07:00
f9c0a08eed Fix len() for tensors (#13398)
Summary:
Fixes #13376: `len(tensor)` was converting the tensor to a 1-element list and returning 1 every time.
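
A quick example of the expected behavior after the fix:

```python
import torch

x = torch.zeros(5, 3)
assert len(x) == 5   # size of the first dimension, not 1
```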
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13398

Differential Revision: D12867630

Pulled By: driazati

fbshipit-source-id: 28f3580a072d763df0980b3149c49d1894842ec9
2018-10-31 13:13:21 -07:00
9577811908 Using pip --user in test.sh script breaks ppc64le builds (#13388)
Summary:
Recent PR #13366, which added --user to pip install, breaks ppc64le testing when using test.sh. This fix skips --user for ppc64le builds/tests, as both ninja and hypothesis are already in the ppc64le docker images.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13388

Differential Revision: D12870164

Pulled By: soumith

fbshipit-source-id: b66bafc06ad2c5116bb5ef5e4681cf9c776084aa
2018-10-31 13:09:26 -07:00
08b7c791ff Windows CI hotfix: Pin Python version to 3.6.7 (#13410)
Summary:
The newest version of `mkl` in conda only supports Python 3.6.7, and installing it as dependency will automatically downgrade Python from 3.7 to 3.6.7, which creates environment divergence between Windows CI build and test jobs. This PR pins Python version to 3.6.7, so that Windows CI build and test jobs have the same conda environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13410

Differential Revision: D12870201

Pulled By: yf225

fbshipit-source-id: 2c5a41ad4bcc72e02d12ea6529550d5e1cdd45ef
2018-10-31 13:02:18 -07:00
404f8660e7 Add string.format() (#13157)
Summary:
This PR adds `aten::format` as a builtin op for strings with the basic formatting semantics of Python.

It also adds varargs to the schema parser (with the limitation that the varargs item is the last argument, i.e. `(*args, **kwargs)` is not supported) and to the compiler
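
A small usage sketch, assuming a build that includes this change (`message` is made up for illustration):

```python
import torch

@torch.jit.script
def message(n: int) -> str:
    return "processed {} items".format(n)

print(message(3))  # processed 3 items
```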
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13157

Differential Revision: D12832537

Pulled By: driazati

fbshipit-source-id: 17c1a5615bb286c648fc9e38f2ebe501b064c732
2018-10-31 12:50:56 -07:00
b3ef98450b Use non-th versions of some functions when defining backwards. (#13394)
Summary:
In these cases, the native function doesn't do anything different besides checking, so there is no semantic change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13394

Differential Revision: D12861272

Pulled By: gchanan

fbshipit-source-id: ef7403ef3ce0326ccb12178434ce0cf14b28426e
2018-10-31 12:42:03 -07:00
f30c74558c Revert D10861211: Convert Arguments to dictionary
Differential Revision:
D10861211

Original commit changeset: da2fcc3e3b4d

fbshipit-source-id: 7243cb340920cf0acb57420bb5de908acd02a064
2018-10-31 12:38:43 -07:00
93b16b6422 Revert D10519758: [nomnigraph] Add transpose network pass
Differential Revision:
D10519758

Original commit changeset: a268374fb0b1

fbshipit-source-id: 4de4c99a185c4083665226af94312b38dd0f6820
2018-10-31 12:34:14 -07:00
b1fe541de3 Revert D12858091: [pytorch][PR] restore USE_C10D_NCCL
Differential Revision:
D12858091

Original commit changeset: 1cc91bb3b82e

fbshipit-source-id: a9b55ea8c138f939af71caefdfe7d4bccf0cd331
2018-10-31 11:32:46 -07:00
a43c6385f1 When looking for pybind11, do not attempt to get properties from pybind11:pybind11. (#12188)
Summary:
There is no property named "INTERFACE_INCLUDE_DIRECTORIES" on pybind11::pybind11, so querying it causes a cmake error when a system installation of pybind11 exists. In addition, pybind11_INCLUDE_DIRS is already set once "find_package(pybind11 CONFIG)" finds pybind11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12188

Differential Revision: D10362655

Pulled By: soumith

fbshipit-source-id: 9c5d13295c4a2cf9aacd03e195994287d06ed15c
2018-10-31 11:23:01 -07:00
f5b34e3446 Handle exceptions in at::parallel_for() (#13393)
Summary:
Currently, exceptions thrown in at::parallel_for() will cause a hard crash
if the code is executed by a background thread. This catches the exception
and re-throws it in the main thread.
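
The pattern is the usual capture-and-rethrow; here is a minimal Python analogue of what the C++ fix does (`parallel_for` below is a stand-in, not the real at::parallel_for):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(begin, end, body):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(body, i) for i in range(begin, end)]
        for f in futures:
            f.result()  # re-raises any worker exception in the calling thread
```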
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13393

Differential Revision: D12861142

Pulled By: colesbury

fbshipit-source-id: d53f5ff830ef8c11f90477eb63e5016f7ef1a698
2018-10-31 11:22:59 -07:00
a4f00c3d1e Fix error message in tensorlist()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13392

Differential Revision: D12860921

Pulled By: colesbury

fbshipit-source-id: 86da3ef15d70b0343dc922a3842449001c1afffa
2018-10-31 11:19:56 -07:00
cda44ffa81 Add transpose network pass (#13396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13396

stub for bootcamp task

Reviewed By: bwasti

Differential Revision: D10519758

fbshipit-source-id: a268374fb0b119c5d1960a4382e51c5e1ca240ba
2018-10-31 11:16:41 -07:00
04e8a6d9ef Convert Arguments to dictionary (#13332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13332

Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.

Reviewed By: bwasti

Differential Revision: D10861211

fbshipit-source-id: da2fcc3e3b4dbf8decbe14a8e2d5621b3fcc377f
2018-10-31 11:16:39 -07:00
2cebcbae8c createUniqueDataNode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13395

Reviewed By: bwasti

Differential Revision: D12831584

fbshipit-source-id: a349dfe7a1da0d90e62b47e1b917f358275007be
2018-10-31 11:16:38 -07:00
a25d3b4d8c Use byte tensor for mnist labels. (#13363)
Summary:
The C++ mnist example https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/mnist.cpp does not work because the labels are not loaded correctly; it currently reports 100% accuracy. Specifying the byte dtype fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13363

Differential Revision: D12860258

Pulled By: goldsborough

fbshipit-source-id: ad7b9256e4fc627240e25c79de9d47b31da18d38
2018-10-31 11:05:40 -07:00
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129 , #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
27ccc8787f Implement data_ptr as a native function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13367

Reviewed By: ezyang

Differential Revision: D12855339

Pulled By: gchanan

fbshipit-source-id: da5d75ab38e01365717eed9a676dcbb22ac89fe7
2018-10-31 09:51:04 -07:00
cb87319eb0 restore USE_C10D_NCCL
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13371

Differential Revision: D12858091

Pulled By: anderspapitto

fbshipit-source-id: 1cc91bb3b82ec075481353e6f58dfe4e802fee5d
2018-10-31 09:46:45 -07:00
4c06f1f2bb CircleCI: enable all flaky tests (#13356)
Summary:
A few Caffe2 tests are currently disabled in the `py2-gcc4.8-ubuntu14.04` test job because they are known to be flaky. https://github.com/pytorch/pytorch/pull/13055 likely fixed the flakiness, and this PR tests that.

Fixes https://github.com/pytorch/pytorch/issues/12395.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13356

Differential Revision: D12858206

Pulled By: yf225

fbshipit-source-id: 491c9c4a5c48ac1b791fdc9d78acf66091e80457
2018-10-31 09:34:49 -07:00
bc74ec80d0 Add support for torch.backends.cudnn.enabled (#13057)
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057

Differential Revision: D10846618

Pulled By: driazati

fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
2018-10-31 09:31:09 -07:00
b200b51602 Give _dirichlet_grad a native wrapper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13368

Reviewed By: ezyang

Differential Revision: D12855461

Pulled By: gchanan

fbshipit-source-id: a220ff464ef09e4efcd9da296fa8b6839b94c337
2018-10-31 07:57:32 -07:00
0aaff5eaf9 Replace CUDA-specific set_index(_from) method from DeviceGuard with set_device. (#13275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275

This resulted in a bunch of knock-on changes, which I will now
describe:

- s/original_index/original_device/
- s/last_index/last_device/
- A bunch of places that used set_index, now use CUDAGuard (which does have
  set_index) because they were CUDA-specific code.

Major caveat: DeviceGuard doesn't *actually* work for non-CUDA/CPU devices. To make
that happen, I plan on totally replacing the implementation of DeviceGuard; what
I mostly care about here is wrangling the API into an acceptable state.

Reviewed By: gchanan

Differential Revision: D12832080

fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d
2018-10-31 07:55:13 -07:00
e5d56659ec Delete DeviceGuard(int64_t) constructor. (#13232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232

DeviceGuard should be device agnostic, which means that it shouldn't
assume that int64_t means select the CUDA device.

Reviewed By: gchanan

Differential Revision: D10858024

fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe
2018-10-31 07:55:11 -07:00
e93c721da1 Add c10::Stream, make at::cuda::CUDAStream use it. (#13133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13133

c10::Stream is a device agnostic object which represents a stream
on some device (defined as c10::Device).  The primary benefit of
introducing this object is that we can easily refer to it from code
in the non-CUDA library (since it doesn't actually refer to any
CUDA specific bits.)

Streams are identified by an ID into an appropriate pool.  There's
some work to translate to and from pointers to the pool; see inline
comments.
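
Conceptually (a hedged Python sketch, not the actual C++ type), the handle is just a device paired with an index into that device's stream pool:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    type: str       # e.g. "cuda" or "cpu"
    index: int

@dataclass(frozen=True)
class Stream:
    device: Device  # the device this stream lives on
    id: int         # index into that device's stream pool
```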

Reviewed By: gchanan

Differential Revision: D10855883

fbshipit-source-id: cc447f11a528432e41c2edc789f40e7a6f17bdd3
2018-10-31 07:55:10 -07:00
a3410f7994 Give addbmm a native wrapper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13316

Reviewed By: ezyang

Differential Revision: D12840406

Pulled By: gchanan

fbshipit-source-id: ebcc495f2437da71778001971c32ad6074cf98b7
2018-10-31 07:28:46 -07:00
e6ace54840 Move underscore prefixed th functions _th prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13308

Differential Revision: D12839464

Pulled By: gchanan

fbshipit-source-id: ceb5913cd154de301d0d476d70b3a4fc62eb319c
2018-10-31 07:03:34 -07:00
e475d3ede3 DDP multi-GPU segfault fix (#13291)
Summary:
Fix https://github.com/pytorch/pytorch/issues/13200

Tested on 8 GPU machines since CI doesn't have this many GPUs, so multi-GPU test won't be triggered

```
tengli@learnfair096:~/pytorch/test$ python run_test.py -i distributed --verbose
Selected tests: distributed
Running test_distributed ... [2018-10-29 20:32:46.355858]
/public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec
Running distributed tests for the gloo backend
test_DistBackend (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... ok
```

Also I would like to bump up the bucket size of broadcast to higher for performance reasons
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13291

Differential Revision: D12842840

Pulled By: teng-li

fbshipit-source-id: e8c50f15ebf2ab3e2cd1b51d365e41a6106b98fe
2018-10-31 00:43:42 -07:00
dc854c0ee6 Add --user to pip install in pytorch test scripts (#13366)
Summary:
caffe2 docker images use the native system python, which requires sudo to pip install.
In the pytorch ROCm CI we use the caffe2 docker image.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13366

Differential Revision: D12855748

Pulled By: bddppq

fbshipit-source-id: 3e53fa203fa6bb3c43d4065c38c2b61e47f45f1e
2018-10-30 23:09:00 -07:00
44d2ca660a Disable CCACHE while building NCCL (#13340)
Summary:
I don't have a full analysis, but ccache often appears to fail while
building nccl. To work around this, run the NCCL build with CCACHE_DISABLE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13340

Differential Revision: D12855467

Pulled By: anderspapitto

fbshipit-source-id: 63eb12183ab9d03dd22090f084688ae6390fe8bd
2018-10-30 22:19:21 -07:00
bfe7df2211 Optimize rowwise_moments by MKL (#13329)
Summary:
i-am-not-moving-c2-to-c10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13329

Optimize rowwise_moments by MKL

Reviewed By: houseroad

Differential Revision: D12845220

fbshipit-source-id: b047e52ba82ed184bd322680fbf96306dfbb9867
2018-10-30 21:43:36 -07:00
865a10feba Update NCCL to 2.3.7-1 (#13353)
Summary:
Including some hang fixes. Tested locally and distributed works fine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13353

Differential Revision: D12853714

Pulled By: teng-li

fbshipit-source-id: be72b9ffb48cffdb590e5452b0a4ec597f052685
2018-10-30 21:34:59 -07:00
265c97decf nomnigraph - More operator definitions (#13358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13358

More operator definitions changes

Reviewed By: itomatik

Differential Revision: D12852403

fbshipit-source-id: 0a69d9c6b55ab48344521ab9dba1de003dfc0714
2018-10-30 20:59:42 -07:00
59f8e8ada7 First step at adding exceptions (#12789)
Summary:
This is a first step towards adding exceptions. We need minimal support in order to begin converting the torch library to weak script mode (which is the main goal here).

Some limitations (that are documented in the tests & compiler):
1. Cannot assign exceptions to variables
2. Any name after raise is being treated as a valid Exception
3. No control flow analysis yet. Below a will be undefined:

if True:
     a = 1
else:
     raise Exception("Hi")
return a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12789

Differential Revision: D12848936

Pulled By: eellison

fbshipit-source-id: 1f60ceef2381040486123ec797e97d65b074862d
2018-10-30 20:25:50 -07:00
c7027a511f In pytorch CI install ninja via pip instead of building it from source
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13042

Differential Revision: D12854708

Pulled By: bddppq

fbshipit-source-id: 2693d8c9818782cb9f0c958dee8f77a1c131e32d
2018-10-30 20:05:40 -07:00
3c66520dd8 Remove aten/src/ATen/CUDAStream.cpp from hipify script (#13357)
Summary:
Deleted in https://github.com/pytorch/pytorch/pull/13251
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13357

Differential Revision: D12852983

Pulled By: bddppq

fbshipit-source-id: 0816a14188590e1971fabefcd575489c7339e122
2018-10-30 19:48:07 -07:00
13b9fd3e05 Renaming meta() to dtype() - 2/2 (#13334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13334

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(meta->dtype): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

i-am-not-moving-c2-to-c10

Reviewed By: ezyang

Differential Revision: D12845197

fbshipit-source-id: f87eb575d3c31593ca76b70780cc4fca888e706b
2018-10-30 18:24:30 -07:00
cb5f374f6c More functions moved to native, use _th_ prefix more consistently.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13262

Reviewed By: ezyang

Differential Revision: D12827704

Pulled By: gchanan

fbshipit-source-id: c910c069200c0766dd6d5f998d341124d560e80d
2018-10-30 17:41:55 -07:00
7d9ab140bf Fix aten::to symbolic + add expand_as (#13325)
Summary:
https://github.com/pytorch/pytorch/pull/13146 broke some cases of ONNX export, this fixes them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13325

Differential Revision: D12844294

Pulled By: jamesr66a

fbshipit-source-id: f98dd0685820b2a1e5fcd49733cfa5c19c48a4e7
2018-10-30 17:28:15 -07:00
4d141bee98 Skip test_sum_noncontig in ROCm (#13341)
Summary:
Since it fails due to insufficient precision of DoubleTensor.sum() on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13341

Differential Revision: D12851335

Pulled By: bddppq

fbshipit-source-id: e211c3868b685aa705160ce98a2a18a915ad493f
2018-10-30 16:54:44 -07:00
f1d02f6d1c Move underscore prefixed linear algebra TH functions to _th prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13309

Reviewed By: ezyang

Differential Revision: D12839533

Pulled By: gchanan

fbshipit-source-id: 27bdc5254d2529269b705c2c057826a44297a34b
2018-10-30 16:31:53 -07:00
11a16961a5 Fix "CUDA Tensor __rsub__ breaks when device is not 0" (#12956)
Summary:
Currently, `a = 1 - torch.tensor([1]).to('cuda:1')` puts `a` in `cuda:1` but reports `a.device` as `cuda:0`, which is incorrect, and it causes an illegal memory access error when trying to access `a`'s memory (e.g. when printing). This PR fixes the error.

Fixes https://github.com/pytorch/pytorch/issues/10850.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12956

Differential Revision: D12835992

Pulled By: yf225

fbshipit-source-id: 5737703d2012b14fd00a71dafeedebd8230a0b04
2018-10-30 16:29:19 -07:00
d2659f6689 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13346

Differential Revision: D12850686

Pulled By: michaelsuo

fbshipit-source-id: b7474d0a3f3347034592bef45125610c040cff6a
2018-10-30 16:22:58 -07:00
f58e4fbc45 Remove redundant array-gen loop in gather_ops_test.py (#13338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13338

Remove unnecessary [r for r in []] statements.

Reviewed By: ezyang

Differential Revision: D12848907

fbshipit-source-id: 256551b286ac6801585acf9bb0b2644ef0b7ed58
2018-10-30 16:20:22 -07:00
77b8aade58 Revert D12809293: Kill more weird constructors on Tensor
Differential Revision:
D12809293

Original commit changeset: 5eb663fe8182

fbshipit-source-id: 709a5378fdbbb3fcfaacef8fc48b6530afbbc28f
2018-10-30 16:01:51 -07:00
ed60f94dba hipify caffe2 script in fbcode (#13265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13265

Make changes to make hipify_python script to work with fbcode.

1. Add TARGETS file
2. Make hipify_python a module as well as a standalone script.

Reviewed By: bddppq

Differential Revision: D10851216

fbshipit-source-id: cacd04df6fe2084832256d1916d62dccea86baa9
2018-10-30 15:51:28 -07:00
9ca8a76645 Rename Type.tensor to Type._th_tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13313

Reviewed By: ezyang

Differential Revision: D12840136

Pulled By: gchanan

fbshipit-source-id: 896d705eb5091f7677d6d91dbd50629343dfa24d
2018-10-30 15:34:06 -07:00
c68b82ebc8 don't expand cmake variable in IF
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13331

Differential Revision: D12849306

Pulled By: anderspapitto

fbshipit-source-id: 2f1f72a44ed3a176be8c7490652e49771c3fadbf
2018-10-30 15:20:43 -07:00
cc3618ce36 Move _cumsum and _cumprod to _th_ prefixes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13311

Reviewed By: ezyang

Differential Revision: D12839706

Pulled By: gchanan

fbshipit-source-id: 79e20b31c6ca2f22229ad3903aacf70dc674c25c
2018-10-30 15:01:14 -07:00
ce469e6c71 dims() to sizes() remaining part
Summary: Made the clangr rule more robust and it discovered more callsites.

Reviewed By: smessmer

Differential Revision: D12825017

fbshipit-source-id: 3be1eeb7ea697b36ef89e78ba64c0ee1259439c4
2018-10-30 14:56:21 -07:00
9af18d847a Fix accesses to uninitialized memory when running sum() within an OMP… (#13274)
Summary:
```
… parallel region.

The two_pass_reduction code allocates a buffer of size at::max_threads().
When called within a parallel region, at::parallel_for only uses 1 thread
so some of this buffer is not written.

This makes two changes:

1) two_pass_reduction is not called when already in a parallel region
2) two_pass_reduction fills unwritten buffer elements with the identity
   (the value in dst)
```

cc SsnL: I think this should fix the NaNs in BatchNorm when calling sum() within a parallel region.
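
A minimal Python model of fix (2) above, with made-up names (the real code is C++): pre-filling the per-thread buffer with the identity keeps the final combine correct even when only one thread wrote to it.

```python
def two_pass_sum(chunks, max_threads, identity=0.0):
    buffer = [identity] * max_threads       # one slot per possible thread
    for tid, chunk in enumerate(chunks):    # only len(chunks) slots written
        buffer[tid] = sum(chunk)
    return sum(buffer)                      # unwritten slots stay the identity

print(two_pass_sum([[1.0, 2.0]], max_threads=4))  # 3.0
```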
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13274

Differential Revision: D12840034

Pulled By: colesbury

fbshipit-source-id: d32e80909a98a0f1bb1c80689fe5089b7019ef59
2018-10-30 14:17:35 -07:00
f04a705cb2 Remove assertions in conv modules (#13283)
Summary:
These assertions aren't necessary because these conditions are checked inside the ATen ops, and right now they're not very user-friendly because they don't have an error message or reference the dimension of the tensor being checked. Let's just remove them (the error then comes from ATen with a friendlier message).

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13283

Differential Revision: D12840730

Pulled By: goldsborough

fbshipit-source-id: 1902056c7d673f819c85f9164558e8d01507401c
2018-10-30 13:51:12 -07:00
c0411719fc Rename th_addmm to _th_addbmm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13317

Reviewed By: ezyang

Differential Revision: D12840603

Pulled By: gchanan

fbshipit-source-id: 10ead96cd181535cbd4dfe84be813375024dbd2c
2018-10-30 13:48:49 -07:00
3a81984bde Make Stat put ops accept empty tensors safely (#13178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13178

Add default value option to stats put ops

Reviewed By: mlappelbaum

Differential Revision: D10858564

fbshipit-source-id: cc9b3e621abf3fc21821b73f354bebdcd35e477e
2018-10-30 13:28:58 -07:00
ce51e3fe55 Move the Test conversion script to main repo (#13287)
Summary:
Better to keep it in the main repo, so we will have the correct dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13287

Reviewed By: zrphercule

Differential Revision: D12834665

Pulled By: houseroad

fbshipit-source-id: 3a0afaa705a9b8f4168fcd482123bcabcf083579
2018-10-30 13:25:22 -07:00
3cb2470bb3 add __deepcopy__ back to Parameter (#12886)
Summary:
- fix https://github.com/pytorch/pytorch/issues/315
- add `__deepcopy__` back to Parameter class
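
A quick check of the restored behavior:

```python
import copy
import torch
from torch.nn import Parameter

p = Parameter(torch.ones(2))
q = copy.deepcopy(p)
assert isinstance(q, Parameter)            # still a Parameter, not a bare Tensor
assert q.requires_grad == p.requires_grad
```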
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12886

Differential Revision: D12838771

Pulled By: weiyangfb

fbshipit-source-id: b2ce12244e36f981d89f6c7cdead63237dd820ea
2018-10-30 12:56:26 -07:00
a35162f1bc Remove net_simple_async (#13320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13320

simple_async has been deprecated via the network override rule for a while,
and we should be able to safely remove it.

This also clears up 2 tech debts:
(1) in rnn executor, rely on the executor override to get the right net.
(2) clearly mark checkExecutorOverride as a potential change to net_type by making it c++ style guide compliant.

Reviewed By: dzhulgakov

Differential Revision: D12840709

fbshipit-source-id: 667702045fa024f5bdc87a9c28ea1786c78432b3
2018-10-30 12:36:38 -07:00
0db505bf27 Made docstrings for Embedding more accurate. (#13310)
Summary:
Made the previous description for max_norm more precise, avoiding 'this' and describing what actually happens in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13310

Differential Revision: D12840813

Pulled By: SsnL

fbshipit-source-id: 98090c884267a62ce93cd85da84252d46926dfa5
2018-10-30 12:25:38 -07:00
264deae5da Improve visual representation of NQL subgraphs (#13143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13143

For primary review (will likely delete to keep commit chain cleaner)

Retain operator input order in DOT string conversion on NQL tool.

Assumptions
* No API to discern input graph node type
* Graph is bipartite
* No generative operators, i.e. operators without inputs that create outputs
* Not supporting subgraph

Mocks (from input P60154484)
Old: https://pxl.cl/j4mV (DOT string P60154515)
New: https://pxl.cl/j0wd (DOT string P60154461)

Reviewed By: bwasti

Differential Revision: D10224942

fbshipit-source-id: 8b0ce2f1f9248dfaa89aa01a3fd77e327de16ea4
2018-10-30 12:22:37 -07:00
017b91f861 Optimize channel_shuffle_op on GPU (#13066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13066

Optimize channel_shuffle_op on GPU

Reviewed By: houseroad

Differential Revision: D10639281

fbshipit-source-id: 394b937403e5d4e9df93548bbf87285bffaa55a9
2018-10-30 12:18:27 -07:00
518b0d0600 Fix add out=None to digamma docstring (Fixes #13225) (#13307)
Summary:
Fixes #13225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13307

Differential Revision: D12840231

Pulled By: SsnL

fbshipit-source-id: 2732a2466ac1d2f3fdabfd1eaccddec96e89ba1b
2018-10-30 11:52:35 -07:00
5ba952afcc use topological move in graph fuser (#13271)
Summary:
Turns out that getting rid of the multiple passes in fusion is a little more involved, so leaving it off for another day.

The expect test changes are just things moving around with the new ordering, but I would appreciate it if someone glanced at them for anything crazy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13271

Differential Revision: D12832752

Pulled By: michaelsuo

fbshipit-source-id: 55f16c80a97601744a06df2ead45cef7b3a19c08
2018-10-30 11:10:28 -07:00
5b15a501da Refactor & unit test feed predictor
Summary:
1. Refactor DDPG predictor.  Merge the critic predictor with ParametricDQNPredictor since they are the same
2. Fix bug where loss was multiplied by the batch size
3. Create DDPGFeedPredictor which uses the feed predictor output format
4. Add support for gridworld simulation memoization to DDPG.  Also memoize normalization tables.

Reviewed By: kittipatv

Differential Revision: D10161240

fbshipit-source-id: 2813890043de1241c1fb9b9c2b6a897403f9fc12
2018-10-30 10:27:47 -07:00
ec754adb14 Kill more weird constructors on Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13190

Reviewed By: ezyang

Differential Revision: D12809293

fbshipit-source-id: 5eb663fe818276d97cf31d1ed1e7f025d2b69851
2018-10-30 10:25:40 -07:00
10de2c1187 CircleCI: fix test timeout by running CPU build and test on different machines (#13284)
Summary:
It seems that we can fix the test timeout issue by running the CPU build and tests on different machines (I manually ran this patch through the CI 50 times to confirm this). The actual reason for the timeout is still unknown, but I suspect it has to do with memory / disk space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13284

Differential Revision: D12840371

Pulled By: yf225

fbshipit-source-id: af326f0358355602ee458696c3ffb325922e5289
2018-10-30 10:22:57 -07:00
ac64724ed9 Add support for tuple constants (#13086)
Summary:
Depends on #13072

Adds support for tuples as variables instead of just as literals. Before, tuples would give the error `python value of type 'tuple' cannot be used as a value`. This PR adds a flag on `SugaredValue` to determine if a value is a tuple or not.
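
For example, something like this (a hedged sketch; `swap` is made up for illustration) should now compile instead of raising that error:

```python
import torch

@torch.jit.script
def swap(x, y):
    t = (x, y)        # tuple held in a variable, not just a tuple literal
    return t[1], t[0]
```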
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13086

Differential Revision: D10846632

Pulled By: driazati

fbshipit-source-id: 7b5d6ae9426ca3dd476fee3f929357d7b180faa7
2018-10-30 09:01:17 -07:00
f06b70a6e9 Fix memory leak during packing in tuples (#13305)
Summary:
Verified on python 3.6 that it fixes #13243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13305

Differential Revision: D12838764

Pulled By: soumith

fbshipit-source-id: 206a8b22d1d05e5f156f1db1baaa82358f3eaa83
2018-10-30 08:32:26 -07:00
8a888c48da Reimplement as_strided in ATen. (#13185)
Summary:
This moves away from using tensor.set_(...) for as_strided, which went
through TH and was weirdly slow/complicated. The new as_strided has a
new invariant that it will never resize the storage to a larger size
(the previous as_strided allowed that behavior but it seemed weird and
none of our code relied on it.)

This offers a small speedup on as_strided: it went from 1300ns to
1100ns although the benchmarks get a little noisy here.

Also on the changelog is a quick fix to resize_ code to avoid unsigned
underflow. I'll rewrite the resize_ zero dim logic in a future diff, it
doesn't make sense the way it is written right now.
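
For reference, a small usage example of the op being reimplemented, viewing the top-left 2x2 block of a 3x3 tensor:

```python
import torch

x = torch.arange(9.).reshape(3, 3)
y = torch.as_strided(x, size=(2, 2), stride=(3, 1))
print(y)  # tensor([[0., 1.], [3., 4.]])
```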
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13185

Reviewed By: ezyang

Differential Revision: D12809160

Pulled By: zou3519

fbshipit-source-id: 3885df9d863baab2b2f8d8e2f8e2bfe660a49d85
2018-10-30 07:52:50 -07:00
8c2d0c831f Speed up tensor.storage_offset (#13267)
Summary:
This PR special cases tensor.storage_offset to avoid dispatches in the
common case. tensor.storage_offset is important for torch.as_strided
performance, because as_strided(sizes, strides) shares an implementation
with as_strided(sizes, strides, storage_offset) and it might not be the
best if there were two separate implementations (including backward
implementations).

This PR reduces times on a tensor.storage_offset
microbenchmark from 22ns to 2ns (these numbers are pretty stable). For
a torch.as_strided benchmark, this PR reduces numbers from 1042 to
928ns, a 100ns improvement, but this number is noisy and goes up and
down.
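
For reference, what the fast-pathed accessor reports:

```python
import torch

x = torch.arange(6.)
v = x[2:]                         # a view starting two elements into storage
assert x.storage_offset() == 0
assert v.storage_offset() == 2
```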
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13267

Reviewed By: ezyang

Differential Revision: D12829828

Pulled By: zou3519

fbshipit-source-id: df907731e2398ce2baf1c8b1860a561ccc456f78
2018-10-30 07:36:21 -07:00
ee010a2bee Operators that never (re)allocate memory do not need DeviceGuard (#13269)
Summary:
This PR removes DeviceGuard for the following native function tensor reshaping operations:
- broadcast_tensors
- chunk
- expand
- expand_as
- narrow
- reshape
- reshape_as
- select
- slice
- split
- split_with_sizes
- squeeze
- squeeze_
- transpose
- transpose_
- unsqueeze
- unsqueeze_

There are probably more but I'm putting this out for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13269

Reviewed By: ezyang

Differential Revision: D12830317

Pulled By: zou3519

fbshipit-source-id: 466a1bbd835aa708fe72c3c620e07fed3f85661f
2018-10-30 07:13:15 -07:00
47c0d88739 Bring back warning for dtype uninitialized in serialization (#13239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13239

Previous diff missed the if (dtype_initialized) check, duh.

Also, for safety of spamming - using LOG_EVERY_MS if it's available

Reviewed By: kennyhorror

Differential Revision: D12818938

fbshipit-source-id: 76590bd1b28010fb13f5d33423c8eac1395e9f76
2018-10-29 22:09:54 -07:00
bb703b1ff5 Remove defunct ATen/CUDAStream.h,cpp (#13251)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13251

Differential Revision: D12823807

Pulled By: ezyang

fbshipit-source-id: 7fa1ecc8058f3b0dacf5d3a4054f10422832599d
2018-10-29 21:08:10 -07:00
91e87c0395 Renaming size() to numel() - 2/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(size->numel): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

i-am-not-moving-c2-to-c10

Reviewed By: ezyang

Differential Revision: D12833748

fbshipit-source-id: 98dc2d3abc23c177c2c9e457b81499952d4b690c
2018-10-29 18:59:29 -07:00
c82e8bf988 bump gloo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13286

Differential Revision: D12835150

Pulled By: anderspapitto

fbshipit-source-id: 4e3bbca077447ef0c007568a359f2260229c2a51
2018-10-29 18:56:21 -07:00
4a3baec961 Hub Implementation (#12228)
Summary:
[Edit: after applied colesbury 's suggestions]
* The Hub module enables users to share code + pretrained weights through github repos.
Example usage:
```
hub_model = hub.load(
     'ailzhang/vision:hub', # repo_owner/repo_name:branch
     'wrapper1', # entrypoint
      1234, # args for callable [not applicable to resnet18]
      pretrained=True) # kwargs for callable
```
* Protocol on repo owner side: example https://github.com/ailzhang/vision/tree/hub
     * The "published" models should be at least in a branch/tag. It can't be a random commit.
     * Repo owner should have the following field defined in `hubconf.py`
        * function/entrypoint with function signature `def wrapper1(pretrained=False, *args, **kwargs):`
        * `pretrained` allows users to load pretrained weights from repo owner.
        * `args` and `kwargs` are passed to the callable `resnet18`, repo owner should clearly specify their help message in the docstring

```
def wrapper1(pretrained=False, *args, **kwargs):
    """
    pretrained (bool): a recommended kwargs for all entrypoints
    args & kwargs are arguments for the function
    """
    from torchvision.models.resnet import resnet18
    model = resnet18(*args, **kwargs)
    checkpoint = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
    if pretrained:
        model.load_state_dict(model_zoo.load_url(checkpoint, progress=False))
    return model
```
* Hub_dir
    * `hub_dir` specifies where the intermediate files/folders will be saved. By default this is `~/.torch/hub`.
    * Users can change it by either setting the environment variable `TORCH_HUB_DIR` or calling `hub.set_dir(PATH_TO_HUB_DIR)`.
    * By default, we don't cleanup files after loading so that users can use cache next time.

* Cache logic :
    * We used the cache by default if it exists in `hub_dir`.
    * Users can force a fresh reload by calling `hub.load(..., force_reload=True)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12228

Differential Revision: D10511470

Pulled By: ailzhang

fbshipit-source-id: 12ac27f01d33653f06b2483655546492f82cce38
2018-10-29 18:43:14 -07:00
955a01562d Removes debug spew in test_jit.py (#13280)
Summary:
Looks like a print() snuck in by accident with a recent PR and it's printing a lot of spew when the tests are run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13280

Differential Revision: D12833449

Pulled By: michaelsuo

fbshipit-source-id: 5b50fd4b03bb73e5ca44cabdc99609c10017ff55
2018-10-29 18:25:30 -07:00
6071389a90 Enable cppcoreguidelines checks in clang-tidy (#12959)
Summary:
Enables most of `cppcoreguidelines-*` checks for clang-tidy. Major fixes included:

- Uninitialized members,
- Use of `const_cast`,
- Use of raw `new`

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12959

Differential Revision: D11349285

Pulled By: goldsborough

fbshipit-source-id: 9e24d643787dfe7ede69f96223c8c0179bd1b2d6
2018-10-29 18:23:35 -07:00
8260441b45 Renaming size() to numel() - 1/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(size->numel): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12833710

fbshipit-source-id: aef469b7b6d7715dada593f0f55e5813fbd963ac
2018-10-29 18:01:01 -07:00
fbd497f169 Fix initialization order in MIOpen file (#13264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13264

Simply change the initialization order to make hcc happy. Otherwise will have to add -Wno-error=reorder.

Reviewed By: bddppq

Differential Revision: D12827635

fbshipit-source-id: 6f4cd67209f2aa8ae85cfbdc53df0efb3b3cc473
2018-10-29 16:48:54 -07:00
d8dab6ffa8 Add tensor.to(options) (#13146)
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146

Reviewed By: ezyang

Differential Revision: D12823809

Pulled By: SsnL

fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
2018-10-29 16:26:06 -07:00
3365d74df9 Fix refcounting in anomaly metadata
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13249

Differential Revision: D12823875

Pulled By: soumith

fbshipit-source-id: a0857a7cc8a4888aff99991fbae6bdd7a49d1ac4
2018-10-29 15:55:08 -07:00
50a8f8531b Updated for for arbitrary command line arg ordering
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13253

Differential Revision: D12829884

Pulled By: soumith

fbshipit-source-id: 9d8abcdf635e2daffce80ddf1e0e418a1e4c337d
2018-10-29 15:52:03 -07:00
9d9e5f8d1e Solve bug of DistributedDataParallel (#13248)
Summary:
Fixed bug [https://github.com/facebookresearch/maskrcnn-benchmark/issues/52](https://github.com/facebookresearch/maskrcnn-benchmark/issues/52)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13248

Reviewed By: pietern

Differential Revision: D12830451

Pulled By: teng-li

fbshipit-source-id: ab33faf3f6f4545f8fe07da7ecbeb2f0a2ea23f0
2018-10-29 15:19:55 -07:00
33b00bdbb8 cwd arg in shell function of run_test set to optional (#13247)
Summary:
Tiny fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13247

Differential Revision: D12830311

Pulled By: soumith

fbshipit-source-id: 405620e3a1de5bfc7e039f9aaf2f7cb7a3bca1b1
2018-10-29 15:17:00 -07:00
7956e9718b Add name for required optimizer parameter. (#13202)
Summary:
Small change -- the benefit is that the docs will show
``<required parameter>`` instead of ``<object object>``
for these required parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13202

Reviewed By: SsnL

Differential Revision: D12826252

Pulled By: jma127

fbshipit-source-id: 5f2c8495e5c56920377e4e012b8711e8f2a6e30e
2018-10-29 15:02:21 -07:00
2e19529bd1 Add HasDeviceOption [nomnigraph] (#13206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13206

Add has device option for checking if a node has a device option set

Reviewed By: bwasti

Differential Revision: D12815365

fbshipit-source-id: 58477df93777f470cfb30cd75f02a659a7017b7c
2018-10-29 14:25:40 -07:00
2cfe439cc7 Turn off tests for Travis-derived Python jobs. (#13252)
Summary:
They appear to time out 30% of the time when run on CircleCI.

Long term plan is to switch to using some binaries which
are not provided by Travis.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13252

Differential Revision: D12828812

Pulled By: ezyang

fbshipit-source-id: 7189e2a3200ae08c4ece16a27357ff0fd06f3adb
2018-10-29 14:04:57 -07:00
3c78cc6c2b Remove Tensor(const Tensor&, BaseContext*, type)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13204

Reviewed By: ezyang

Differential Revision: D11915764

fbshipit-source-id: baf883b3095bc9d5adf0b942eb874eaa7c1f45e5
2018-10-29 13:57:43 -07:00
5a2b2aa6af Remove calls to CopyFrom that can be sync (#13205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13205

CopyFrom without a context argument does a sync copy on the current gpu - exactly what most places need.

This diff kills about 60% of CopyFrom usages. The most common pattern is a gpu->cpu copy followed by FinishDeviceComputation - the latter can just be killed.

Reviewed By: Yangqing

Differential Revision: D11236076

fbshipit-source-id: eb790ca494dfc5d5e3a7d850b45d6f73221bb204
2018-10-29 13:57:42 -07:00
8ad69a80e3 Test scripts only run cases defined in the running script (#13250)
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.

cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250

Differential Revision: D12823734

Pulled By: SsnL

fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
2018-10-29 13:57:40 -07:00
db0b5c7ab7 ArgumentStash for int64_t arguments (#12939)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12906. https://github.com/pytorch/pytorch/issues/12580 is still open because the schema is marked as `traceable=false` in the arg parser constructor, I think.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12939

Differential Revision: D10492031

Pulled By: jamesr66a

fbshipit-source-id: ca5376de3997b5fb62b493e2e6a9bb0d6c3b9687
2018-10-29 13:55:24 -07:00
aabdcaa8fa No tmp install (#13215)
Summary:
This is a small patch on top of https://github.com/pytorch/pytorch/pull/13150 - please review only the top commit here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13215

Differential Revision: D12827675

Pulled By: anderspapitto

fbshipit-source-id: adb01d72a827b6dbffc25f7f99fdc3129906b1ca
2018-10-29 12:59:44 -07:00
a69af69ffc remove vestigial logic related to onnxbot tracking PRs (#13260)
Summary:
onnx always has a million branches so this is noisy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13260

Differential Revision: D12827640

Pulled By: anderspapitto

fbshipit-source-id: 55eced08970cc0a888bd8f7bc8670eea48deb288
2018-10-29 12:49:11 -07:00
380d2dfb27 absorb nccl (#13150)
Summary:
always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh. Use the existing caffe2 codepaths
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150

Differential Revision: D12815674

Pulled By: anderspapitto

fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa
2018-10-29 12:04:32 -07:00
1c8a823b3b More robust ABI compatibility check for C++ extensions (#13092)
Summary:
This PR makes the ABI compatibility check for C++ extensions more robust by resolving the real path of the compiler binary, such that e.g. `"c++"` is resolved to the path of g++. This is more robust than assuming that `c++ --version` will contain the word "gcc".

CC jcjohnson

Closes #10114

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13092

Differential Revision: D12810448

Pulled By: goldsborough

fbshipit-source-id: 6ac460e24496c0d8933b410401702363870b7568
2018-10-29 11:56:02 -07:00
48b98d2f7f Expose nn:: namespace to python (#13132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13132

Expose more of the C++ API to python

Reviewed By: duc0

Differential Revision: D10855086

fbshipit-source-id: 98cc89bc72ef91ed1c59c1a19688e047765cf90b
2018-10-29 11:36:51 -07:00
62b27d27b7 Re-enable experimental ops build (#12821)
Summary:
The experimental ops for the c10 dispatcher were accidentally disabled in the oss build when the directory changed from `c10` to `experimental/c10`. This PR re-enables them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12821

Differential Revision: D10446779

Pulled By: smessmer

fbshipit-source-id: ac58cd1ba1281370e62169ec26052d0962225375
2018-10-29 11:28:54 -07:00
b818d31a3e use TypeMeta instead of ScalarType in TensorOptions (#13172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13172

reland D10419671

Reviewed By: ezyang

Differential Revision: D12143282

fbshipit-source-id: 43504d06a901af30130ebe97fb0b33def45cdc9a
2018-10-29 11:15:37 -07:00
dcbca53e58 Renaming size() to numel() - 1/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866373

fbshipit-source-id: 589194164d4fea93b74d83fa7fc4c59558c41f4a
2018-10-29 11:11:19 -07:00
b1cf3ad1c2 More Declarations.cwrap functions moved to native, mainly LAPACK, sim… (#13194)
Summary:
…ple math.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13194

Reviewed By: ezyang

Differential Revision: D12811972

Pulled By: gchanan

fbshipit-source-id: 461beb5efa2b6aba0808d2419eb7eb3153d18d15
2018-10-29 11:03:04 -07:00
dbab9b73b6 separate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Separate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
bb96b6635c Support upsample (#13152)
Summary:
This will enable the updated attribute and input format of operator upsample.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13152

Reviewed By: houseroad

Differential Revision: D12812491

Pulled By: zrphercule

fbshipit-source-id: d5db200365f1ab2bd1f052667795841d7ee6beb3
2018-10-29 10:40:35 -07:00
5be20f92ca Towards a quieter CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13210

Differential Revision: D12824924

Pulled By: anderspapitto

fbshipit-source-id: 76dc9d43a1b5c57eca1051ce6c92200b5fbda7ae
2018-10-29 10:35:40 -07:00
1032cf9fe4 Support for zero-length sequences in RNN executor (#13244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13244

Adding support for zero-length sequences into RNN executor

Reviewed By: dzhulgakov

Differential Revision: D10848803

fbshipit-source-id: f2994ee28c09fb30146243bb300ae7205024dd17
2018-10-29 10:32:42 -07:00
52b6460d3a Fix bug in some reductions that use global memory (#13211)
Summary:
Reductions that used global memory but didn't reduce
across threads in a warp did not have enough global memory
allocated for their intermediate results. These were reductions
that are non-contiguous in their reduced dimension and
large enough to benefit from reducing across blocks in a
grid.

Fixes #13209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13211

Differential Revision: D12815772

Pulled By: colesbury

fbshipit-source-id: f78be2cb302e7567a76097ca3ba1e7b801c0cdad
2018-10-29 10:23:30 -07:00
9e6a695116 Add string equality test, string concat (#12992)
Summary:
Adds string equality comparison and concatenation; both are used in the standard library.
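
A hedged TorchScript sketch of the two new ops (`greet` is made up for illustration):

```python
import torch

@torch.jit.script
def greet(name: str) -> bool:
    s = "hello, " + name          # string concat
    return s == "hello, world"    # string equality

assert greet("world")
```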
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12992

Differential Revision: D10513681

Pulled By: eellison

fbshipit-source-id: 1f845ef50be7850fdd3366951b20dc2a805c21fd
2018-10-29 10:13:21 -07:00
74ac86d2fe Show demangled names on nvtx ranges (#13154)
Summary:
As we discussed, this changes the backward pass profiler annotations such that 1. they're demangled, and 2. if they came from a custom Python-side autograd function, they show a unique name based on the name of that Python-side function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13154

Differential Revision: D12808952

Pulled By: colesbury

fbshipit-source-id: 4119dbaed7714b87c440a81d3a1835c5b24c7e68
2018-10-29 08:45:54 -07:00
277b637811 Delete default constructor from CUDAStream. (#13021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13021

Let's make nullptr CUDAStream an illegal state.

Reviewed By: gchanan

Differential Revision: D10520421

fbshipit-source-id: 723c1f5130b2c92ec97411a958707fac4a90173f
2018-10-29 08:27:24 -07:00
1a4473bbd7 Rewrite THPUtils_PySequence_to_CUDAStreamList to return vector<optional<CUDAStream>> (#13125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125

Previously, it returned a vector of THCStream*, which we eventually turned
into CUDAStream.  No need to spatter the conversion code everywhere: just
do it correctly to begin with.  An important side effect of doing it this
way is that we no longer pass nullptr to CUDAStream; instead, we create
the default stream.  I will rely on this in a later patch.

Reviewed By: gchanan

Differential Revision: D10853224

fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
2018-10-29 08:27:23 -07:00
175f248310 Reduce sizes in TestUncoalescedSparse.test_to_sparse (#13236)
Summary:
The old test took 2min to run.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

See #13233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13236

Differential Revision: D12823474

Pulled By: ezyang

fbshipit-source-id: c800492a96e41a4cd18d41901f411d9d4e978613
2018-10-29 08:01:58 -07:00
71113c6b9e Respect kwarg-only of native functions moved from Declarations.cwrap.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13237

Reviewed By: ezyang

Differential Revision: D12818917

Pulled By: gchanan

fbshipit-source-id: 0ff55ccac3459edd3b28068a0378e9dae085eda0
2018-10-29 07:48:48 -07:00
4276fe7867 Support for saving exceptions in async CPU ops (#12904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12904

Enabling support for saving exceptions in async parts of CPU ops via
event().SaveException(). The error contract for CPU ops becomes:
 - return false in sync part -> net->Run() returns false
 - throw in sync part -> net->Run() rethrows the same exception
 - SetFinished("error msg") in async part -> net->Run() returns false
 - event().SetFinishedWithException() in async part -> net->Run() rethrows the same
   exception
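
A minimal Python model of this contract, with made-up names (the real code is C++ and uses the event API named above):

```python
import threading

class Event:
    def __init__(self):
        self._done = threading.Event()
        self.exc = None
    def set_finished(self, exc=None):
        self.exc = exc
        self._done.set()
    def wait_and_rethrow(self):       # roughly what net->Run() does
        self._done.wait()
        if self.exc is not None:
            raise self.exc

def async_part(event, work):
    try:
        work()
        event.set_finished()
    except Exception as e:            # SetFinishedWithException analogue
        event.set_finished(exc=e)

ev = Event()
threading.Thread(target=async_part, args=(ev, lambda: 1 / 0)).start()
try:
    ev.wait_and_rethrow()
except ZeroDivisionError:
    print("rethrown in the caller's thread")
```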

Reviewed By: andrewwdye

Differential Revision: D10479130

fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0
2018-10-29 04:57:40 -07:00
4fe8ca74af Test if GCC 7 fixes timeout problem. (#13230)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13230

Differential Revision: D12818863

Pulled By: ezyang

fbshipit-source-id: 371337ca4b9d8f8e71eb78d6a53085e1c3619631
2018-10-28 20:53:07 -07:00
34799faccd Fix move constructor on c10d::CUDAEvent (#13183)
Summary:
Previously, the move constructor performed a swap
between the item being moved in and the uninitialized
garbage from the object itself.

I didn't bother adding a test because I shortly intend
to kill this class entirely.  But the fix is so easy that
I wanted to put it in in case I don't get around to doing
this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13183

Reviewed By: pietern

Differential Revision: D12809062

Pulled By: ezyang

fbshipit-source-id: 0d94bb9796fb7d30621256bfb401a4f89ba8ddc8
2018-10-28 17:47:12 -07:00
1fe8278559 Batched Inverse (#9949)
Summary:
Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
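
A quick Python-side sketch of what the batched path enables (random matrices are assumed well-conditioned here):

```python
import torch

# inverse now accepts a batch of square matrices, not just a single 2D one
a = torch.randn(4, 3, 3)                 # batch of four 3x3 matrices
a_inv = torch.inverse(a)                 # one batched call (CPU or CUDA)
eye = torch.eye(3).expand_as(a)
print(torch.allclose(torch.matmul(a, a_inv), eye, atol=1e-5))
```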
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
2018-10-27 23:42:46 -07:00
4d62eef505 Add Future to IValue (#12976)
Summary:
Future now is an IValue. prim::Wait now is replaced by aten::wait

This PR is built on top of #12925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12976

Differential Revision: D10861483

Pulled By: highker

fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
2018-10-27 10:00:35 -07:00
0f261ee359 Fix performance regresion introduced in D10524381 (#13199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13199

D10524381 removed the inclusion of int8_simd.h in Caffe2 Int8 operators, and although the resulting code still compiles and works, it is up to 50% slower end-to-end (no SIMD!) on some models

Reviewed By: bertmaher

Differential Revision: D12813095

fbshipit-source-id: 03a713a4c070c0ad1e79e71e91d09eaddc0751eb
2018-10-27 08:16:49 -07:00
df8c5a3572 Refactoring MIOpen activation ops (#13187)
Summary:
This pull request contains changes for:
1. Adding a generalized MIOpen activation class to be used by activation operators
2. Refactoring MIOpen ReLU op to use the new class
3. Adding ELU, Tanh and Sigmoid MIOpen ops

Differential Revision: D12810112

Pulled By: bddppq

fbshipit-source-id: 9519b3a0cd733b906bcba5d8948be089029c43ac
2018-10-27 00:22:54 -07:00
f8864f0505 Revert "Move batch_norm to ATen/native, speed up (#12368)" (#13191)
Summary:
Revert #12368 since it's causing onnx related test cases failing.
https://github.com/pytorch/pytorch/pull/12368

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13191

Reviewed By: BIT-silence

Differential Revision: D12810778

Pulled By: houseroad

fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
2018-10-26 23:05:50 -07:00
bc352ace7c dense.to_sparse() re: #8853 (#12171)
Summary:
Here is my stab at ```dense.to_sparse```
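
A minimal round-trip sketch of the new method:

```python
import torch

d = torch.tensor([[0., 0., 1.],
                  [0., 2., 0.]])
s = d.to_sparse()                     # COO sparse tensor holding the two nonzeros
print(torch.equal(s.to_dense(), d))   # True: round-trips back to dense
```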
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12171

Differential Revision: D10859078

Pulled By: weiyangfb

fbshipit-source-id: 5df72f72ba4f8f10e283402ff7731fd535682664
2018-10-26 21:48:52 -07:00
5182fdad0b Compute the offset to make sure the order in InlineContainer test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13198

Reviewed By: bddppq

Differential Revision: D12812909

Pulled By: houseroad

fbshipit-source-id: f448e0d7957c316099a6b565d129eabb7ef81e59
2018-10-26 21:32:25 -07:00
7a6e0bd77e Skip ROCm tests that fail as per #12824 (#13181)
Summary:
For attention: bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13181

Differential Revision: D12811207

Pulled By: bddppq

fbshipit-source-id: de1c92e5a8cf4fc634c4644376d07374441c24e3
2018-10-26 21:06:20 -07:00
723f40d94e video model test workflow on CPU (#13203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13203

Minor changes in the test workflow to run the model on CPUs

Reviewed By: stephenyan1231

Differential Revision: D9925797

fbshipit-source-id: b7b1fb2658ab68b1ffc2b1f7b314958ea4732b32
2018-10-26 20:48:18 -07:00
dae7616078 Shard all of tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
7637b7c966 Optimize LayerNormOp (#13173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13173

Optimize LayerNormOp

Reviewed By: houseroad

Differential Revision: D12398163

fbshipit-source-id: 6b76bc4bd9f34e623f8e385dd07d4ce99490badf
2018-10-26 17:00:18 -07:00
537d671829 Renaming size() to numel() - 4/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866391

fbshipit-source-id: 3badc4e86edaac376918fca8d09dbfa396ac3a2c
2018-10-26 16:47:36 -07:00
3ca272cf5a Topologically-safe node moves (#13026)
Summary:
Add new methods to move a node before/after another node while preserving data dependencies.

Any suggestions for a pithier name for the methods would be appreciated 😃
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13026

Differential Revision: D10854574

Pulled By: QueryConnectionException

fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
2018-10-26 16:29:03 -07:00
620ece2668 Simplify thread pool creation logic (#13114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13114

Using one thread pool creator for all device types

Reviewed By: manojkris, wesolwsk

Differential Revision: D10851533

fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
2018-10-26 16:02:08 -07:00
63ce3fbde8 Created a transformer to convert caffe2 NetDef into ONNX models.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13167

Reviewed By: abadams

Differential Revision: D11296189

fbshipit-source-id: 7e49c7a78d26f4af39d50b40f70372272debb34a
2018-10-26 15:57:53 -07:00
9e6bb605f6 Native wrappers for many Declarations.cwrap entries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13003

Differential Revision: D10515654

Pulled By: gchanan

fbshipit-source-id: c3f2809fdb7daeea2209ef1bcdea60266dc4854d
2018-10-26 15:55:15 -07:00
80f766e5cd Create FAQ (#13129)
Summary:
Creates a FAQ. https://github.com/pytorch/tutorials/pull/345 now just links to this page.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13129

Differential Revision: D10854264

Pulled By: goldsborough

fbshipit-source-id: 6e57574ffa61409d4d9d1750aa618893b897ad41
2018-10-26 15:44:51 -07:00
eea2ee6d29 Renaming size() to numel() - 1/17
Summary: Codemod generated with clangr shard mode, 25 files per diff

Reviewed By: li-roy

Differential Revision: D10866237

fbshipit-source-id: 020fcfdf52083430c5b674eda8e07ad3adfcc838
2018-10-26 15:36:59 -07:00
06392bd6a3 Renaming size() to numel() - 3/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866389

fbshipit-source-id: 65489f7b3439ff9a62a5a09b77112f0f4931c609
2018-10-26 15:30:11 -07:00
883da952be Hipify caffe2/core (#13148)
Summary:
petrex ashishfarmer iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148

Reviewed By: xw285cornell

Differential Revision: D10862276

Pulled By: bddppq

fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4
2018-10-26 15:27:32 -07:00
1bec8f773b Move ConstantPadNd into ATen (#10885)
Summary:
Addresses #9499. Completed work on the forward function; tests pass for it. Working on the backward function now.
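
Constant-mode `F.pad` is the user-facing path that exercises this kernel; a minimal sketch:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 2, 3)
# (1, 2) pads the last dimension by 1 on the left and 2 on the right
y = F.pad(x, (1, 2), mode='constant', value=0.0)
print(y.shape)  # torch.Size([1, 2, 6])
```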
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885

Differential Revision: D9643786

Pulled By: SsnL

fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
2018-10-26 15:25:27 -07:00
e13e86724e Renaming size() to numel() - 2/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866381

fbshipit-source-id: 2fabf78dfea262e0c789cf24cd3ca6191852983b
2018-10-26 15:21:50 -07:00
b090a54a38 Enable MKLDNN in PyTorch in fbcode (#13165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13165

Also mark conflicting functions `static` to avoid duplicate symbol errors

Reviewed By: orionr

Differential Revision: D10998641

fbshipit-source-id: b93aab99b91daa1e082cc778abb28bf9d33c21d5
2018-10-26 14:52:19 -07:00
e6ce9f303f Check that QNNPACK directory exists in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13174

Differential Revision: D12808599

Pulled By: colesbury

fbshipit-source-id: 2548a024043f32ee570378dfead8880b00608478
2018-10-26 14:37:11 -07:00
f282fa1afe Comment out LOG(ERROR) for legacy no-dtype serialization behavior
Reviewed By: wylqc

Differential Revision: D12569279

fbshipit-source-id: 46def8ca163bcf9070a1179166fd8970e07ee229
2018-10-26 13:18:27 -07:00
0687f58441 Fix broken master (#13171)
Summary:
Fixes colliding changes in #12766 and #12368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13171

Differential Revision: D12109430

Pulled By: li-roy

fbshipit-source-id: f068c7df227d920aa3840762e892ce6e9c109237
2018-10-26 12:30:55 -07:00
c21471c77f Sampler serialization and deserialization (#12999)
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999

Differential Revision: D10859676

Pulled By: goldsborough

fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
2018-10-26 12:20:51 -07:00
9f9f06c937 Improve inline container and add some test (#12993)
Summary:
Added getNextRecord/hasNextRecord methods. Even though the model data is stored at the end, we can still read the file from the beginning.

Added gtest to cover reader and writer's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12993

Reviewed By: yinghai

Differential Revision: D10860086

Pulled By: houseroad

fbshipit-source-id: 01b1380f8f50f5e853fe48a8136e3176eb3b0c29
2018-10-26 12:06:47 -07:00
7ca995c815 Add optional default type annotation to support JIT None default value (#13161)
Summary:
As titled, this PR is part of the work to unblock exporting the standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13161

Differential Revision: D10866927

Pulled By: wanchaol

fbshipit-source-id: 50038dbe6840b097b98cbed9d46a189a64e82302
2018-10-26 11:38:50 -07:00
8797bb1d30 Revert D10419671: use TypeMeta instead of ScalarType in TensorOptions
Differential Revision:
D10419671

Original commit changeset: 9cc8c5982fde

fbshipit-source-id: c870ecdd3730cf695007ebb110d362996da05e5d
2018-10-26 11:09:58 -07:00
ce0d3e9b35 Bind inplace and _out variants into JIT (#13093)
Summary:
This commit is a minimal initial pass at adding inplace and _out variants to the JIT.
It changes gen_jit_dispatch.py to add bindings for these operators, and it also
supplements the FunctionSchema with alias information for these operators and for
viewing operators.

Tests are very minimal and will need to be improved in future commits.

Notes:

* Custom operator tests needed to be changed since _out variants add overloads, which
  the custom operator pipeline does not handle when called from python. This commit
  registers special test ops in the _test namespace for this purpose.
* Extends the schema parser to parse alias annotations more robustly.
* Extends FunctionSchema with `writes()` a set of alias set names that the op will write to,
  and `annotatedType()` which will return AnnotatedType objects which contain the alias_set
  information that was parsed from the schema.
* Disables all optimizations in graph executor when a mutable operator is found. This
  is something that will be improved in the future but is necessary for correctness now.
* Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the
  aten ops.
* Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types
  with their alias_set. These types only appear in schema when you call annotatedType
  and are erased from types in normal use.
* Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained
  within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new
  version of the same type, replacing the contained types with new_types. This simplifies
  a lot of logic for recursively cleaning up types.
* Refactor List[T] into a common part that is shared with Annotated[T] and can be shared
  with Optional[T] and Future[T] when they are merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093

Differential Revision: D10848176

Pulled By: zdevito

fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d
2018-10-26 10:37:20 -07:00
a70573b589 use TypeMeta instead of ScalarType in TensorOptions (#12768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12768

Note: DefaultTensorOptions no longer fits in 64-bits.

I kept functions that take ScalarType as input to minimize changes for now.

Reviewed By: ezyang

Differential Revision: D10419671

fbshipit-source-id: 9cc8c5982fde9ff243e03d55c0c52c2aa2c7efd8
2018-10-26 09:27:12 -07:00
2f1542839f reduce Device to 32bits (#12767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12767

In preparation of using TypeMeta in TensorOptions. We need TensorOptions to fit in 128-bits, this isn't possible if both TypeMeta and Device are 64-bit.

Reviewed By: ezyang

Differential Revision: D10416051

fbshipit-source-id: 23c75db14650f7f3045b1298977f61a0690a8534
2018-10-26 09:27:11 -07:00
a7ba4cb383 Change return type of Tensor::dtype() from ScalarType to TypeMeta (#12766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12766

In preparation of using TypeMeta in TensorOptions.

Reviewed By: ezyang

Differential Revision: D10232118

fbshipit-source-id: 5c69a524fa38e50aa555fb9feb87540bc3575a63
2018-10-26 09:27:09 -07:00
46ef2b2898 Ignore flake8 warnings in test_c10d.py (#13159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13159

These lint violations are intentional.

Reviewed By: ezyang

Differential Revision: D10862131

fbshipit-source-id: 70ad4b0a360cb12d050805fd7b1080dfe4566e86
2018-10-26 09:17:57 -07:00
435228508e Remove test_distributed_trap.py (#13151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13151

No longer needed.

Reviewed By: ezyang

Differential Revision: D10862319

fbshipit-source-id: 01405d7cf2553f59ff7d3dce33755a5fdd8a8f05
2018-10-26 09:15:27 -07:00
929bffe020 Turn some th_ prefixes into _th_ prefixes for conformity. (#13128)
Summary:
This is the same as https://github.com/pytorch/pytorch/pull/12889 with the addmm changes stripped out, since that appears to cause onnx broadcasting issues I don't understand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13128

Reviewed By: ezyang

Differential Revision: D10853911

Pulled By: gchanan

fbshipit-source-id: 08ec8629331972f0c332ccd036980fd9c87562b0
2018-10-26 08:08:09 -07:00
c95fa4b904 fix dtype uninitialized tensor serialization
Summary:
See D10380678 for the discussion.

Caffe2 serialization code was able to handle dtype-uninitialized tensors as long as their numel was 0 O_O.

For safety, to unblock the push, I'm preserving this behavior with a critical. Once we fix all occurrences of the old API, we can delete this test.

Reviewed By: kennyhorror

Differential Revision: D10866562

fbshipit-source-id: e172bd045fdfca660ff05b426e001f5f2f03f408
2018-10-26 01:30:47 -07:00
8e1e3ba7b8 Hide c10::optional and nullopt in torch namespace (#12927)
Summary:
Does

```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```

So that users can be oblivious of our changes with ATen/c10 happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12927

Differential Revision: D10510630

Pulled By: goldsborough

fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
2018-10-26 00:08:04 -07:00
f72f91610f Move stream to thread local (#13080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13080

This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before CopyFrom would do copy on stream logical id 0 if the context was created on the fly and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Reviewed By: ezyang

Differential Revision: D10525134

fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
2018-10-26 00:04:32 -07:00
dc211c7de4 Move batch_norm to ATen/native, speed up (#12368)
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
  to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
  maintain reasonable precision.

Needless to say, I would happily split the TensorAccessor fixes into a separate PR, as they're unrelated fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368

Differential Revision: D10559696

Pulled By: SsnL

fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
2018-10-25 23:41:10 -07:00
5e73b828bd CMake integration for Int8 ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13145

Differential Revision: D10860849

Pulled By: Maratyszcza

fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5
2018-10-25 22:25:10 -07:00
4870b1b68f Speed up tensor.resize_(sizes) when tensor has correct size (#12824)
Summary:
While using gbenchmark, I found `tensor.resize_({0})` would take 300ns
if tensor already has the correct size. This is important for
`at::empty({0})` perf because `at::empty` always calls `resize_`, which
in turn is a important for JIT perf: the fusion compiler creates empty
tensors and then `resize_`s them to computed sizes. Most of the 300ns is
due to DeviceGuard (200ns)

Summary of findings:
- `at::empty({0}, cuda)`: 851ns
- `empty_tensor.resize({0})`: 308ns
- `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this
  next because it impacts `resize_` perf).
- vdispatch overhead (`tensor.resize_()` vs
  `at::native::resize__cuda(tensor)`): ~10ns

This PR rips out the TH `resize_` implementation and adds it to ATen
with the following modifications:
- DeviceGuard used only after the same-size check.
- Same-size check rewritten for simplicity. The new check doesn't
affect perf.
- empty_cpu / empty_cuda avoid the dispatch overhead to
tensor.resize_.

Timing with this PR:
- `at::empty({0}, cuda)`: 363ns
- `empty_tensor.resize_({0})`: 17ns

Future:
- Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes`
- Tell resize_as_ to use the new resize_ implementation
(because resize_as_ is in TH, it still calls the old TH resize_)
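
A hedged micro-benchmark sketch of the fast path (absolute numbers will vary by machine):

```python
import timeit
import torch

t = torch.empty(0)
# a same-size resize_ now returns before constructing a DeviceGuard,
# so this mostly measures the fast path timed above
print(timeit.timeit(lambda: t.resize_(0), number=100000))
```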
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12824

Differential Revision: D10449209

Pulled By: zou3519

fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6
2018-10-25 21:09:41 -07:00
60c0508d96 Use CAFFE_ENFORCE instead of CHECK in caffe2 rnn executor (#13144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13144

The intention of this diff is to prevent the predictor service from crashing with the "Check failed: timestep >= 0 && timestep < _T" error, as a bandage, before D10848803 can be landed (assuming D10848803 also replaces the CHECKs with CAFFE_ENFORCEs).

Reviewed By: ilia-cher

Differential Revision: D10857963

fbshipit-source-id: bb56ad83aa867a2d25953aa7ffd84b078f8bf84a
2018-10-25 20:58:13 -07:00
5cbb33f939 Disable upsample optest (#13135)
Summary:
Temporarily disable upsample tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13135

Reviewed By: bddppq

Differential Revision: D10859926

Pulled By: houseroad

fbshipit-source-id: 9eb068198d43ba0939d81a9e41eb6f24ff19cb6d
2018-10-25 20:37:09 -07:00
efab8e8fdf Speed up tensor.get_device(), is_cuda(), is_sparse() by avoiding dispatches (#12841)
Summary:
`tensor.get_device()` went through two dispatches: once to the native
function
`get_device()`, and another when `get_device` calls `_th_get_device()`.
This PR avoids the dispatch by directly implementing the `get_device`
function
as a method on Tensor.

Future Work:
- Investigate caching Device on TensorImpl. This will probably bring the
  tensor.get_device down to 2ns, but I'm not sure it's worth it.

before:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                             8 ns          8 ns   89407911
BM_TensorIsCuda                          24 ns         24 ns   29313017
BM_TensorIsSparse                        27 ns         27 ns   26083160
BM_TensorTypeIsCuda                      11 ns         11 ns   65128120
BM_TensorNumel                           11 ns         11 ns   68314492
BM_TensorGetDevice                       71 ns         71 ns    9633125
BM_DeviceGuardCtor                      173 ns        173 ns    4067173
BM_DeviceGuard                          232 ns        232 ns    3009690
```

after:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                            10 ns         10 ns   69803872
BM_TensorIsCuda                           2 ns          2 ns  321626683
BM_TensorIsSparse                         6 ns          6 ns  177045382
BM_TensorNumel                           12 ns         12 ns   58770533
BM_TensorGetDevice                        4 ns          4 ns  128113396
BM_DeviceGuardCtor                       52 ns         52 ns   14997278
BM_DeviceGuard                          158 ns        158 ns    5767248

```
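
For reference, the affected calls are ordinary tensor queries; a minimal sketch:

```python
import torch

t = torch.randn(4)
print(t.is_cuda, t.is_sparse)   # now answered without a double dispatch
if torch.cuda.is_available():
    print(torch.randn(4, device="cuda").get_device())  # 0
```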
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12841

Differential Revision: D10489353

Pulled By: zou3519

fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
2018-10-25 19:57:52 -07:00
b827a40880 Implement bucket-based attention pooling for IdScoreList features (#13004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13004

Implement BucketWeighted model layer, which learns a weight for each possible score in an IdScoreList. Here, we assume that the scores in the IdScoreList have already been converted into the appropriate 'buckets'. If this is not done, then essentially each score represents its own bucket.

We assume that the scores/buckets are integers, and if max_score is not set, we assume that the maximum cardinality of the score is less than or equal to the cardinality of the ids.

Reviewed By: chonglinsun

Differential Revision: D10413186

fbshipit-source-id: 743e643a1b36adf124502a8b6b29976158cdb130
2018-10-25 18:04:08 -07:00
3ac9a9577c Remove optional from caffe2 utils (#12965)
Summary:
Now we have everything from c10::optional, we can delete this and keep a single version in c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12965

Differential Revision: D10504042

Pulled By: wanchaol

fbshipit-source-id: c0ec3892e92968cca264ae8924c19111674631ba
2018-10-25 17:29:04 -07:00
99d24aefc3 Move a number of ATen checks out of Dependencies.cmake (#12990)
Summary:
cc Yangqing mingzhe09088 anderspapitto mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12990

Differential Revision: D10862301

Pulled By: orionr

fbshipit-source-id: 62ba09cf0725f29692fac71bc30173469283390b
2018-10-25 17:26:25 -07:00
852d6e8b65 Fix python2 and python 3 compatibility found by lint. (#13140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13140

This is an example of the benefit of a proper Facebook linter. The old code
was not Python 2.x (actually, pre-Python 3.3) compatible, since FileExistsError
was only added in Python 3.3:

https://stackoverflow.com/questions/20790580/python-specifically-handle-file-exists-exception

Reviewed By: mingzhe09088

Differential Revision: D10858804

fbshipit-source-id: a4c995aef9f720cb8b0ce463f0a51db667fc42f2
2018-10-25 17:20:11 -07:00
defe96eb6c add topology index check in Graph::lint() (#13037)
Summary:
just a sanity check to make sure everything is in order
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13037

Differential Revision: D10854563

Pulled By: michaelsuo

fbshipit-source-id: 409303c4cbf058b75e24bf2213b49e9d79cb862e
2018-10-25 17:02:38 -07:00
526460fc8b Use default timeout of 30 minutes for gloo backend (#13056)
Summary:
The existing default timeout was set at 10 seconds, which is too low
for asynchronous tasks that depend on a barrier to resynchronize.
Having a single timeout for all operations is not ideal and this will
be addressed in future commits.
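
A sketch of initializing the gloo backend under the new default; the module path and the `timeout` keyword being exposed here are assumptions for illustration:

```python
import datetime
import torch.distributed as dist

# collectives on this group now time out after 30 minutes by default
# (previously 10 seconds); the explicit kwarg just makes that visible
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",
    rank=0, world_size=1,
    timeout=datetime.timedelta(minutes=30),
)
```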
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13056

Reviewed By: teng-li

Differential Revision: D10558746

Pulled By: pietern

fbshipit-source-id: d857ea55b1776fc7d0baf2efd77951b5d98beabb
2018-10-25 16:35:53 -07:00
4e1c64caee Add c10::optional to type syntax (#12582)
Summary:
This PR adds optional type to ATen native, autograd, JIT schema and Python Arg parser, closes #9513. It allows us to use optional default values (including None) for function signature and implementations like clamp, etc., and also let us remove the python_default_init hack.

Follow up:

remove python_default_init completely.
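
A small sketch of what optional defaults buy at the Python surface, using `clamp` (mentioned above) as the example:

```python
import torch

x = torch.randn(5)
# min and max can now each independently default to None
print(torch.clamp(x, min=0.0))   # only a lower bound
print(torch.clamp(x, max=1.0))   # only an upper bound
print(torch.clamp(x, 0.0, 1.0))  # both
```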
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582

Differential Revision: D10417423

Pulled By: wanchaol

fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
2018-10-25 16:08:29 -07:00
569a29b81a Make chunk size configurable in SaveOp (#12949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12949

Currently the default chunk size in the save operation is 1 MB, and there is no way to configure it at runtime. This adds a parameter to configure the chunk size in SaveOp.
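
A sketch of how the knob could be used from the caffe2 Python frontend; the argument name `chunk_size` and the `absolute_path` flag are assumptions for illustration:

```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("w", np.random.rand(1024, 1024).astype(np.float32))
save_op = core.CreateOperator(
    "Save", ["w"], [],
    db="/tmp/w.minidb", db_type="minidb",
    absolute_path=1,
    chunk_size=4 * 1024 * 1024,  # override the old fixed 1 MB chunking
)
workspace.RunOperatorOnce(save_op)
```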

Reviewed By: mraway, xsh6528

Differential Revision: D10454037

fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
2018-10-25 15:47:34 -07:00
f6ccb6a0f9 bring caffe2::Tensor API closer to aten/pytorch (#13134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13134

For tensor, we plan to do the following renaming:
```
* t.ndim() → t.dim()
* t.size() → t.numel()
* dims() → t.sizes()
* t.meta() → t.dtype()
* t.dim(d) → t.size(d)
```
This diff adds the new APIs to caffe2::Tensor so we can start the codemod;
we'll remove the old APIs after the codemod

Reviewed By: ezyang

Differential Revision: D10856028

fbshipit-source-id: 1638997e234d7b3113ef8be65a16246f902273c7
2018-10-25 15:45:09 -07:00
49046239f2 Change explicit usages of at::optional to c10::optional (#13082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082

Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away.

Reviewed By: ezyang, Yangqing

Differential Revision: D10844117

fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
2018-10-25 15:17:53 -07:00
be99eff75a Back out "Revert D10494123: [c10] Remove at::Optional" (#12991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12991

Remove the file proxying. Before we can do land `using namespace c10` everywhere, we just keep the one off namespace proxy. The follow up diff is going to replace explicit at::optional but keep just `optional` usage

Reviewed By: ezyang, Yangqing

Differential Revision: D10511254

fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
2018-10-25 15:17:51 -07:00
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
4f94d82c7f clang-format on c10d and THD (#13138)
Summary:
clang-format-6 run on all cpp,cc,c,cu,cxx,hpp,hxx,h files under /c10d and /thd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13138

Differential Revision: D10857742

Pulled By: teng-li

fbshipit-source-id: f99bc62f56019c05acdfa8e8c4f0db34d23b4c52
2018-10-25 14:16:47 -07:00
c6defa0847 Add randn in onnx symbolic (#12880)
Summary:
This PR adds the randn operator to the ONNX symbolic, along with related tests.
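
A small export sketch (the mapping to ONNX's RandomNormal is assumed here):

```python
import io
import torch

class Noise(torch.nn.Module):
    def forward(self, x):
        return x + torch.randn(2, 3)

# export now succeeds because randn has an ONNX symbolic
f = io.BytesIO()
torch.onnx.export(Noise(), (torch.zeros(2, 3),), f)
```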
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12880

Reviewed By: houseroad

Differential Revision: D10501788

Pulled By: zrphercule

fbshipit-source-id: ba8bb00ca848c4b95decabf638a1bc13fe11d03e
2018-10-25 14:11:23 -07:00
979560c9fc Include c10 namespace into caffe2 and at namespaces. (#12950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950

For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
 1. allow keeping backwards compatibility with third paty code we can't control
 2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces.

Reviewed By: ezyang

Differential Revision: D10496244

fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
2018-10-25 14:08:47 -07:00
d6fe812187 Fix TensorList ambiguity (#13024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13024

There's a TensorList type in ivalue.h and one in ScalarType.h, and they are different.
This diff moves IValue types into an ivalue namespace so we can merge the namespaces without conflicts.

Reviewed By: ezyang

Differential Revision: D10518929

fbshipit-source-id: cb760b6804a399880d2bff3acf9a3422d99fc0b8
2018-10-25 14:08:45 -07:00
14ea4bf0d1 Make 7 nn modules into weak modules (#12966)
Summary:
Depends on #12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))

* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work
  * `Sigmoid`
  * `Tanh`
  * `Hardshrink`
  * `PReLU`
  * `Softsign`
  * `Tanhshrink`
  * `PairwiseDistance`
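
A sketch of how a weak module is consumed from a `ScriptModule` (era-appropriate API assumed):

```python
import torch
import torch.nn as nn

class M(torch.jit.ScriptModule):
    def __init__(self):
        super(M, self).__init__()
        self.act = nn.Tanh()  # weak module: compiled into the script graph

    @torch.jit.script_method
    def forward(self, x):
        return self.act(x)

print(M()(torch.randn(3)))
```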
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12966

Differential Revision: D10559557

Pulled By: driazati

fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
2018-10-25 13:59:34 -07:00
e07e63f0b3 Absorb shm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13088

Differential Revision: D10856067

Pulled By: anderspapitto

fbshipit-source-id: cfbf0f6cad3953e1ee1c55482c00a3db9f140594
2018-10-25 13:55:23 -07:00
175e553974 Do a better job of checking registered names (#13016)
Summary:
We currently don't check names in `register_module` and `register_parameter` as thoroughly as we do in Python. This PR fixes this.

Python checks are e.g. in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L108

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13016

Differential Revision: D10853800

Pulled By: goldsborough

fbshipit-source-id: 765357875e90a5046e72351a7a47a86511633ab6
2018-10-25 13:52:08 -07:00
c91d982691 Improve expand error message by including complete sizes rather than … (#13124)
Summary:
…size at dimension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13124

Reviewed By: ezyang

Differential Revision: D10853167

Pulled By: gchanan

fbshipit-source-id: 76eeb922304bf19243d9bc52da87f2be8d1700ae
2018-10-25 13:37:25 -07:00
9cb4bce847 Open-source Caffe2 Int8 ops (#13065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13065

- Open-source Caffe2 Int8 (quantized) operators

Reviewed By: Yangqing

Differential Revision: D10524381

fbshipit-source-id: 6daa153dc247572900c91e37262d033c368b382d
2018-10-25 12:43:00 -07:00
faa354e102 Commentary about size constraints on TensorImpl. (#13126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D10454455

Pulled By: ezyang

fbshipit-source-id: 7018a41b94e316305751f2f8ad2c2d049799f5d4
2018-10-25 12:24:49 -07:00
cb15c7615a Documentation on TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12713

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10404407

fbshipit-source-id: cbc6be2172af068c3fc96e1f6da0b04b6f29ad4b
2018-10-25 12:24:48 -07:00
ae44627661 Rm test_jit.cpp (#12988)
Summary:
Removes test_jit.cpp, which was supposed to have been deleted in https://github.com/pytorch/pytorch/pull/12030

I had to move zou3519's dynamic DAG tests into `test/cpp/jit/tests.h` too. No other changes to `test_jit.cpp` seem to have happened in the meantime.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12988

Differential Revision: D10854320

Pulled By: goldsborough

fbshipit-source-id: 7ab533e6e494e34a16ce39bbe62b1150e48fcb58
2018-10-25 12:18:15 -07:00
314d95a5f2 Renaming dims() to sizes() (caffe2/caffe2) - 3/4 (#13096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13096

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842875

fbshipit-source-id: 1784859735ed4d1bd5ccd7ca56e289498374a68f
2018-10-25 12:14:21 -07:00
557db18c85 Enable MIOpen properly (#13048)
Summary:
* Disable MIOpen convolution on double tensors
* MIOpen: set group count in convolution descriptor
* MIOpen: Honor Max Dim (ROCm 222)
* MIOpen: Batchnorm - Allow half/half and half/float, disallow double
* Limit MIOpen batchnorm to same-precision
* Fix maxdim check. (ROCm 246)
* Fix reversed logic in DISABLE_MIOPEN (ROCm 253)
* Export LANG/LC_ALL also for the test step.
* Make tensors contiguous before calling MIOpen batch norm
* Actually pass dilation to MIOpen.
* Do not use MIOpen if there is dilation and the group size is > 1. - This is officially not supported currently.
* Fixes for miopenforward bias call
* Modified init conv descriptor param values and used same value for dilation
* MIOpen: disable transposed convolutions

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13048

Differential Revision: D10785250

Pulled By: bddppq

fbshipit-source-id: f9d9797de644652280d59308e5ea5cc07d177fd4
2018-10-25 11:32:49 -07:00
ab40eff5dd caffe2: UpsampleBilinear CUDA implementation (#12843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843

This adds a cuda implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.

The CUDA code is based off of the corresponding ResizeNearest operators but with bilinear interpolation logic taken from the CPU implementation.

Reviewed By: houseroad

Differential Revision: D10453776

fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
2018-10-25 11:10:04 -07:00
796181d762 Fix UB in CPU_tensor_apply (#13121)
Summary:
std::memcpy has UB when either of src or dest are NULL, even if length
is 0. This can and does happen when the input tensors are scalar tensors.

This triggered UBSAN on #12824 but it is strange that it has not
been triggered before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13121

Differential Revision: D10853113

Pulled By: zou3519

fbshipit-source-id: c4b4ad5e41de6f73dc755e0c25bc9947576a742d
2018-10-25 10:58:06 -07:00
eac3e7ab7c improve constants error message (#13072)
Summary:
Adds the attribute name to the error message and fixes the corresponding
test to actually run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13072

Differential Revision: D10846622

Pulled By: driazati

fbshipit-source-id: a7eee6320c28140c4937ede3d4e4685cfce08d84
2018-10-25 10:45:42 -07:00
9fefab5ac6 Add support for reductions to TensorIterator (#11908)
Summary:
This adds support for reductions like sum() and mul() to TensorIterator.
Performance is similar to existing optimized code for CPU, and generally
better than existing code for CUDA kernels.

The templatized CUDA kernel requires fewer instantiations than the
existing THCReduce/THCReduceAll code. For example, sum() previously
generated 43 CUDA kernels, while it now requires only one (larger)
CUDA kernel. I suspect this should reduce code-size and
compilation time, but I haven't measured it.

Below are timings for sum() on [CPU](https://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz) (12 threads and 1 thread) and CUDA with various tensor sizes.

CPU

| Reduction (dim)      | Master  | PR      | Master (1 thread) | PR (1 thread) |
|----------------------|---------|---------|-------------------|---------------|
| 1024x1024 (all)      | 22 us   | 34 us   | 136 us            | 147 us        |
| 1024x1024 (0)        | 30 us   | 28 us   | 160 us            | 160 us        |
| 1024x1024 (1)        | 25 us   | 25 us   | 171 us            | 146 us        |
| 1024x10x1024 (all)   | 542 us  | 550 us  | 4.14 ms           | 3.11 ms       |
| 1024x10x1024 (0)     | 658 us  | 690 us  | 6.80 ms           | 5.93 ms       |
| 1024x10x1024 (1)     | 761 us  | 757 us  | 3.34 ms           | 3.52 ms       |
| 1024x10x1024 (2)     | 538 us  | 545 us  | 3.73 ms           | 3.04 ms       |
| 1024x1024x1024 (all) | 72 ms   | 71 ms   | 364 ms            | 357 ms        |
| 1024x1024x1024 (0)   | 94 ms   | 90 ms   | 935 ms            | 927 ms        |
| 1024x1024x1024 (1)   | 80 ms   | 86 ms   | 881 ms            | 688 ms        |
| 1024x1024x1024 (2)   | 71 ms   | 71 ms   | 456 ms            | 354 ms        |

CUDA

| Reduction (dim)      | M40 base | M40 PR  | P100 base | P100 PR   |
|----------------------|----------|---------|-----------|-----------|
| 1024x10x1024 (all)   | 238 us   | 182 us  | 136 us    | 97 us     |
| 1024x10x1024 (0)     | 166 us   | 179 us  | 105 us    | 84 us     |
| 1024x10x1024 (1)     | 181 us   | 182 us  | 89 us     | 91 us     |
| 1024x10x1024 (2)     | 180 us   | 168 us  | 88 us     | 79 us     |
| 1024x1024x1024 (all) | 17.5 ms  | 16.4 ms | 8.23 ms   | 7.48 ms   |
| 1024x1024x1024 (0)   | 27.2 ms  | 28.6 ms | 7.63 ms   | 7.38 ms   |
| 1024x1024x1024 (1)   | 16.5 ms  | 16.3 ms | 7.66 ms   | 7.40 ms   |
| 1024x1024x1024 (2)   | 17.8 ms  | 16.4 ms | 8.37 ms   | 7.31 ms   |

Timings were generated with this script:
https://gist.github.com/colesbury/d3238b266d8a9872fe6f68f77619b379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11908

Differential Revision: D10071760

Pulled By: colesbury

fbshipit-source-id: 40e37a0e6803f1628b94cc5a52a10dfbb601f3d6
2018-10-25 09:42:55 -07:00
e5752f2cb4 Renaming dims() to sizes() (fbcode)
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10848643

fbshipit-source-id: ac75833be8be9162e35b00dcd352f616bc7bbafe
2018-10-25 09:32:18 -07:00
1720757220 added submodules for int8 ops (#13106) 2018-10-25 09:11:11 -07:00
2a6431ba2d Use fixed MASTER_PORT in test_distributed (#13109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13109

The "right" strategy of creating a socket, binding to an undefined port, closing the socket, and reusing the port it was bound to, was subject to a race condition. Another process could bind to that same port sooner than the tests would, causing an "Address already in use" failure when rank 0 would try and bind to that same port. The THD tests have been using a fixed port since forever. Time will tell if this fixes #12876.

Differential Revision: D10850614

fbshipit-source-id: c19f12bb4916141187ee8ddb52880f5f418310dc
2018-10-25 08:51:34 -07:00
956e620c64 Eliminate numel == -1 state, delete Storage-only constructor (#12656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12656

I originally wanted to do this in two steps, but deleting the Storage-only
constructor also changes the default numel state (which breaks tests),
so it was easiest to do it all in one go.

- I still need a way to compute the correct TensorTypeId for all of the
  Caffe2 constructors; rather than hard-code it, I wrote a function
  in at::detail::computeTensorTypeId() to do this calculation.  Maybe
  this function could be used more widely, but for now, it's used
  by Caffe2 only.
- Added a pile more TensorTypeId for all of Caffe2's supported DeviceTypes
- Because I still can't put arbitrary TypeMeta in TensorOptions, the
  TensorTypeId() calculation doesn't respect dtype.  For now, this is
  not a problem, but this might block work to split non-POD dtypes
  into their own TensorTypeId.

Reviewed By: li-roy

Differential Revision: D10380678

fbshipit-source-id: 10c5d12020596fc9f27d5579adffad00513af363
2018-10-25 08:44:05 -07:00
c368f26f88 Disable CircleCI merging to master. (#13074)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13074

Differential Revision: D10852728

Pulled By: ezyang

fbshipit-source-id: 6b96c941f4655ba240adaa0678844efa2af81d06
2018-10-25 08:07:45 -07:00
e8613d99b5 Delete ATen/CUDAGuard.h (#13078)
Summary:
It's empty.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13078

Differential Revision: D10843892

Pulled By: ezyang

fbshipit-source-id: 39e6f73b3a8be3e7573c1af727b65da246d4515b
2018-10-25 07:52:38 -07:00
6995b84d45 Make SparseToDense handle empty outputs properly. (#13043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13043

memset on nullptr is undefined behavior, and as a result filament_test is failing in the dev build. This diff makes the operator handle empty output properly, so we can bring that test back.

I'm not sure whether it is even valid to call this op with input that would require an empty memset (empty batch?). I'll leave this to ninghz and sunnieshang to decide.

Reviewed By: xianjiec

Differential Revision: D10525605

fbshipit-source-id: a911cdbd62fc3d948328981fd01cd205ec2ad99f
2018-10-25 00:27:52 -07:00
f1e4304d19 Add operator_def property to annotation (#13094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13094

Expose operator_def property

Reviewed By: duc0

Differential Revision: D10847125

fbshipit-source-id: 67a066555b690715e1f5f04125fd446ab197f45a
2018-10-24 23:42:35 -07:00
b883afc928 Absorb c10d into the main cmake build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12953

Differential Revision: D10850274

Pulled By: anderspapitto

fbshipit-source-id: 42296e6e49ad8c1845040e031eab95ddbaf58ae4
2018-10-24 22:34:00 -07:00
c250f6f3d5 DDP perf improvement: move sync_reduction to C++, dedicated CUDA streams for memcpy (#12954)
Summary:
- Moved sync_reduction to C++
- Use a dedicated CUDA stream for memcpy
- Also use a dedicated CUDA stream for memcpy in queue_reduction

Added test as well.

CI should cover both DDP and unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12954

Differential Revision: D10520069

Pulled By: teng-li

fbshipit-source-id: 64348e4e43c15f9695a4c28b036c232587ecfb65
2018-10-24 21:37:13 -07:00
69906afaee absorb THD into main cmake build (#12775)
Summary:
We want to move _C into the same cmake invocation that builds
libcaffe2 and libtorch. However, _C depends on THD and c10d, which in
turn depend on libcaffe2. That means that we can't move _C into that
cmake file unless we do these two first. This change does so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12775

Differential Revision: D10457374

Pulled By: anderspapitto

fbshipit-source-id: 2c1aa3b8a418a73d2112e93c7da53a2e70cf7bba
2018-10-24 21:28:37 -07:00
2d9b1fcd09 Make c10d support MPICH and further (#13083)
Summary:
Fixed issue:
https://github.com/pytorch/pytorch/issues/12921

Builds and works with MPICH; all tests passed.

We should add MPICH to CI at some point later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13083

Reviewed By: soumith

Differential Revision: D10844833

Pulled By: teng-li

fbshipit-source-id: e8cdc866ee1ee7a33e469017ea562a08da119d53
2018-10-24 20:11:56 -07:00
b4d0dc77be Eliminate CUDAStream nullptr in NCCL (#13089)
Summary:
As the title says, we should always use the current stream on the device in NCCL.

This can unblock ezyang on his further work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13089

Reviewed By: ezyang

Differential Revision: D10847172

Pulled By: teng-li

fbshipit-source-id: 7fc7c4248b5efa1971d2af4d43f62d3379debfe4
2018-10-24 20:04:41 -07:00
fc1c8f8b5b Enable test_nn embedding tests and use correct warp size in Embedding.cu (#13046)
Summary:
* Enable test_nn embedding tests and use correct warp size in Embedding.cu
* Fix embedding_backward_feature_kernel kernel for HIP

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13046

Differential Revision: D10560721

Pulled By: bddppq

fbshipit-source-id: e6c3cbeb980a34ff52a92dba8bde745a2e03f2fd
2018-10-24 19:43:37 -07:00
444cc0ee0a Back out "[pytorch][PR] added gemmlowp module" (#13090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13090

Original commit changeset: 7f8a649c739c

Reviewed By: Maratyszcza

Differential Revision: D10846367

fbshipit-source-id: a5a5aad29b51287dc1cb80c707eb5a0008ec78f5
2018-10-24 19:41:15 -07:00
478886be30 Fix print precision and match numpy behavior (#12746)
Summary:
Fixes #12578 #9395.

* Fix and simplify print logic

* Follow numpy print rule eb2bd11870/numpy/core/arrayprint.py (L859)
> scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3

I hope I didn't break anything since there seems to be a lot of edge cases here... Here are some easy sanity checks.
```
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy

In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy

In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy

In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy

In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy

In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy

In [12]: torch.tensor([1, 1000])
Out[12]: tensor([   1, 1000])
Out[9]: array([   1, 1000]) # numpy

In [13]: torch.tensor([1, 1010])
Out[13]: tensor([   1, 1010])
Out[10]: array([   1, 1010]) # numpy
```
For floating points, we use scientific when `max/min > 1000 || max > 1e8 || min < 1e-4`
Lines with "old" are old behaviors that either has precision issue, or not aligned with numpy
```
In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy

In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy

In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy

In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old

In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy

In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old

In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([  0.01,  10.  ]) # numpy

In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([  1.00000000e-02,   1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old
```
When print floating number in int mode, we still need to respect rules to use scientific mode first
```
In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([   1., 1000.])
Out[20]: array([    1.,  1000.]) # numpy

In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([  1.00000000e+00,   1.01000000e+03]) # numpy
Out[9]: tensor([   1., 1010.]) # old
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12746

Differential Revision: D10443800

Pulled By: ailzhang

fbshipit-source-id: f5e4e3fe9bf0b44af2c64c93a9ed42b73fa613f5
2018-10-24 18:12:51 -07:00
3761adc889 C++ API Cleanup Extension (#13087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13087

API changes that simplify subgraph replacement drastically

Reviewed By: duc0

Differential Revision: D10444011

fbshipit-source-id: 22c699bb5bc0f21538c70fe9401899d4f7e1b055
2018-10-24 18:06:50 -07:00
3fa9ccf1ba Add new NeuralNetOps for fusion (#13068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13068

Basic ops.def update and converter.cc updates
This is the standard way to ingest networks into nomnigraph

redo of D10412639

Reviewed By: ZolotukhinM

Differential Revision: D10560324

fbshipit-source-id: c8ccb0aabde6ee8f823657ee5cd3ed9ed6c45549
2018-10-24 18:06:49 -07:00
e0a8665d03 Converter fix to allow unimplemented convertToOperatorDef (#13069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13069

Simply adds a new fallback.

Reviewed By: ZolotukhinM

Differential Revision: D10591414

fbshipit-source-id: 1ad8f16135a6c68b2df889101f06b736a3e4f7da
2018-10-24 18:06:48 -07:00
ef019a2d18 Improve the C++ API (#13067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13067

Cleaning up the interface for nomnigraph in C++ world

redo of D10438090

Reviewed By: ZolotukhinM

Differential Revision: D10560323

fbshipit-source-id: e4e084284615e813836a7d031b5a71e8d80b0e62
2018-10-24 18:06:46 -07:00
3b919a6f82 Renaming dims() to sizes() (caffe2/caffe2) - 1/4
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842786

fbshipit-source-id: 551421a2cb4d2f2fc7f43775d4554643de0f0694
2018-10-24 17:36:08 -07:00
9573ecefe3 Back out "[pytorch][PR] Add sse2neon tp" (#13091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13091

Original commit changeset: 8b4f9f361cc1

Reviewed By: Maratyszcza

Differential Revision: D10846301

fbshipit-source-id: 2798f1fca5c1a2362979977ef5eb724dd37c4e6d
2018-10-24 17:17:34 -07:00
e290a9d2fd Back out "Migrate DeviceOption.numa_node_id to DeviceOption.device_id"
Summary: Original commit changeset: 82583d0ad4b8

Reviewed By: enosair, ilia-cher

Differential Revision: D10560741

fbshipit-source-id: e289a37d441bd2243b369810abf451292891d9ee
2018-10-24 17:11:25 -07:00
ccfaf46431 Make CUDNN an alias of MIOPEN for HIP ops (#12278)
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278

Differential Revision: D10842592

Pulled By: bddppq

fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
2018-10-24 17:07:31 -07:00
e1243cef88 fixed docs for Student-T distribution (#13044)
Summary:
added loc and scale args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13044

Differential Revision: D10560762

Pulled By: ezyang

fbshipit-source-id: 6c98ecc04975df8993364b06c480d015a25e2061
2018-10-24 16:59:23 -07:00
86881cdb39 MNIST images should have an extra dim (#13060)
Summary:
Our convolution ops and such expect three-dimensional images, but the images in the MNIST dataset of the C++ frontend currently only have two dimensions.

apaszke ebetica soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13060

Differential Revision: D10560754

Pulled By: goldsborough

fbshipit-source-id: a2cc877b4f43434482bec902c941fafb7a157d5d
2018-10-24 16:53:37 -07:00
6727133f3d Support warnings.warn (#12964)
Summary:
`warnings.warn` is used commonly throughout `nn.functional`, so this adds
support for it by forwarding its arguments to `print`
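
A minimal sketch of a scripted function that warns:

```python
import warnings
import torch

@torch.jit.script
def relu_with_warning(x):
    # inside TorchScript, warnings.warn is forwarded to print after this change
    warnings.warn("relu_with_warning is deprecated")
    return torch.relu(x)

print(relu_with_warning(torch.randn(3)))
```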
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12964

Differential Revision: D10559427

Pulled By: driazati

fbshipit-source-id: 5b591f6f446c906418f9fc7730c17e301f263d9b
2018-10-24 16:48:02 -07:00
b790fcaf39 Renaming dims() to sizes() (caffe2/caffe2) - 4/4
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842900

fbshipit-source-id: 8d58ed4d403fb0308a8fa286659f8e830b040bec
2018-10-24 16:32:51 -07:00
a4475d529d Use GetFetchStackTrace for the AT_* error macros too. (#13007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13007

No reason not to use the hook if it's set; this helps fbcode traces.

This slightly pessimizes the stack trace for ATen functions,
because we are no longer skipping all of the frames we should.
This is probably OK.

Reviewed By: Yangqing

Differential Revision: D10518499

fbshipit-source-id: be54e490df3c3fde7ff894b5b1473442ffc7ded3
2018-10-24 16:18:25 -07:00
917b203b01 Assert spawned processes terminating in distributed tests (#13071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13071

In the case where a process got stuck and timed out on joining, we would see a None != 1 assertion error in the code path where the exit statuses are compared. This implies that the first process exited with exit code 1 and another one didn't exit at all. With this commit the error message is more descriptive.

Differential Revision: D10785266

fbshipit-source-id: c8cc02d07ea4fdc6f5374afd9a0aac72218fe61d
2018-10-24 16:03:36 -07:00
2ac7b6b683 Tensor dims() -> sizes() (caffe2/operators) - 5/5 (#13032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13032

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476235

fbshipit-source-id: 263ad75689d864b414dae63cb9a30cb3285dae31
2018-10-24 15:07:43 -07:00
cccd457a1e Tensor dims() -> sizes() (caffe2/operators) - 4/5 (#13031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13031

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476232

fbshipit-source-id: cb4ad76be068065eb2c5e7d87f33d04423cf93c4
2018-10-24 15:07:42 -07:00
ab253c2bf1 Tensor dims() -> sizes() (caffe2/operators) - 3/5 (#13030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13030

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476226

fbshipit-source-id: 757583e3bde8d5246565433883bd328ab34f3e09
2018-10-24 15:02:40 -07:00
b55dc8d971 Add sse2neon tp (#12948)
Summary:
Adding sse2neon to third-party as a dependency
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12948

Differential Revision: D10801574

Pulled By: harouwu

fbshipit-source-id: 8b4f9f361cc1722f631830f7675b9d209a9f22ef
2018-10-24 14:56:24 -07:00
be43a0faa9 Tensor dims() -> sizes() (caffe2/operators) - 2/5 (#13029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13029

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476225

fbshipit-source-id: 5e63ca80b3843967ea1661ada447bbc18661378d
2018-10-24 14:34:45 -07:00
07c0f4a097 Tensor dims() -> sizes() (caffe2/operators) - 1/5 (#13028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13028

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476220

fbshipit-source-id: 3c3b3d5e2082cd6a1f0ff4a3c8641b30e6f16896
2018-10-24 14:18:18 -07:00
4b5d13abab Use cmake3 if it exists and cmake isn't sufficient (#12972)
Summary:
A tweak to https://github.com/pytorch/pytorch/pull/12916 that only uses cmake3 when cmake isn't good enough. Hopefully fixes the issue zdevito saw.

cc zdevito SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12972

Differential Revision: D10560674

Pulled By: orionr

fbshipit-source-id: 90c71929630bb8167a3ee2cc6f306eefe5b85445
2018-10-24 14:14:39 -07:00
10046c2b2b nomnigraph - (easy) Expose operators (#13063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13063

Expose the following operators
GatherRanges
Slice
MergeIdLists

Reviewed By: itomatik

Differential Revision: D10560138

fbshipit-source-id: 90f74d7d4c2bfca40788a5fcec4c73d71b156d3b
2018-10-24 14:09:27 -07:00
c64a65c977 added gemmlowp module (#12947)
Summary:
Adding the gemmlowp dependency to the third_party folder.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12947

Differential Revision: D10794559

Pulled By: harouwu

fbshipit-source-id: 7f8a649c739ccb6c307327080711379b1db8c3e0
2018-10-24 13:53:58 -07:00
0f5cee2f6b Convert some docstrings from char* to char[] (#13062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13062

Gold (the linker) isn't able to gc unreferenced string constants, but
converting these to arrays puts them in their own data sections and reduces
(Android) binary size as a result.

I'm told even in server builds, this reduces binary size by a few dozen bytes
and speeds up startup by a few hundred ns. :-P

Reviewed By: Yangqing

Differential Revision: D10510808

fbshipit-source-id: 247ba9574e7a9b6a8204d33052994b08c401c197
2018-10-24 13:48:18 -07:00
97b6a25329 Use REGISTER_CPU_GRADIENT_OPERATOR for many operators (#12616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12616

Focusing on operators in common use on mobile.

Also use GRADIENT_OPERATOR_SCHEMA.

Reviewed By: Yangqing

Differential Revision: D10245216

fbshipit-source-id: 5cc023da170149b637fe3c729d3756af948aa265
2018-10-24 13:48:17 -07:00
df47bbe9c1 Fix test_glu_old HealthCheck with smarter generation strategy. (#12975)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12975

Differential Revision: D10513493

Pulled By: ezyang

fbshipit-source-id: ac183aeb4ae7f0a5f91f1a369b595ae92c3e844d
2018-10-24 13:45:19 -07:00
2dacf28b66 link libgloo_cuda.a explicitly from setup.py (#12951)
Summary:
rather than pass a list through a text file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12951

Differential Revision: D10528309

Pulled By: anderspapitto

fbshipit-source-id: d94befcd61b6304815859694b623046f256462df
2018-10-24 13:19:46 -07:00
dd7c2d4284 Change the function signature for caffe2::empty (#13015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13015

att

Reviewed By: ezyang

Differential Revision: D10469310

fbshipit-source-id: f4621fe5d17bb4663192860f81effe6bdfe21bea
2018-10-24 13:14:24 -07:00
1bea5fc3ad Fix UpsampleNearest op CPU impl batch handling (#13002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13002

Batch dim wasn't handled in the CPU impl (will fail for inputs with N > 1).
Fixing that here.

Differential Revision: D10515159

fbshipit-source-id: ee7e4f489d2d4de793f550b31db7c0e2ba3651e8
2018-10-24 13:10:53 -07:00
353fdefdd6 dims() -> sizes() (caffe2/core) (#13014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13014

Tensor method renaming using clangr

Reviewed By: ezyang

Differential Revision: D10467556

fbshipit-source-id: 7d7eaf5fc59bbb493c057d5b8bfdda03b140c97e
2018-10-24 12:49:28 -07:00
0a190c8869 Move the location of annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12969

Differential Revision: D10560824

Pulled By: ezyang

fbshipit-source-id: 86c21149682db5ebfd9610df9e9845688a3db3b0
2018-10-24 12:35:08 -07:00
fcf801f061 Support building binary on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13059

Reviewed By: llyfacebook

Differential Revision: D10560147

Pulled By: sf-wind

fbshipit-source-id: c8f38b30c9acdf6ae494e56a5876fd4493696e5d
2018-10-24 12:24:42 -07:00
8355219e68 CircleCI: turn off OSX jobs temporarily
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13064

Differential Revision: D10561008

Pulled By: yf225

fbshipit-source-id: c48364662efa82865a1bc1a7e2db3a9fb8af10d5
2018-10-24 12:22:05 -07:00
85273acca8 fix pinning of hypothesis (#13055)
Summary:
tested manually that this works

fixes https://github.com/pytorch/pytorch/issues/12395
obviates https://github.com/pytorch/pytorch/pull/12774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13055

Differential Revision: D10559788

Pulled By: anderspapitto

fbshipit-source-id: 5cd8bac6eff548280c8742f36a5e7f2748a24623
2018-10-24 11:46:28 -07:00
448a32e0ee Adding timestamps to the beginning of every test file in run_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12994

Reviewed By: anderspapitto

Differential Revision: D10515291

Pulled By: pjh5

fbshipit-source-id: 191054cdacff308b63e9063d22d62314398e4f88
2018-10-24 11:42:31 -07:00
6c8d47f2af Add methods to FunctionSchema (#12967)
Summary:
We are beginning to use this class in a wider-reaching set of use cases. This PR refactors it so that we always access schema properties through methods. This will make adding extra information like alias information easier (i.e. we can add a version of `type()` that returns the type with alias information and another version that returns a type without that information).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12967

Differential Revision: D10502674

Pulled By: zdevito

fbshipit-source-id: a88783ed8f20ab3be6460c12da95f9f940891c44
2018-10-24 10:32:27 -07:00
52beb338ab Add Modules_CUDA_Fix folder to installed folder (#13013)
Summary:
This is used to patch our cmake cuda scripts - should be in the installation script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13013

Reviewed By: ir413

Differential Revision: D10519104

Pulled By: Yangqing

fbshipit-source-id: 542049224ea41068f32d4c0f6399c7e8b684f764
2018-10-24 10:16:18 -07:00
46162ccdb9 Autograd indices/values and sparse_coo ctor (#13001)
Summary:
Reopen of #11253 after fixing bug in index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13001

Differential Revision: D10514987

Pulled By: SsnL

fbshipit-source-id: 399a83a1d3246877a3523baf99aaf1ce8066f33f
2018-10-24 10:00:22 -07:00
e0f21a4977 restore caffe2 strides (#12883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12883

Attempting to do this again. The last try broke OSS CI: D10421896

Reallocation of strides_ when there's no change in dim seems to have caused the error that broke the internal flow last time; this fixes that. We found a potential race condition in caffe2 counter ops that might be the cause, and we will investigate it.

Reviewed By: ezyang

Differential Revision: D10469960

fbshipit-source-id: 478186ff0d2f3dba1fbff6231db715322418d79c
2018-10-24 09:45:46 -07:00
88f70fcef9 remove progress from git operations in CI builds (#13017)
Summary:
these are pretty spammy - unless we have a reason to keep them, let's not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13017

Differential Revision: D10528295

Pulled By: anderspapitto

fbshipit-source-id: 5514371a6e61e13ec070cc5517488523d42f2935
2018-10-24 09:26:05 -07:00
7863c17b26 Fix convtranspose3d output_size calculation (#12952)
Summary:
Closes #2119.

There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).

Added a new test for this.
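
For illustration, the slicing difference on a made-up 3d output_size:

```python
output_size = [1, 33, 8, 16, 16]  # hypothetical (N, C, D, H, W)

print(output_size[-2:])  # [16, 16]    -- buggy: keeps only the last two dims
print(output_size[2:])   # [8, 16, 16] -- intended: drops batch and channel dims
```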
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952

Differential Revision: D10510678

Pulled By: zou3519

fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
2018-10-24 09:23:05 -07:00
046672eed5 Set proper scope on nodes added by JIT (#12400)
Summary:
In order to support tensorboardX and other visualization tools, we need to make sure a non-empty scope is set on all nodes added by the JIT. This attempts to do this, but is still a WIP.

This is a new version of https://github.com/pytorch/pytorch/pull/10749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12400

Reviewed By: ezyang

Differential Revision: D10224380

Pulled By: orionr

fbshipit-source-id: d1bccd0eee9ef7c4354112c6a39a5987bfac2994
2018-10-24 09:05:46 -07:00
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
d72de9fb1e Replace direct use of int32_t with an alias DeviceIndex (#13019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13019

It just makes the semantic meaning of the int32_t a little
bit clearer.

Reviewed By: zou3519

Differential Revision: D10520295

fbshipit-source-id: 45b0bd1b6afddee17072b628d8e9b87d7c86e501
2018-10-24 08:27:45 -07:00
34cca9f05b Move Device and DeviceType to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12995

Reviewed By: Yangqing

Differential Revision: D10513246

fbshipit-source-id: 0c6d52e09166d7e8a786c1a0e21685ec9c35b12a
2018-10-24 08:27:44 -07:00
ca03c10cef Rename createCUDAStream() to getStreamFromPool() (#12940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12940

Dmytro was reading this code and requested that we rename the interface
to something that made it more obvious that pooling was going on.
Seems reasonable to me! Final name is a suggestion from Pieter.

Reviewed By: dzhulgakov

Differential Revision: D10492071

fbshipit-source-id: b1c2cac760f666968d58166be649dabfe1127c5e
2018-10-24 07:23:31 -07:00
924326e171 Revert D10438090: [nomnigraph] Improve the C++ API
Differential Revision:
D10438090

Original commit changeset: 6b4309b8a4b3

fbshipit-source-id: 5f6a28cf032e0be2544f0b33508148f4f49e10c5
2018-10-24 07:04:33 -07:00
97d4c05566 Revert D10412639: [nomnigraph] Add new NeuralNetOps for fusion
Differential Revision:
D10412639

Original commit changeset: a4c523fda96b

fbshipit-source-id: 973b6dd30b63b9a08069275278b0780b65067635
2018-10-24 07:04:31 -07:00
17c6d168de Attach Shape node if Concat node has 2 outputs (#13006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13006

In Caffe2, Concat can have 2 outputs, the second being the shape of the first output. In ONNX, Concat has only 1 output. So when exporting, we need to add a `Shape` node to the first output and generate the second output from it.
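
Roughly, assuming the onnx Python helpers and made-up tensor names, the exported pattern looks like:

```python
from onnx import helper

# ONNX Concat produces a single output...
concat = helper.make_node("Concat", inputs=["a", "b"], outputs=["cat_out"], axis=0)
# ...so the second Caffe2 output (the shape of the first) is recovered via Shape.
shape = helper.make_node("Shape", inputs=["cat_out"], outputs=["cat_out_shape"])
```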

Differential Revision: D10517698

fbshipit-source-id: 38e974423e2506b16d37b49d51c27ad87b73e63a
2018-10-23 22:56:48 -07:00
53ac4de79d Expose basic transformation API to Python (#13033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13033

Basic graph manipulation exposed to python

Reviewed By: ZolotukhinM

Differential Revision: D10519720

fbshipit-source-id: 0f9a494d122289a3a9e23d4cff99ac0a21382ec6
2018-10-23 20:54:54 -07:00
4e0b6c8500 Speed up resolution callback creation (#12859)
Summary:
`inspect.stack()` calls are slow since they access a bunch of extra info about the frame. This PR instead uses `inspect.currentframe()` and goes up the stack until it reaches the correct frame. [Context](https://stackoverflow.com/questions/17407119/python-inspect-stack-is-slow)
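
A minimal sketch of the frame-walking approach (a hypothetical helper, not the PR's actual code):

```python
import inspect

def frame_at_depth(depth):
    # Walk up the stack one frame at a time; unlike inspect.stack(), this
    # avoids building (filename, lineno, code context, ...) records for
    # every frame on the stack.
    frame = inspect.currentframe().f_back  # skip this helper's own frame
    for _ in range(depth):
        frame = frame.f_back
    return frame
```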
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12859

Differential Revision: D10509912

Pulled By: driazati

fbshipit-source-id: b85325adf1b3c85a1a3a82e96e567b8be498531b
2018-10-23 20:40:04 -07:00
08d99c4486 Add new NeuralNetOps for fusion
Summary:
Basic ops.def update and converter.cc updates

This is the standard way to ingest networks into nomnigraph

Reviewed By: duc0

Differential Revision: D10412639

fbshipit-source-id: a4c523fda96bbe0e31de0d9fcf795ae9c7377c90
2018-10-23 19:27:10 -07:00
9c1195fe61 Improve the C++ API
Summary: Cleaning up the interface for nomnigraph in C++ world

Reviewed By: duc0

Differential Revision: D10438090

fbshipit-source-id: 6b4309b8a4b3730f3309edf0047d4006a001895b
2018-10-23 19:27:09 -07:00
f9b7ce9c99 Add tuple indexing support for constant integers (#11492)
Summary:
Add support for indexing tuples with constant integers by creating a new prim::TupleIndex operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11492

Differential Revision: D9811996

Pulled By: eellison

fbshipit-source-id: a458c2522b3c81476252d920e27a8d6c7b9a036b
2018-10-23 17:52:03 -07:00
ff508c91a1 Remove numba dependency
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.

Converted the old test to a non-numba regular python op test.

Reviewed By: xw285cornell

Differential Revision: D10519910

fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
2018-10-23 17:03:47 -07:00
a6949abb15 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (fixed reverted bug) (#12848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Original commit changeset: c0760e73ecc7

Reviewed By: dzhulgakov

Differential Revision: D10453456

fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
2018-10-23 16:21:26 -07:00
dd00c2997f fix expect tests (#13005)
Summary:
the topological index shuffled arguments around, updating expect files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13005

Differential Revision: D10517246

Pulled By: michaelsuo

fbshipit-source-id: 8f95e4e4ca8ff51da0507f9b0eb838c23ddaa821
2018-10-23 15:53:16 -07:00
821b04e819 Nomnigraph: Remove Copy constructor and copy assign operator from BasicBlock, add move constructor.
Summary:
We cannot use copying, as it loses the recorded callbacks; after copying,
tracked values are no longer tracked.

Reviewed By: bwasti, duc0

Differential Revision: D10510057

fbshipit-source-id: b64fdef3fb28fc26fe55eba41f4b5007ba6894de
2018-10-23 15:41:48 -07:00
83f788d088 Fix MSVC build for Python 3.6 (#12878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12878

Python 3.6 headers define their own ssize_t, which clashes with our definition.
Luckily, they also define a `HAVE_SSIZE_T` macro we can use to check for this case.

Reviewed By: ezyang

Differential Revision: D10467239

fbshipit-source-id: 661675ad1e30a6ca26d6790eaa75657ef6bf37c2
2018-10-23 15:30:01 -07:00
b8a11cffdb Minor improvements cherry-pick (#12973)
Summary:
* Enable disabled functions for ROCm (ROCm 252)
* Fixes for topk fp16 (ROCm 270)
* HIP needs the kernel invocation to be explicitly templated to be able to take a non-const arg as a const kernel arg (ROCm 281)

For attention: bddppq ezyang

Full set of PyTorch/Caffe2 tests on ROCm here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/283
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12973

Differential Revision: D10516072

Pulled By: bddppq

fbshipit-source-id: 833b3de1544dfa4886a34e2b5ea53d77b6f0ba9e
2018-10-23 15:03:47 -07:00
223a96a9a0 Add missing NCHW2NHWC symbols for HIP (#13000)
Summary:
petrex ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13000

Differential Revision: D10516020

Pulled By: bddppq

fbshipit-source-id: 017bd393da3d97fbae3f0227ad01977c5c0744c6
2018-10-23 14:20:33 -07:00
470e766062 Fix illegal code in rocblas_handle rocblas_handle() that causes failure w/ gcc as base compiler (#12957)
Summary:
The legal function `cublasHandle_t cublas_handle()` was hipified to the
clearly illegal `rocblas_handle rocblas_handle()`. This should not compile,
and it correctly fails with gcc as the host compiler because the name
induces an ambiguity between the type and the function.

The function now hipifies to `rocblas_handle rocblashandle()`.

This fixes a long-standing issue we've observed in PyTorch when the base compiler is gcc.

For attention: bddppq ezyang

Tests on ROCm PyTorch/Caffe2: https://github.com/ROCmSoftwarePlatform/pytorch/pull/284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12957

Differential Revision: D10501227

Pulled By: bddppq

fbshipit-source-id: 568cb80801c0d14c9b1b61e3a7db387a5c21acf4
2018-10-23 13:46:15 -07:00
21285e73da Add Google pixel code
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12998

Differential Revision: D10515096

Pulled By: JoelMarcey

fbshipit-source-id: 7f97014451448a70ea7f91d7d8bd96fbf6e83f7f
2018-10-23 13:26:37 -07:00
8e4bea107a Fix clang-tidy 404 in Travis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12963

Differential Revision: D10510026

Pulled By: goldsborough

fbshipit-source-id: b6b9634a7a2575ff4e2983321d2e4e5829626347
2018-10-23 09:34:43 -07:00
9ea19cb079 Windows CI integration for custom ops (#12928)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/11527

ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12928

Differential Revision: D10501342

Pulled By: goldsborough

fbshipit-source-id: 7ce74795aab2f13efeb38f56ce82f53055f5eade
2018-10-23 09:18:09 -07:00
af78d4cd49 Add weak script modules (#12682)
Summary:
Adds support for weak script modules that get compiled to `ScriptModule`s once added as submodules of a `ScriptModule`:

```python
@weak_module
class Test(torch.nn.Module):
    ...
    @weak_script_method
    def forward(self, x):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12682

Differential Revision: D10458626

Pulled By: driazati

fbshipit-source-id: 10ae23cb83cdafc4646cee58f399e14b2e60acd4
2018-10-23 09:06:02 -07:00
3fb3a07f54 Added a default constructor for torch.finfo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12847

Differential Revision: D10457487

Pulled By: benoitsteiner

fbshipit-source-id: 7d164a71ba52631e5906098f643eecb0630879d1
2018-10-23 09:03:24 -07:00
1b07eb7148 torch.utils.cpp_extension.verify_ninja_availability() does not return True as documented
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12922

Differential Revision: D10502167

Pulled By: ezyang

fbshipit-source-id: 2e32be22a310e6e014eba0985e93282ef5764605
2018-10-23 07:38:08 -07:00
428300d318 Revert D10494123: [c10] Remove at::Optional
Differential Revision:
D10494123

Original commit changeset: 761bdf7359d6

fbshipit-source-id: 552fb4ab0dc253b95ce87ec6a1c65aba4b07e84a
2018-10-23 07:18:54 -07:00
d401dc4374 Remove at::Optional (#12958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12958

TSIA - this is an ongoing diff to fully move to c10 namespace.

Reviewed By: dzhulgakov

Differential Revision: D10494123

fbshipit-source-id: 761bdf7359d62ef4503ecb1b8d0ae1c0762e073c
2018-10-23 00:03:20 -07:00
27af265a5e Index to track topological order within a block (#12748)
Summary:
Simple index to track topological order. Replaced `topological_index` in the graph fuser with this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12748

Differential Revision: D10502983

Pulled By: michaelsuo

fbshipit-source-id: 5855e5add3c9742fe07e86d854260baa34beab3b
2018-10-22 23:55:20 -07:00
dd823ccd28 small improvements to torch.nn.normalization docs (#12936)
Summary:
Based on a [discussion at the forums](https://discuss.pytorch.org/t/question-about-functional-normalize-and-torch-norm/27755), it might be worthwhile to clarify the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12936

Differential Revision: D10502139

Pulled By: ezyang

fbshipit-source-id: 480c3c367f8c685dcde107b3018cb4129032322d
2018-10-22 23:14:47 -07:00
8d7607e346 Add attribute exhaustive_search in _blacklist_caffe2_args (#12805)
Summary:
- The exhaustive_search attribute will be blacklisted so that it is discarded from the converted onnx model. At present it throws an error while verifying the onnx model.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12805

Differential Revision: D10502374

Pulled By: ezyang

fbshipit-source-id: 0926dfa3237a8a431184e7f7250146e5b0cbfb85
2018-10-22 22:48:31 -07:00
bc1d96ca98 Add support for inline expect tests. (#12825)
Summary:
expecttest and test_expecttest are the implementation and tests
for this functionality.  I wired it up to the --accept flag,
but there's also a new environment variable EXPECTTEST_ACCEPT
which may be more convenient to trigger.  Haven't tested if this
works in fbcode.

There may be a few expect tests which will benefit from inline
treatment, but I just did one to show it works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12825

Reviewed By: teng-li

Differential Revision: D10448630

Pulled By: ezyang

fbshipit-source-id: 3d339f82e2d00891309620a60e13039fa1ed8b46
2018-10-22 19:29:04 -07:00
952df2ba8f Install torchvision before all tests, tickles #7851 (#8311)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8311

Differential Revision: D10239923

Pulled By: ezyang

fbshipit-source-id: 3f8cdc6229bfbe701c7583cede65435aa952ed85
2018-10-22 18:16:47 -07:00
3894ed22a8 Remove nullopt from native_parse.py (#12961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12961

According to zdevito - this is not used at all, so we are removing it for safety.

It is also possible that this native_parser.py will completely go away in the
near future.

Reviewed By: zdevito

Differential Revision: D10501616

fbshipit-source-id: 3218708e6150d3c94d730fbd25ae1f7abb5718b5
2018-10-22 18:13:37 -07:00
da2da55170 Make sure to update success_ at the end of the run (#12806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12806

Make sure to update success_ status at the end of the run when going through
task statuses

Reviewed By: aazzolini

Differential Revision: D10443704

fbshipit-source-id: 79f8f7fe1eccb78f6e2859f3b1e66dc44347bcc8
2018-10-22 16:58:20 -07:00
8c514627a4 Add C10_LIKELY/C10_UNLIKELY macros (#12932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12932

I was looking at some assembly for some code I was working on,
and felt a desire to have likely()/unlikely() macros.  I checked
if we already had them, and we didn't.  This commit adds them,
and fixes up all known use sites to make use of them.

Reviewed By: Maratyszcza

Differential Revision: D10488399

fbshipit-source-id: 7476da208907480d49f02b37c7345c17d85c3db7
2018-10-22 16:26:19 -07:00
8d3e7e2fcb Move DDP queue_reduction to C++ (#12852)
Summary:
A fully working version, continuing from goldsborough's initial version.

Waiting on the stream guard to be merged before adding more stream perf logic to the C++ version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12852

Differential Revision: D10468696

Pulled By: teng-li

fbshipit-source-id: 8e46d408796973817abfd9dbd6566e0ca5b7a13f
2018-10-22 16:07:46 -07:00
8682999767 Remove trailing whitespace from files in aten/ (#12942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12942

I hate trailing whitespace.

Reviewed By: Yangqing

Differential Revision: D10492507

fbshipit-source-id: 94ed80988670361e9e7e508c3b07c5e5c6e500e7
2018-10-22 16:04:21 -07:00
f575e138d8 Credits to Exhale in cppdocs (#12926)
Summary:
Some creds to svenevs

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12926

Differential Revision: D10498288

Pulled By: goldsborough

fbshipit-source-id: 878d23ebf260dac17871677635a3283eb3a8a423
2018-10-22 15:39:36 -07:00
e64f75a1d8 fix ZeroDivisionError in utils.bottleneck (#11987)
Summary:
A **ZeroDivisionError** occurs when `cuda_prof_exec_time` is small enough. This is normal for a project that does little CUDA work.

It also happens when someone fails to move their work to CUDA successfully and then profiles the code.
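
A minimal sketch of the kind of guard that avoids the crash (the helper and names are illustrative, not the actual fix):

```python
def safe_ratio(cpu_time, cuda_time):
    # Profiles with negligible or absent CUDA work would otherwise divide
    # by (close to) zero when computing the CPU/CUDA time ratio.
    return cpu_time / cuda_time if cuda_time > 0 else float("inf")

print(safe_ratio(1.5, 0.0))  # inf, instead of raising ZeroDivisionError
```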
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11987

Differential Revision: D10488568

Pulled By: soumith

fbshipit-source-id: db8c1e9e88a00943c100958ebef41a1cb56e7e65
2018-10-22 14:00:15 -07:00
95caa37565 Remove CAFFE2_USE_MINIMAL_GOOGLE_GLOG (#12938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12938

We will be using C10_USE_MINIMAL_GLOG. Also, this will be in exported flags,
so dependent libraries won't need to define it.

Reviewed By: smessmer, BIT-silence

Differential Revision: D10468993

fbshipit-source-id: 04ae3ae17122d46b1b512d4202ab014365b87f4a
2018-10-22 13:37:38 -07:00
283d41885d Accept external input hint when doing ONNXIFI transform (#12900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12900

The workspace will sometimes be populated with input tensors for shape inference, but net.external_input() is not a reliable way to tell weights from inputs in the workspace. We saw some use cases where net.external_input() is empty. In these cases, we need to give the user an option to provide an input hint.

Reviewed By: bddppq

Differential Revision: D10476822

fbshipit-source-id: 1a3fa2df69b959d5b952a7824eba9e6c713f4f07
2018-10-22 13:32:33 -07:00
5f37c0afda Fix doxygen check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12920

Differential Revision: D10494081

Pulled By: goldsborough

fbshipit-source-id: c96b9b61cbae39006b48b23b901248e762cbd232
2018-10-22 12:28:17 -07:00
56bf4850cb Clean up of the multithreaded benchmark (#12905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12905

This diff does some clean up of the multithread benchmark code:
1. Split the implementation into a `.cc` file to separate it from the interface and improve builds
2. Make `MutatingNetSupplier` more generic by providing the mutating function as an argument instead of a virtual method (a sketch of the pattern follows this list)
3. Fix the AI benchmark by sticking to the original option names
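
A hypothetical Python rendering of the pattern in item 2 (the real API is C++; all names here are illustrative):

```python
class MutatingNetSupplier:
    # The mutation is a plain callable passed to the constructor,
    # rather than a virtual method that callers must override.
    def __init__(self, base_net, mutate_fn):
        self.base_net = base_net
        self.mutate_fn = mutate_fn

    def next_net(self):
        net = dict(self.base_net)  # stand-in for copying a caffe2 NetDef
        self.mutate_fn(net)
        return net
```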

Reviewed By: highker

Differential Revision: D10479238

fbshipit-source-id: afa201fc287e3fdbb232db24513ecf8024501f66
2018-10-22 12:09:16 -07:00
1b530fdae0 remove the find-package codepath for gloo in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12893

Differential Revision: D10493310

Pulled By: anderspapitto

fbshipit-source-id: ba5bd375c118b0f0ab7fb7b9fda010fe17a6ac8d
2018-10-22 11:54:53 -07:00
6cc15c1a22 Simplify typeid SFINAE (#12706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12706

If both branches are valid C++ code independent from the type passed in, then we can just use if/else inside of a constexpr function
to decide between the cases. Only if one branch would be invalid code (say because type T doesn't have a default constructor), we'd
need "constexpr if" or SFINAE.

Reviewed By: ezyang

Differential Revision: D10400927

fbshipit-source-id: 16d9855913af960b68ee406388d6b9021bfeb34a
2018-10-22 11:27:10 -07:00
3092a69546 Optimize NCHW2NHWC on GPU (#12910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12910

Optimize NCHW2NHWC on GPU

Reviewed By: houseroad

Differential Revision: D10481163

fbshipit-source-id: 6ddbd0ec9c96965b96aa1b8a006232d6f2b94249
2018-10-22 11:24:29 -07:00
cfb7f0a8f2 remove onnx CODEOWNERS entries (#12941)
Summary:
we don't need these anymore; let's reduce notification spam
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12941

Reviewed By: bddppq

Differential Revision: D10492266

Pulled By: anderspapitto

fbshipit-source-id: 3251b6d0160f773d17b64afc504216323d61276a
2018-10-22 11:09:08 -07:00
8f51c513a6 gloo: build once, share between pytorch/caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12885

Differential Revision: D10492244

Pulled By: anderspapitto

fbshipit-source-id: 79af1ceb9bb0dab4585a728e64554ff4f38d6c32
2018-10-22 11:06:14 -07:00
df06fba1f1 Use the newer one of cmake and cmake3. (#12916)
Summary:
On my devgpu, `cmake` is newer than `cmake3`. Using `cmake3` causes compilation to fail. Instead of blindly using `cmake3`, we pick the newer of the two.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12916

Differential Revision: D10481922

Pulled By: SsnL

fbshipit-source-id: 8340136c459e25da9f5fc4f420c7e67cadc28aff
2018-10-22 10:29:55 -07:00
5e8e199f8d Add note on traced module train/eval behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12903

Differential Revision: D10489090

Pulled By: SsnL

fbshipit-source-id: 13ff5587f53706b360dd0905d0ae97fb16ae2bf0
2018-10-22 10:26:15 -07:00
a022fd2d6b Implement DataLoader (#11918)
Summary:
This PR implements a DataLoader API for the C++ frontend.

The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;

Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.
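
A hypothetical Python rendering of that composition (the real API is templatized C++; names are illustrative):

```python
class Map:
    # Applying a transform to a dataset yields another dataset, so a
    # collate function is just a Transform applied through Map.
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def get_batch(self, indices):
        return self.transform(self.dataset.get_batch(indices))
```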

Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors

The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.

There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.

I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.

There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918

Reviewed By: ezyang

Differential Revision: D9998881

Pulled By: goldsborough

fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
2018-10-22 10:22:41 -07:00
96d826f635 Define REGISTER_CPU_GRADIENT_OPERATOR (#12588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12588

By default, this is an alias for REGISTER_CPU_OPERATOR.  If gradients are not
required (e.g., on mobile) it can be converted to a no-op by defining
CAFFE2_NO_GRADIENT_OPS, resulting in a smaller build.

GRADIENT_OPERATOR_SCHEMA works similarly.

CAFFE2_NO_GRADIENT_OPS also converts REGISTER_GRADIENT to a no-op.

Use these macros in fully_connected_op.cc as an example.
Follow-up diffs will convert more operators.

I had to introduce MACRO_EXPAND to handle the way Visual Studio expands
`__VA_ARGS__`.

Reviewed By: Yangqing

Differential Revision: D10209468

fbshipit-source-id: 4116d9098b97646bb30a00f2a7d46aa5d7ebcae0
2018-10-22 10:01:02 -07:00
da73d709a8 Remove unsafecoalesce op (#12897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12897

The UnsafeCoalesce op dates from the memonger days, when we tried to coalesce operators
for more efficient computation kernels. It creates a somewhat unsafe
underlying memory storage pattern.

With the new tensor unification I am not sure if it is still safe for us to do
so, so I propose we delete it for the sake of safety.

Reviewed By: bddppq, ilia-cher

Differential Revision: D10475980

fbshipit-source-id: b1a838c9f47d681c309ee8e2f961b432236e157e
2018-10-22 09:42:26 -07:00
c774cb8913 Rephrase unclear error message for shape mismatch (#12870)
Summary:
I spent a couple of minutes trying to understand which shape corresponds to checkpoint and which one to the model
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12870

Differential Revision: D10466600

Pulled By: SsnL

fbshipit-source-id: 3b68530b1b756462a2acd59e3a033ff633567a6b
2018-10-22 08:57:16 -07:00
25f4b3efe3 Add simple scripts for checking if generated code changed. (#12835)
Summary:
This is designed to make it easier to see how your codegen changes affected actual generated code.

Limitations:
A) This is NOT robust; if new directories are added that include generated files, they need to be added to tools/generated_dirs.txt.  Note that subdirectories of the list are not included.

B) This is particular to my workflow which I don't claim is generally applicable.  Ideally we would have a script that pumped out a diff that could be attached to PRs.

C) Only works on OSS and definitely won't work on windows.

How to use:
1) python setup.py ...
2) tools/git_add_generated_dirs
3) Edit codegen
4) python setup.py ...
5) git diff to see changes
6) If satisfied: tools/git_reset_generated_dirs, commit, etc.
   If not satisfied: Go to 3)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12835

Reviewed By: ezyang

Differential Revision: D10452255

Pulled By: gchanan

fbshipit-source-id: 294fc74d41d1b840c7a26d20e05efd0aff154635
2018-10-22 07:33:32 -07:00
01227f3ba7 Env variable to not check compiler abi (#12708)
Summary:
For https://github.com/pytorch/pytorch/issues/10114

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12708

Differential Revision: D10444102

Pulled By: goldsborough

fbshipit-source-id: 529e737e795bd8801beab2247be3dad296af5a3e
2018-10-21 20:07:50 -07:00
1e8064dec0 Convert 2 nn.functional functions to weak script (#12723)
Summary:
* Moves the `weak_script` annotation to `torch/_jit_internal.py` to resolve a dependency issue between `torch.jit` and `torch.nn`
* Add `torch._jit.weak_script` to `tanhshrink` and `softsign`, their tests now pass instead of giving an `unknown builtin op` error
* Blacklist converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723

Differential Revision: D10452986

Pulled By: driazati

fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
2018-10-21 14:09:55 -07:00
b357470421 Add DistributedDataParallelCPU to doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12864

Differential Revision: D10481669

Pulled By: SsnL

fbshipit-source-id: 20831af41aaba75546e6ed6a99f011f0447b1acf
2018-10-21 11:20:11 -07:00
ed02619ba0 Add topological sort to nomnigraph (#12790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12790

Add DFS based topological sort to nomnigraph.

Reviewed By: duc0

Differential Revision: D10434645

fbshipit-source-id: aaf106b0cc37806b8ae61f065c1592a29993eb40
2018-10-20 01:07:30 -07:00
a839a67aad Add IDEEP unit test with zero-dim tensors (#8459)
Summary:
This test flushes out the issue that IDEEP cannot handle a tensor with dims like (0, 2), which is a valid tensor shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8459

Differential Revision: D10419328

Pulled By: yinghai

fbshipit-source-id: c5efcd152364a544180a8305c47a2a2d126ab070
2018-10-19 23:57:33 -07:00
7dbb38e856 Moving logging from caffe2 to c10. (#12881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881

TSIA. This should not change any functionality.

Remaining work:
- change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)

Reviewed By: dzhulgakov

Differential Revision: D10441597

fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d
2018-10-19 20:22:08 -07:00
d120b9af5a Make c10d pickling/unpickling work (#12694)
Summary:
This fixes the issue for https://github.com/pytorch/pytorch/issues/12168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12694

Differential Revision: D10468717

Pulled By: teng-li

fbshipit-source-id: 3df31d75eea19d6085af665f5350d3cb667a5048
2018-10-19 16:42:36 -07:00
8cb0848bdc expose delete_node (#12840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12840

Add binding for delete_node

Reviewed By: duc0

Differential Revision: D10453555

fbshipit-source-id: cdcaca8420a9a0c61479961d907ef6bb5478a41d
2018-10-19 13:30:50 -07:00
202893fe1a Migrate DeviceOption.numa_node_id to DeviceOption.device_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12717

Reviewed By: ilia-cher

Differential Revision: D10408325

fbshipit-source-id: 82583d0ad4b8db094ee4c5c607b52500826328f7
2018-10-19 12:45:48 -07:00
7921e16ca2 Revert D10421896: restore caffe2 strides
Differential Revision:
D10421896

Original commit changeset: b961ea0bca79

fbshipit-source-id: 9d9d2ed0c2cb23a3fdf6bbfc9509539aeeb7e382
2018-10-19 12:15:44 -07:00
bf99ffc4d2 Remove OMP_NUM_THREADS and MKL_NUM_THREADS settings from docker images (#12836)
Summary:
`OMP_NUM_THREADS` and `MKL_NUM_THREADS` are set to 4 by default in the docker images, which causes `nproc` to only show 4 cores in the docker containers by default, and building PyTorch is slow in this default case. We likely don't need these two flags to be set, and this PR tests that hypothesis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12836

Differential Revision: D10468218

Pulled By: yf225

fbshipit-source-id: 7a57962c962e162a8d97f730626825aa1e371c7f
2018-10-19 11:44:22 -07:00
14ff866505 Optimize GroupNormOp (#12844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12844

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D10455567

fbshipit-source-id: aee211badd1e0c8ea6196843e3e77f7c612a74d5
2018-10-19 11:40:12 -07:00
f3e1fe5ca5 add string as supported input / output of script functions (#12731)
Summary:
Add strings to our set of built-in types for annotations. This is used in the functional library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12731

Differential Revision: D10453153

Pulled By: eellison

fbshipit-source-id: f54177c0c529f2e09f7ff380ddb476c3545ba5b0
2018-10-19 11:17:19 -07:00
186219a643 restore caffe2 strides (#12845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12845

Attempting to do this again.

Reallocation of strides_ when there's no change in dim seems to have caused the error that broke the internal flow last time; this fixes that. We found a potential race condition in caffe2 counter ops that might be the cause, and we will investigate it.

Reviewed By: ezyang

Differential Revision: D10421896

fbshipit-source-id: b961ea0bca79757991013a2d60cfe51565689ee9
2018-10-19 10:00:16 -07:00
68f4a4b3ba Delete THCStreamGuard in favor of CUDAGuard, also c10d code cleanup (#12849)
Summary:
I got annoyed at waiting for OSS to tell me my c10d builds were busted, so
I also added support for building the test scripts in fbcode and fixed the
warnings this uncovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12849

Reviewed By: pietern

Differential Revision: D10457671

fbshipit-source-id: 5b0e36c606e397323f313f09dfce64d2df88faed
2018-10-19 09:48:41 -07:00
6ec2f09188 CircleCI: enable OSX jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12667

Differential Revision: D10466661

Pulled By: yf225

fbshipit-source-id: a1a150d3b384eb88ba4c7e6d57e59d8ed834e53c
2018-10-19 09:42:06 -07:00
7837ec553c CircleCI: Add doc-push job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12833

Differential Revision: D10464815

Pulled By: yf225

fbshipit-source-id: 06a6a673b6bb32f7c252a217f9ce59db35c75e9c
2018-10-19 08:58:04 -07:00
6190408e24 caffe2: UpsampleBilinear support for scales (#12736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736

This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales, bringing them in line with ResizeNearestOp https://github.com/pytorch/pytorch/pull/12720.

Reviewed By: houseroad

Differential Revision: D10416228

fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
2018-10-19 08:55:55 -07:00
d736f4f0a7 Kill 'python_name' in Declarations.cwrap. (#12832)
Summary:
I'm trying to do some transformations on Declarations.cwrap and this makes things overly difficult and doesn't do anything useful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12832

Reviewed By: ezyang

Differential Revision: D10450771

Pulled By: gchanan

fbshipit-source-id: 1abb1bce27b323dd3e93b52240e7627cd8e56566
2018-10-19 08:47:27 -07:00
31232061aa Use C local in lexer (2) (#12838)
Summary:
trying again without xlocale.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12838

Differential Revision: D10453078

Pulled By: zdevito

fbshipit-source-id: 760852c82e16acee7d1abb8a918822bf5ff59bca
2018-10-19 00:25:35 -07:00
373b5080da Warn that tensor.resize_() resets strides (#12816)
Summary:
As discussed in #1570, this adds a warning to the docstring of `tensor.resize_()` to prevent people from naively using it as an in-place view or reshape.

For your convenience, the updated docstring renders as follows:
![torch_resize_docstring](https://user-images.githubusercontent.com/629706/47148782-f1b57900-d2d1-11e8-9749-e9c7387113ed.png)

Fixes #1570.
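
A small illustration of why `resize_()` is neither a view nor a reshape:

```python
import torch

t = torch.arange(4.)
t.resize_(2, 3)  # grows the storage: the original 4 values are kept flat,
                 # but the 2 new elements are uninitialized memory
print(t)         # first four entries are 0., 1., 2., 3.; the rest is garbage
```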
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12816

Differential Revision: D10457755

Pulled By: ezyang

fbshipit-source-id: dd4b3a821e8c76dc534d81c53084abdb336e690a
2018-10-18 22:47:30 -07:00
d783249674 Revert D10457796: [pytorch][PR] fix typo
Differential Revision:
D10457796

Original commit changeset: 9d1582c11c2e

fbshipit-source-id: 9be38e999a2783dae4a387821806e6850b6a3671
2018-10-18 21:48:14 -07:00
ca5dc9f13a Add py2 compatibility for builtins import (#12784)
Summary:
Testing if this is a solution for the issue reported at https://github.com/pytorch/pytorch/pull/12504#issuecomment-430758448
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12784

Differential Revision: D10454398

Pulled By: jamesr66a

fbshipit-source-id: a0304acde5df438c08cceb2d5280933de24664c4
2018-10-18 20:54:23 -07:00
aa6f47e229 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12814

Differential Revision: D10457796

Pulled By: ezyang

fbshipit-source-id: 9d1582c11c2e6dec5ff1c87525fac127a7e77273
2018-10-18 20:42:08 -07:00
f47d12b0ef shape_as_tensor should return a CPU tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12846

Differential Revision: D10456885

Pulled By: jamesr66a

fbshipit-source-id: fa66d0736cfb0ed09e566ae7c2eaeac37f8bb0e4
2018-10-18 20:20:00 -07:00
40ff69b796 Add attribute exhaustive_search in caffe2 blacklist args (#12815)
Summary:
Currently, while converting from caffe2 to onnx, the exhaustive_search attribute is not blacklisted in support_onnx_export, so conversion fails when the onnx model is verified using C.check_model.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12815

Differential Revision: D10457777

Pulled By: ezyang

fbshipit-source-id: dc2183d8abef8cd753b348f2eaa62c952a058920
2018-10-18 19:53:40 -07:00
8a35aafca6 Try to fix randomness.rst formatting again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12853

Differential Revision: D10458439

Pulled By: SsnL

fbshipit-source-id: ebd259e598327b0c5d63de6b7c182781fe361fbd
2018-10-18 19:18:49 -07:00
0fa69c0276 Remove the protobuf library in pytorch linking list. (#12451)
Summary:
There is a link error when caffe2 doesn't use its protobuf under third_party, because pytorch always links that protobuf even though it doesn't use protobuf directly. We can remove it from the linking list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12451

Differential Revision: D10262676

Pulled By: ezyang

fbshipit-source-id: c2ff3fdf757fc21ed689e7f663c082064b1a0bca
2018-10-18 18:31:51 -07:00
a85174b46a Fix randomness.rst formatting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12850

Differential Revision: D10457694

Pulled By: SsnL

fbshipit-source-id: fa64964ff6d41625d9383ca96393017230e4ee0f
2018-10-18 18:26:26 -07:00
87d3d209a6 Enable JIT tests in fbcode (#12777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12777

Enables JIT tests in FBCode. Changes pybind11 code to avoid mixing py::args with positinally matched arguments because old versions of PyBind11 leak memory in this case.

Reviewed By: jamesr66a

Differential Revision: D10419708

fbshipit-source-id: 74bc466001b5d363132d1af32e96841b38601827
2018-10-18 18:18:37 -07:00
99bc541b5b size_from_dim(0) is like numel() but worse. Don't do it. (#12729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12729

This may have a dependency on D10380678 if size_from_dim(0)
was required because numel() used to return -1 in some cases.
This is no longer true.
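
In other words (a small Python illustration; `size_from_dim` itself is a C++ Tensor method):

```python
import math
import torch

t = torch.empty(2, 3, 4)
# size_from_dim(0) is the product of all sizes starting at dim 0,
# which is exactly what numel() returns.
assert t.numel() == math.prod(t.shape) == 24
```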

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10415069

fbshipit-source-id: 39f46f56249ecaf3533f62a0205b3a45d519d789
2018-10-18 18:06:37 -07:00
89bf98ac4c Update '__all__' in '__init__.py' (#12762)
Summary:
It's best coding practice to always include dynamically declared module-level methods in the `__all__` field. Otherwise, IDEs (such as PyCharm) with reference inspectors will complain "Cannot find reference ...".

This PR adds 'rand' and 'randn' to `__all__` in `__init__.py`.
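
For example (an illustrative excerpt, not the actual diff):

```python
# torch/__init__.py -- declaring dynamically generated functions in __all__
# lets static tools resolve `torch.rand` and `torch.randn`.
__all__ = ['rand', 'randn']  # illustrative subset; the real list is much longer
```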
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12762

Differential Revision: D10427541

Pulled By: ezyang

fbshipit-source-id: ec0704dfd91e78d7ad098b42cfd4bd1ad0e119df
2018-10-18 17:52:10 -07:00
a223c5ed2c Extend ONNX while op by x2, rather than x1.02
Summary:
I think the original author wrote 2.0f in an attempt to double the size, but this argument takes a percentage increase, not a factor increase.

Created from Diffusion's 'Open in Editor' feature.
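
A toy sketch of the difference (the helper is hypothetical and only mirrors the percentage-increase semantics described above):

```python
def grown_size(old_size, pct_increase):
    # percentage-increase semantics: 2.0 grows by 2%, not by a factor of 2
    return int(old_size * (1.0 + pct_increase / 100.0))

print(grown_size(1000, 2.0))    # 1020 -- what passing 2.0f accidentally requested
print(grown_size(1000, 100.0))  # 2000 -- an actual doubling
```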

Reviewed By: jamesr66a

Differential Revision: D10412946

fbshipit-source-id: 95eb3d284255f232b7782bb1d2c9c2ef8aa6f8a7
2018-10-18 17:49:51 -07:00
f9d1b63d18 Automatic update of fbcode/onnx to f8828e532da4795e8ea15f5850a37c5179917b9b (#12823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12823

Previous import was 1cbe2743cda739ff752d6ce79553b0ef8ad49783

Included changes:
- **[f8828e5](https://github.com/onnx/onnx/commit/f8828e5)**: Use vector instead of set to keep the order of the opt passes (#1524) <Lu Fang>
- **[b5a37c4](https://github.com/onnx/onnx/commit/b5a37c4)**: Pin awscli to last known good version (#1518) <bddppq>
- **[3e219f6](https://github.com/onnx/onnx/commit/3e219f6)**: ONNX Optimization Rewrite (#1452) <Armen>
- **[96758c9](https://github.com/onnx/onnx/commit/96758c9)**: Add MaxUnpool op to ONNX. (#1494) <Spandan Tiwari>
- **[c4f7043](https://github.com/onnx/onnx/commit/c4f7043)**: Update docker image version used in CircleCI (#1511) <bddppq>

Differential Revision: D10447573

fbshipit-source-id: 8748ba6e3be322a26a9a360ff7f2babd54fd581f
2018-10-18 16:17:25 -07:00
f380f0ba27 Move torch.onnx.operators functions into ATen (#12803)
Summary:
These were indiscriminately dumping `onnx::` instructions into traces, making it so you couldn't run the traces in the JIT interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12803

Differential Revision: D10443526

Pulled By: jamesr66a

fbshipit-source-id: 07172004bf31be9f61e498b5772759fe9262e9b3
2018-10-18 16:04:34 -07:00
79709f02e9 fix overwriting of CMAKE_EXE_LINKER_FLAGS (#12834)
Summary:
bug lurking since 2016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12834

Reviewed By: bddppq

Differential Revision: D10452484

Pulled By: anderspapitto

fbshipit-source-id: 352584af06e2fb35338fb66b3d8eb1050b716349
2018-10-18 15:34:28 -07:00
92890d4314 Delete ExtendTensor operator
Summary: Added 2 years ago in D3665603, never used, kill it.

Reviewed By: ezyang

Differential Revision: D10421336

fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
2018-10-18 15:18:40 -07:00
324a510f9c JIT Cleanups (#12804)
Summary:
1. Change scope ownership model so they can be shared across Graphs.
   Now scopes own their parent and are intrusive pointers. Graphs
   no longer require a scope_root and cloning a node automatically
   clones its scope. This causes some changes in expect files for
   trace+script things. As far as I can tell these are not bugs but
   a different way of interpreting how scopes should propagate.
   Big traces like that of alexnet keep their scopes unchanged.
2. Remove VariableType.cpp dependency on a symbol being in the pre-
   declared symbol list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12804

Differential Revision: D10447922

Pulled By: zdevito

fbshipit-source-id: dcfcaf514bbe5687047df0f79c2be536ea539281
2018-10-18 14:41:55 -07:00
6058886b03 Speedup pnorm (#12811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12811

The L1 version of this operator was super slow and was timing out one of our
unit tests. This diff addresses the TODO and makes it fast.

Reviewed By: chocjy

Differential Revision: D10444267

fbshipit-source-id: 550b701b6a5cb3f2540997fd7d8b920400b983a6
2018-10-18 14:22:55 -07:00
68843c683d Open source multithreaded predictor bench utils (#11135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11135

This diff does not have any logic change; it simply moves files/functions/classes around.
It open-sources (almost all of) the necessary dependencies for the multithreaded predictor bench.
The benchmark itself can be open sourced once the predictor is open sourced.

Reviewed By: salexspb

Differential Revision: D9602006

fbshipit-source-id: 386c9483e2c64c8b7d36e4600189c4e0b7e159ff
2018-10-18 14:16:36 -07:00
ee563c5899 Add license reference to README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12822

Differential Revision: D10451895

Pulled By: JoelMarcey

fbshipit-source-id: dee4cafd3120571e52cf242bb0674c7aa7dab217
2018-10-18 14:10:24 -07:00
9473e57eca Revert D10444104: [pytorch][PR] Windows CI integration for custom ops
Differential Revision:
D10444104

Original commit changeset: 4c447beeb967

fbshipit-source-id: ead52444aefa27692e3f36dadad986e2313261bd
2018-10-18 14:08:18 -07:00
ed317b6203 Remove useless MKL target (#12783)
Summary:
Context: https://github.com/pytorch/pytorch/pull/12625#issuecomment-430560919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12783

Differential Revision: D10451726

Pulled By: yinghai

fbshipit-source-id: 3cd1e61209628d7c52b440e5b232ae95dd09885e
2018-10-18 14:03:34 -07:00
805f4d5cb8 Revert D10416438: Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE
Differential Revision:
D10416438

Original commit changeset: cb842e3e26b0

fbshipit-source-id: c0760e73ecc76ca9b1b74f6844e243c2df5260a2
2018-10-18 13:46:33 -07:00
57ddc08a57 Enable multiple external output (#12778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12778

att

Differential Revision: D10248027

fbshipit-source-id: fc3d17314e8c2d9704b8bfcc50ace176ec2c85d7
2018-10-18 13:36:23 -07:00
dec9bc5f0b Expose device_option directly
Summary: as title states

Reviewed By: duc0

Differential Revision: D10442424

fbshipit-source-id: bba2dd600e1979ff018ac0e403463f992a94a6e5
2018-10-18 13:22:17 -07:00
63cd051867 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (#12799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Reviewed By: ezyang

Differential Revision: D10416438

fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
2018-10-18 12:49:01 -07:00
2c566a17c7 nomnigraph - simplify subgraph matching APIs (#12681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12681

- Get rid of NodeMatchCriteria as a template parameter, which was too generic. So MatchNode<NodeMatchCriteria> becomes MatchNode<GraphType>, and MatchStore stores the predicate on GraphType::NodeRef.

- Similarly, get rid of NNNodeMatchCriteria

Now one can just pass in a function pointer NodeRef -> bool to NNMatchNode constructor directly like this

mg.createNode(is<Relu>)

- Merge static utilities in SubgraphMatcher class into MatchGraph class

- Rename MatchNode to MatchPredicate

Change use cases and tests to make it work

Reviewed By: ZolotukhinM

Differential Revision: D10386907

fbshipit-source-id: 43874bd154e3d7c29ce07b4b74eca8a7a9f3078a
2018-10-18 12:32:40 -07:00
9c617140f7 Try to reduce c10d test flakiness (#12782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12782

We have seen the "Address already in use" error popup a few times when instantiating the TCPStore. The port that it uses is dynamically generated through common.find_free_port(), which binds a new socket to a random port, closes the socket, and returns the port that the OS had assigned. If some other process grabs that port in the time between closing the socket and the TCPStore binding to it, the bind error shows up. This commit changes most tests to use the FileStore instead and includes a retry when testing the TCPStore.
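
Roughly how such a helper works, and where the race comes from (a sketch, not the actual `common.find_free_port` code):

```python
import socket

def find_free_port():
    # Bind to port 0 so the OS assigns a currently free port, then release it.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("localhost", 0))
    port = s.getsockname()[1]
    s.close()
    # Nothing stops another process from grabbing `port` before the caller
    # rebinds it -- hence the occasional "Address already in use".
    return port
```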

Differential Revision: D10433401

fbshipit-source-id: 8dd575ac91a3cddd1cc41ddb0ff4311ddc58c813
2018-10-18 12:12:33 -07:00
3fe35300ed Revert D10417038: [pytorch][PR] Use C locale in lexer
Differential Revision:
D10417038

Original commit changeset: 1d5f2f9a24ec

fbshipit-source-id: 5780fed8e29551ec5b0a56ad6966a560c02bc171
2018-10-18 11:45:18 -07:00
545f22c070 Link libshm against c10 (#12802)
Summary:
Fixes this build failure I got: https://gist.github.com/jamesr66a/1e0025d8d6d30b090f0e247457063093
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12802

Differential Revision: D10447916

Pulled By: jamesr66a

fbshipit-source-id: ab2cddff95429881db992c04e80453a46eb81f79
2018-10-18 11:38:42 -07:00
5b971445a6 Typo fix (#12826)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12826

Differential Revision: D10449047

Pulled By: ezyang

fbshipit-source-id: eb10aa5886339b43bb8c239dd8742e458f3d024d
2018-10-18 11:36:00 -07:00
2b63b7a0a5 Support GPU version of Spatial Batch Norm (#11711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11711

Added GPU support for spatial batch normalization. It works by reducing values from the GPUs onto a CPU and broadcasting the results back to each GPU. We have run several experiments and found these results to be better than those without spatial BN: https://fb.quip.com/fr7HAeDliPB8

Reviewed By: enosair

Differential Revision: D9547420

fbshipit-source-id: ccbd2937efd6cfd61182fff2f098fb7c5ae8aeb1
2018-10-18 11:22:13 -07:00
e240e89984 move the torch/csrc/jit/serialization.h to caffe2 source folder and rename to inline_container.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12781

Reviewed By: dzhulgakov

Differential Revision: D10436151

Pulled By: houseroad

fbshipit-source-id: 7f59eec21df5acbab0ea693e1a1cd4fa152f05e5
2018-10-18 09:47:19 -07:00
963b012bd8 nomnigraph - HEFT scheduler (#12788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12788

Static task scheduling algorithm

- Input/Output for static scheduler
- HEFT static scheduling algorithm
- Theoretical critical path analyzer

Reviewed By: bwasti

Differential Revision: D10436418

fbshipit-source-id: 074bc587b9a2c7cb2d9e64291981ff1c160f02b2
2018-10-18 08:40:46 -07:00
12be60cc04 Windows CI integration for custom ops (#11527)
Summary:
This is likely currently broken due to symbol visibility issues, but we will investigate it using this PR.

CC orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11527

Differential Revision: D10444104

Pulled By: goldsborough

fbshipit-source-id: 4c447beeb9671598ecfc846cb5c507ef143459fe
2018-10-18 07:55:05 -07:00
eb6a1245a2 Fix torch::jit::load docs (#12709)
Summary:
`torch::jit::load` is currently incorrectly documented/rendered

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12709

Differential Revision: D10422064

Pulled By: goldsborough

fbshipit-source-id: 4b195a84847d731ae3fe2d40868ebe858d510a2e
2018-10-18 07:52:13 -07:00
b1a6fa90e1 Add script::Module::to (#12710)
Summary:
There is currently no obvious way for users to move their `script::Module` to GPU memory. This PR implements the `to()` functions that C++ frontend modules have.

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12710

Differential Revision: D10444103

Pulled By: goldsborough

fbshipit-source-id: daa0ec7e7416c683397ee392c6e78b48273f72c7
2018-10-18 07:48:51 -07:00
710191e292 fix error message of large kernel size in conv2D (#12791)
Summary:
- fix #12565
- test plan:
with this fix, we have:
```
>>> m = nn.Conv2d(in_channels=3, out_channels=33, kernel_size=10, stride=1, bias=True)
>>> input = torch.randn(1, 3, 1, 1)
>>> output = m(input)
```
RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (10 x 10). Kernel size can't be greater than actual input size at ~/pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:50

not sure why these are `int` instead of `int64_t`:
5ccdd7a626/aten/src/THNN/generic/SpatialConvolutionMM.c (L10)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12791

Differential Revision: D10443045

Pulled By: weiyangfb

fbshipit-source-id: 2620acb40bdd49d29cec06337f6dfb4653d1987c
2018-10-18 00:51:16 -07:00
f1e7d384b6 Support scales as inputs in ResizeNearest (#12720)
Summary:
To address https://github.com/onnx/onnx/pull/1467
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12720

Reviewed By: BIT-silence

Differential Revision: D10414813

Pulled By: houseroad

fbshipit-source-id: 8831381b0115c363065c8d23bd1a95b4d641b857
2018-10-17 23:08:53 -07:00
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used in base_module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflicts.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
cffeb03a2d fix forward and backward for norm with negative infinity norm (#12722)
Summary:
I found a bug in norm() and fixed it (and added tests to make sure it's fixed).
Here is how to reproduce it:
```python
import torch
x = torch.FloatTensor([[10, 12, 13], [4, 0, 12]])
print(torch.norm(x, -40, dim=0, keepdim=True)) #output is tensor([[ 4.0000,  0.0000, 11.9853]])
print(torch.norm(x, float('-inf'), dim=0, keepdim=True)) #output is tensor([[1., 1., 1.]]) which is wrong!
from numpy.linalg import norm as np_norm
x = x.numpy()
print(np_norm(x, ord=-40, axis=0)) #output is array([[4., 0., 11.985261]])
print(np_norm(x, ord=float('-inf'), axis=0)) #output is array([[4., 0., 12.0]])
```
it's related to [#6817](https://github.com/pytorch/pytorch/issues/6817) and [#6969](https://github.com/pytorch/pytorch/pull/6969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12722

Differential Revision: D10427687

Pulled By: soumith

fbshipit-source-id: 936a7491d1e2625410513ee9c39f8c910e8e6803
2018-10-17 21:07:43 -07:00
ed5eb7196b Add quantized GroupNormOp (#11852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11852

Add quantized GroupNormOp

Reviewed By: houseroad

Differential Revision: D9931468

fbshipit-source-id: 02af82d98356a49736e44162042783c9e36a81b5
2018-10-17 18:32:44 -07:00
08aab4dfdd remove ATen/Error.h and ATen/core/Error.h (#12792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12792

This is a follow up diff after D10238910.

Only non-codemod change is the removal of ATen/Error.h and ATen/core/Error.h. Other files are basically changing the inclusion path + clang format for inclusion order.

Reviewed By: bddppq

Differential Revision: D10437824

fbshipit-source-id: 7f885f80ab5827468d1351cfb2765d0e3f555a69
2018-10-17 17:25:42 -07:00
cd88c5ccf4 CircleCI hot fix: pin awscli to 1.16.35 (#12787)
Summary:
awscli==1.16.36 is broken: https://circleci.com/gh/pytorch/pytorch/77338?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12787

Differential Revision: D10437424

Pulled By: yf225

fbshipit-source-id: c15bed7aa83ddca92ff32e2aaa69fbe97ac6ab1c
2018-10-17 15:57:52 -07:00
84ce3ab47e Add MAE and L2 loss to docs (#12754)
Summary:
Fixes #12751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12754

Differential Revision: D10427661

Pulled By: ezyang

fbshipit-source-id: 75bbef85976e253ab5a7140fc57f7a0ad34d96f5
2018-10-17 15:40:20 -07:00
5ccdd7a626 Support cmake3 for 14.04 and CentOS (#12771)
Summary:
Fix https://github.com/caffe2/caffe2.github.io/issues/24

cc pjh5 anderspapitto soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12771

Reviewed By: anderspapitto

Differential Revision: D10430865

Pulled By: orionr

fbshipit-source-id: 10c03cd25ab9faad49d53d0f18dd9566bfd28ae2
2018-10-17 15:02:19 -07:00
21ff6de4b3 Add missing HANDLE_TH_ERRORS (#12770)
Summary:
THPSize_pynew is called from the Python C API and may throw exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12770

Differential Revision: D10431180

Pulled By: colesbury

fbshipit-source-id: 93dd1b604ac6bc05d4eb02b97e3f79a73aec73c5
2018-10-17 13:52:02 -07:00
ab1a25aa9b caffe2::empty for Resize+mutable_data refactor (#12407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12407

We want to use a tensor factory to refactor caffe2's old way of initializing a Tensor
via Resize and mutable_data, in order to eliminate uninitialized Tensors.

Previously when we want to create a Tensor in caffe2, we'll do the following
```
Tensor x(CPU); // device type provided
x.Resize({1, 2, 3}); // size provided
x.mutable_data<float>(); // data type provided and memory allocated
```
This leaves the Tensor in a not-fully-initialized state during the process. To eliminate this, we
want to provide all the needed information at the beginning. ATen already has its TensorFactories: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorFactories.cpp, and there is a TensorOptions; we want to adopt the same interface to ease future refactoring.

At the call site, we used to have `Output(i)`, which returns a `Blob` containing an uninitialized `Tensor`; we would then call Resize and mutable_data afterwards to provide the dimensions and data type:
```
// uninitialized tensor
auto* Y = Output(0);
// set dimensions
Y->Resize({1, 2, 3});
// actually allocate the data
auto* data = Y->mutable_data<float>();
// After this step, Tensor is fully initialized.
```
We want to change it to the following:
```
// provide dimensions and TensorOptions which include device type and data type.
// This will set all the information of Tensor properly and also allocate memory.
auto* Y = Output(0, {1, 2, 3}, at::device({context_.device_type()}).template dtype<T>());
// Tensor is fully initialized after this step

// following `mutable_data` call won't allocate memory.
auto* data = Y->mutable_data<float>();
```

microbenchmarks
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
OperatorNewOutputTensorAPI                                   3.27us  306.05K
OperatorOldOutputTensorAPI                                   3.55us  281.54K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D10207890

fbshipit-source-id: f54ddacaa057b7c6bc7d5a8290171f35e9e40e29
2018-10-17 13:03:06 -07:00
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but that has mostly been
cleaned up now. Right now, the plan on record is that namespace caffe2 and
namespace aten will both be full supersets of namespace c10.

Most of the diff is codemod, and only two places of non-codemod is in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (the same behavior applies if gflags is not built in).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
348867c10b Remove cereal submodule (#12666)
Summary:
Cereal is dead!

soumith orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12666

Reviewed By: soumith

Differential Revision: D10422061

Pulled By: goldsborough

fbshipit-source-id: ca1ac66d05e699df9de00fc340a399571b7ecb9f
2018-10-17 11:52:47 -07:00
dd7501e3a8 Remove Blob::ShareExternal from serialization (#11926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11926

With the preparation work of the diffs stacked below, we're now able to remove this call to Blob::ShareExternal(),
preparing for removing that function from Blob.

Reviewed By: dzhulgakov

Differential Revision: D9884563

fbshipit-source-id: 7dd5c5fe02be0df7a44be45587c1dd7c474126ef
2018-10-17 11:50:35 -07:00
6cbf1992bd Serialization takes pointers instead of Blob (#11925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11925

This is step 1 in the refactoring to remove Blob::ShareExternal(), i.e. Blob would then always own its contents.

ShareExternal() is for example used to pass non-owning blobs to serialization. This diff prepares removing that.

Reviewed By: ezyang

Differential Revision: D9884177

fbshipit-source-id: d01df9a613a4fc62e5679fe45bfc47e2c899b818
2018-10-17 11:50:34 -07:00
25db86cca5 Fix isfinite for int input (#12750)
Summary:
`torch.isfinite()` used to crash on int inputs.
```
>>> import torch
>>> a = torch.tensor([1, 2])
>>> torch.isfinite(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/pytorch/torch/functional.py", line 262, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: value cannot be converted to type int64_t without overflow: inf
```
But this is an easy special case, and numpy also supports it.
```
>>> import numpy as np
>>> a = np.array([1, 2])
>>> a.dtype
dtype('int64')
>>> np.isfinite(a)
array([ True,  True], dtype=bool)
```
So I added a hacky line to handle non-floating-point input. Since pytorch raises an exception on overflow, we can safely assume all valid int tensors hold only finite numbers.
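A minimal sketch of that special case, assuming the floating-point check shown in the traceback (the actual patch may differ in details):
```python
import torch

def isfinite(tensor):
    if not tensor.is_floating_point():
        # Integral tensors cannot hold inf/nan, so every element is finite.
        return torch.ones_like(tensor, dtype=torch.uint8)
    return (tensor == tensor) & (tensor.abs() != float('inf'))
```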
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12750

Differential Revision: D10428204

Pulled By: ailzhang

fbshipit-source-id: f39b2d0975762c91cdea23c766ff1e21d85d57a5
2018-10-17 11:48:25 -07:00
9a76e84a08 Use C locale in lexer (#12739)
Summary:
Possible fix for #11326. Testing in CI for windows code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12739

Differential Revision: D10417038

Pulled By: zdevito

fbshipit-source-id: 1d5f2f9a24eceef7047dc218669faca8a187c65c
2018-10-17 10:42:38 -07:00
459cff93fe fix math formula for conv1d and conv2d (#12740)
Summary:
- fix math formula
- test plan: build html and view on a browser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12740

Differential Revision: D10419430

Pulled By: weiyangfb

fbshipit-source-id: b8eee9e75c3ce6e37535e3de597431ef5030e9ac
2018-10-17 10:24:11 -07:00
e027f7a913 Fix character with wrong encodding in documentation (#12761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12761

The fullwidth comma `，` is not really `,`, and thus it can make some Python 2 imports fail.

Reviewed By: weiyangfb

Differential Revision: D10423231

fbshipit-source-id: 3738c0b9d2f52aa47eef06250f84c5933a38783f
2018-10-17 10:20:45 -07:00
9d79030d38 Fixup THPUtils_unpackIndex (#12738)
Summary:
See https://github.com/pytorch/pytorch/issues/12735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12738

Differential Revision: D10416682

Pulled By: jamesr66a

fbshipit-source-id: 69f3452750dffda3cfed50463d9241fd7b52528b
2018-10-17 10:16:54 -07:00
409ee5bcd9 Remove redundant semicolon
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12753

Differential Revision: D10427674

Pulled By: ezyang

fbshipit-source-id: f790dbbafc6b1965c4e1368f311076ea045555de
2018-10-17 09:52:48 -07:00
1a6071d436 fixing seq to tensors in documentation (#12741)
Summary:
Fixes #12251

In the docs, the actual keyword argument was supposed to be `tensors`, but it is instead given as `seq` for the `torch.cat` operation.

zou3519 can you review this code? I don't have access to request code reviews.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12741

Differential Revision: D10419682

Pulled By: ezyang

fbshipit-source-id: a0ec9c3f4aeba23ac3a99e2ae89bd07d2b9ddb58
2018-10-17 09:16:04 -07:00
7edfe11ba4 Use TypeMeta::dtor() instead of Blob::DestroyCall (#11500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11500

Since TypeMeta already stores a destructor, and we removed the ability from Blob to store a custom destructor in a diff stacked below this, there is now no reason for Blob to store it again.

Reviewed By: ezyang

Differential Revision: D9763423

fbshipit-source-id: d37a792ffd6928ed1906f5ba88bd4f1d1e2b3781
2018-10-17 06:21:46 -07:00
7b7bf09e3c Add TypeMeta::New/Delete (#12307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12307

This adds non-placement variants of New/Delete to TypeMeta.
In a future diff, this is going to be used from Blob to destruct its contents.

Reviewed By: dzhulgakov

Differential Revision: D10184116

fbshipit-source-id: 7dc5592dbb9d7c4857c0ec7b8570329b33ce5017
2018-10-17 06:21:45 -07:00
90737f7f5d Fix missing final activation in NLLLoss second example (#12703)
Summary:
Fixed the second example in NLLLoss.
The LogSoftmax activation was missing after the convolution layer. Without this activation, the second example loss was sometimes negative.
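A minimal sketch of the corrected pattern (shapes illustrative):
```python
import torch

conv = torch.nn.Conv2d(16, 8, 3)
log_softmax = torch.nn.LogSoftmax(dim=1)  # the activation that was missing
loss = torch.nn.NLLLoss()

x = torch.randn(4, 16, 10, 10)
target = torch.randint(0, 8, (4, 8, 8))
out = loss(log_softmax(conv(x)), target)  # non-negative, since NLLLoss expects log-probabilities
```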
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12703

Differential Revision: D10419694

Pulled By: ezyang

fbshipit-source-id: 98bfefd1050290dd5b29d3ce18fe075103db4674
2018-10-17 02:57:39 -07:00
0521c47c91 Amend nondeterminism notes (#12217)
Summary:
include atomicAdd commentary as this is less well known

There is some discussion in #12207

Unfortunately, I cannot seem to get the ..include working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217

Differential Revision: D10419739

Pulled By: SsnL

fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
2018-10-16 23:59:26 -07:00
8c873def88 Revert D10220313: restore caffe2 strides
Differential Revision:
D10220313

Original commit changeset: aaf9edebf4ff

fbshipit-source-id: 46c4d23d89d47be26c3f4967476271d8c2f95f11
2018-10-16 23:57:20 -07:00
70c527dacd Re-disable softmax ops tests in ROCM (#12749)
Summary:
They are flaky in master.

ashishfarmer petrex

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12749

Differential Revision: D10420265

Pulled By: bddppq

fbshipit-source-id: cac58efb711941786b10b07ada58e0d59ab1db1d
2018-10-16 22:54:50 -07:00
034c969f3c Simply exit DataLoader when Python is dying (#12700)
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything when Python is shutting down. We also, unfortunately, implement our DataLoader cleanup logic in `__del__`, a function that may or may not be called during shutdown, and, if called, may or may not be called before core library resources are freed.

Fortunately, we are already setting all our workers and pin_memory_thread as daemonic. So in case of Python shutting down, we can just do a no-op in `__del__` and rely on the automatic termination of daemonic children.

An `atexit` hook is used to detect Python exit.
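A minimal sketch of that shutdown-detection pattern (illustrative, not the exact implementation):
```python
import atexit

_python_exit_status = False

def _set_python_exit_flag():
    global _python_exit_status
    _python_exit_status = True

atexit.register(_set_python_exit_flag)

class _DataLoaderIter:
    def _shutdown_workers(self):
        pass  # join workers / pin_memory_thread here

    def __del__(self):
        if _python_exit_status:
            return  # Python is dying: rely on daemonic children being terminated
        self._shutdown_workers()
```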
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700

Differential Revision: D10419027

Pulled By: SsnL

fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
2018-10-16 22:05:33 -07:00
d34578026c Various example code fixes (#12707)
Summary:
- Fix broken sparse_coo_examples, update output
- Tensor(...) to tensor(...)
- Fix arguments to math.log to be floats

While the last might be debatable, mypy currently complains when passing an int to math.log. As it is not essential for our examples, let's be clean w.r.t. other people's expectations.

These popped up while checking examples in the context of #12500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12707

Differential Revision: D10415256

Pulled By: SsnL

fbshipit-source-id: c907b576b02cb0f89d8f261173dbf4b3175b4b8d
2018-10-16 21:59:40 -07:00
c8ac878b98 Fix bug in script for where (#12385)
Summary:
Where is declared as:

```
where(Tensor condition, Tensor self, Tensor other)
```

Previously the compiler assumed that self must be the first argument.
But this is not true in practice for `where` and for a few other exceptions.

This changes the compiler to take an explicit self argument which gets matched
to the `self` that appears in the schema.

Note that this requires renaming a variant of pow, which referred to
an exponent Tensor as `self` because otherwise that would cause `t^3`
to match against `t` being the exponent.
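For reference, a small scripted example where `self` is not the first schema argument (a hedged sketch against the current API):
```python
import torch

@torch.jit.script
def f(x, y):
    # Schema: where(Tensor condition, Tensor self, Tensor other).
    # Here `x` must be matched to `self`, the *second* schema argument.
    return x.where(x > 0, y)

print(f(torch.randn(3), torch.zeros(3)))
```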
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12385

Differential Revision: D10364658

Pulled By: zdevito

fbshipit-source-id: 39e030c6912dd19b4b0b9e35fcbabc167b4cc255
2018-10-16 21:05:14 -07:00
84edd4a48b Enable mapping from operatordef to converted node for debugging
Summary: Add a mapping for conversion -- this will help with debugging as well but is directly used by the TUI stacked on top of this

Reviewed By: duc0

Differential Revision: D10396130

fbshipit-source-id: cdd39278f0ed563bb828b1aebbbd228f486d89c8
2018-10-16 21:03:28 -07:00
1bf642800d Remove duplicate descriptors (#8321)
Summary:
This PR removes some duplication in `recurrent_op_cudnn.cc`. Instead of 4 of the exact same descriptor, it should work fine with just 1. I don't see any other code that relies on those being 4 separate locations, but if that is what you need you can always allocate additional descriptors as necessary.

I have not fully tested this thing out; it's just something I noticed when I was reading through the descriptor code.

Cheers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8321

Differential Revision: D10363744

Pulled By: ezyang

fbshipit-source-id: 733c8242fb86866f1d64cfd79c54ee7bedb03b84
2018-10-16 20:59:00 -07:00
e497aa1e35 Optimize UpsampleNearest Op (#12151)
Summary:
Optimize the UpsampleNearest Op.
1. Add OMP
2. revise the translated_idx method
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12151

Differential Revision: D10362856

Pulled By: ezyang

fbshipit-source-id: 535a4b87c7423942217f2d79bedc463a0617c67a
2018-10-16 20:34:20 -07:00
ba25e13782 Forbid Module.to with copy argument. (#12617)
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.

See #12571 for discussion.

Thank you SsnL for noticing.
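A quick sketch of what is and isn't accepted after this change (illustrative):
```python
import torch

m = torch.nn.Linear(2, 2)
m.to(torch.float64)                       # dtype only
m.to(torch.device('cpu'))                 # device only
m.to(torch.device('cpu'), torch.float32)  # device and dtype
# m.to(torch.device('cpu'), torch.float32, copy=True)  # now rejected: modules take no `copy`
```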
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617

Differential Revision: D10392053

Pulled By: ezyang

fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
2018-10-16 20:31:44 -07:00
5416260b1e Add the OpenMP optimization for BatchPermutation. (#12153)
Summary:
This is for Caffe2 optimization.
With this optimization, the following ops get a large boost (tested with MaskRCNN, on one SKX8180 socket):
BatchPermutation op: reduced from 8.296387 ms to 1.4501984 ms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12153

Differential Revision: D10362823

Pulled By: ezyang

fbshipit-source-id: 04d1486f6c7db49270992cd8cde41092154e62ee
2018-10-16 20:23:09 -07:00
3709734b1c Improve reporting on pytest. (#12610)
Summary:
Before and after results coming once I run the tests on CI.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12610

Differential Revision: D10419483

Pulled By: ezyang

fbshipit-source-id: 5543e971f8362e4cea64f332ba44a26c2145caea
2018-10-16 20:15:01 -07:00
3bfa7258b3 Don't serialize hooks (#11705)
Summary:
Fixes #11683.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705

Differential Revision: D9833057

Pulled By: ezyang

fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6
2018-10-16 20:11:03 -07:00
b1892226aa A quick rundown of codebase structure. (#12693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12693

Differential Revision: D10419424

Pulled By: ezyang

fbshipit-source-id: dc3999253f19b5615849619bd3e4a77ab3ca984e
2018-10-16 20:02:27 -07:00
0054df19b1 Simplify InheritOnnxSchema registration (#12696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12696

In the majority of cases, we use `InheritOnnxSchema(type_)`. This diff makes declaring such cases easier.

Reviewed By: bddppq

Differential Revision: D10395109

fbshipit-source-id: 914c1041387d5be386048d923eb832244fc506c3
2018-10-16 19:59:49 -07:00
81975a497f update docs for sparse tensor (#12221)
Summary:
- update docs examples at sparse tensor after print format changed
- update example to create empty sparse tensor:
```
>>> torch.sparse_coo_tensor(torch.LongTensor(size=[1,0]), [], torch.Size([1]))
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)
```

zou3519 SsnL yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12221

Differential Revision: D10412447

Pulled By: weiyangfb

fbshipit-source-id: 155b8cb0965f060e978f12239abdc1b3b41f6ab0
2018-10-16 19:56:51 -07:00
dc07102b17 Check dim size preventively when doing shape inference for BatchMatMul (#12691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12691

We check input(0) but not input(1) in BatchMatMul. This may result in a protobuf exception which won't be caught upstream, causing termination of the program. Checking it with `CAFFE_ENFORCE` instead will be caught by the upstream inference function. Plus, it will print a clean stack trace showing where things went wrong.

Reviewed By: bddppq, houseroad, BIT-silence

Differential Revision: D10391130

fbshipit-source-id: daf8dcd8fcf9629a0626edad660dff54dd9aeae3
2018-10-16 17:27:44 -07:00
50c0aedbec Don't segfault on Tensor.__delitem__ (#12726)
Summary:
The mapping protocol stipulates that when `__delitem__` is called, this is passed to `__setitem__` [(well, the same function in the C extension interface)](https://docs.python.org/3/c-api/typeobj.html#c.PyMappingMethods.mp_ass_subscript) with NULL data.

PyTorch master crashes in this situation; with this patch, it does not anymore.

Test code (careful, segfaults your interpreter):
```python
import torch
a = torch.randn(5)
del a[2]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12726

Differential Revision: D10414244

Pulled By: colesbury

fbshipit-source-id: c49716e1a0a3d9a117ce88fc394858f1df36ed79
2018-10-16 17:24:18 -07:00
6476e4598c Rename TypeMeta function pointers (#12306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12306

In a future diff, I'm going to introduce non-placement constructor and destructor to TypeMeta.
To make it less ambiguous, this diff first renames the existing ones to PlacementXXX.

Reviewed By: dzhulgakov

Differential Revision: D10184117

fbshipit-source-id: 119120ebc718048bdc1d66e0cc4d6a7840e666a4
2018-10-16 16:45:47 -07:00
d0df1e8ec9 Remove MIOpen Softmax operator (#12727)
Summary:
This PR contains changes for:
1. Removing the MIOpen softmax operator. It will be added back later with the required functionality.
2. Enabling softmax_ops_test on the ROCm target

Differential Revision: D10416079

Pulled By: bddppq

fbshipit-source-id: 288099903aa9e0c3378e068fffe6e7d6a9a84841
2018-10-16 16:45:46 -07:00
30aaa07594 New serialization format (#12384)
Summary:
Addressed Dima's feedback.

The proposal is here: https://fb.quip.com/TbQmAuqIznCf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12384

Reviewed By: dzhulgakov

Differential Revision: D10246743

Pulled By: houseroad

fbshipit-source-id: c80db0c35d60ca32965275da705f2b1dfb2a7265
2018-10-16 16:36:58 -07:00
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to the module via `setattr`. However, in DP, everything runs on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update the _u vector in-place so that, thanks to the shared storage between the 1st replica and the parallelized module, the update is retained (see the sketch below).
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.
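A minimal sketch of fix 1, the in-place power-iteration update (the names `weight` and `u` are assumptions for illustration):
```python
import torch
import torch.nn.functional as F

def power_iteration_step(weight, u, eps=1e-12):
    # Update the persistent buffer `u` *in-place*, so replicas that share
    # storage with the original module's buffer retain the update.
    w = weight.view(weight.size(0), -1)
    v = F.normalize(torch.mv(w.t(), u), dim=0, eps=eps)
    u.copy_(F.normalize(torch.mv(w, v), dim=0, eps=eps))
    return torch.dot(u, torch.mv(w, v))  # the spectral norm estimate, sigma
```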

cc crcrpar taesung89 yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
c414eb2618 fix improper calling of ShareExternalPointer from RNN op (#12593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12593

size() returns numel_, but what we really want is nbytes(), which is the capacity.

Reviewed By: salexspb

Differential Revision: D10354488

fbshipit-source-id: f7b37ad79ae78290ce96f37c65caa37d91686f95
2018-10-16 15:58:14 -07:00
4d698cae2e Enhance shape inference in ONNXIFI transformer (#12685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12685

In this diff, we push the fake run of the net into the ONNXIFI transformer, because:
1. We cannot do shape inference for every op.
2. Since the net has been SSA-rewritten, we cannot use shape info from the outer workspace directly.

In addition, this diff adds input shape info when querying the `onnxBackendCompatibility` function.

Reviewed By: bddppq

Differential Revision: D10390164

fbshipit-source-id: 80475444da2170c814678ed0ed3298e28a1fba92
2018-10-16 14:15:46 -07:00
f53d5e0a75 Automatic update of fbcode/onnx to 1cbe2743cda739ff752d6ce79553b0ef8ad49783 (#12676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12676

Previous import was 06f6d63d5529e3a94533c9f34c402be1793420b1

Included changes:
- **[1cbe274](https://github.com/onnx/onnx/commit/1cbe274)**: fix the optimizer (#1510) <Lu Fang>
- **[481ad99](https://github.com/onnx/onnx/commit/481ad99)**: Fix TensorProto int32_data comment (#1509) <Lutz Roeder>
- **[f04fbe0](https://github.com/onnx/onnx/commit/f04fbe0)**: fix ninja external (#1507) <Rui Zhu>

Reviewed By: jamesr66a, wanchaol

Differential Revision: D10388438

fbshipit-source-id: 298100589ce226c63d4e58edf185c9227fd52c85
2018-10-16 10:24:15 -07:00
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624, an internal use case of legacy `reduce`.
Added a test in test_nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
00f0dca4b5 restore caffe2 strides (#12381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12381

The workflow passes after D10150834, so we can restore strides.

Reviewed By: ezyang

Differential Revision: D10220313

fbshipit-source-id: aaf9edebf4ff739cbe45b2d32e77918fce47ba34
2018-10-16 09:19:42 -07:00
7035975508 fix double free exposed by latest llvm (#12697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12697

Latest LLVM started reporting a double free related to this code. The stack trace: P60181558
Fix it by using a leaky Meyers singleton.

Reviewed By: meyering

Differential Revision: D10352976

fbshipit-source-id: 11afc2999235831da10c73609d1153d04742ba18
2018-10-16 07:32:08 -07:00
a9981c8477 Remove Type.tensor, Type.native_tensor. (#12687)
Summary:
They aren't needed anymore now that at::empty can handle all backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12687

Differential Revision: D10390740

Pulled By: gchanan

fbshipit-source-id: 521d6f92448798aa368186685662451e191c0b05
2018-10-16 07:12:16 -07:00
7d24985852 Kill is_type_dispatched. (#12684)
Summary:
All factory functions are now implemented in terms of TensorOptions, which is passed through Type, if necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12684

Differential Revision: D10390224

Pulled By: gchanan

fbshipit-source-id: fb536271735e6e0e542f021e407529998b0482eb
2018-10-16 07:05:49 -07:00
5b8a640d0b Update fft docs for new cache size (#12665)
Summary:
Follow up of #12553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12665

Differential Revision: D10385615

Pulled By: SsnL

fbshipit-source-id: 44fe9ec75cb735de37c56270f160a16a1d2bfb64
2018-10-16 01:47:36 -07:00
0916f4a337 Remove caffe2/submodules/cereal-rev.txt
Summary: Zero-th step in removing the cereal submodule.

Reviewed By: yns88

Differential Revision: D10385343

fbshipit-source-id: cc93c22b2cafa73f929f2f7659a6f6e66458aa7e
2018-10-16 01:42:20 -07:00
04d4ec285c Cleanup namespace that were moved to ATen accidentally (#12680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12680

torch::jit shouldn't live in aten

Reviewed By: ezyang

Differential Revision: D10389502

fbshipit-source-id: f38582e61a275edccf22845c7d709a201f6a0be1
2018-10-16 01:25:08 -07:00
eb02a1d8a7 Fix clang tidy master comparison (#12674)
Summary:
This PR makes the clang-tidy CI get its diff by comparing the current commit against the base branch that the PR is targeting.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12674

Differential Revision: D10397692

Pulled By: goldsborough

fbshipit-source-id: 7fd9e22c92dd885112cd5c003c732d1c12667157
2018-10-16 01:17:18 -07:00
31d8e5e71a Improve Python API with the addition of pythonic setters/getters
Summary:
Simple additions that make it vastly easier to use nomnigraph in
python

Reviewed By: duc0

Differential Revision: D10383027

fbshipit-source-id: 441a883b84d4c53cca4f9c6fcc70e58692b8f782
2018-10-16 00:57:54 -07:00
f2b62e113c Clean up IR.h (#12551)
Summary:
Move a lot of methods that don't have an obvious reason for being inline out-of-line.  This cleans up the header and should help reduce the problem of touching IR.h and having to rebuild the world.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12551

Differential Revision: D10384808

Pulled By: resistor

fbshipit-source-id: 314af89e3282f35fdc94fa3fd3000e3040c8cb6b
2018-10-15 21:21:39 -07:00
058c1284be Fix the symbolic for pixel shuffle (#12192)
Summary:
Using Transpose + Reshape, not DepthToSpace, since the latter is not available in C2 yet.
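A small sketch of the equivalent decomposition (upscale factor `r` assumed):
```python
import torch
import torch.nn.functional as F

def pixel_shuffle_ref(x, r):
    n, c, h, w = x.shape
    x = x.reshape(n, c // (r * r), r, r, h, w)
    x = x.permute(0, 1, 4, 2, 5, 3)                  # the Transpose
    return x.reshape(n, c // (r * r), h * r, w * r)  # the Reshape

x = torch.randn(1, 8, 3, 3)
assert torch.equal(pixel_shuffle_ref(x, 2), F.pixel_shuffle(x, 2))
```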
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12192

Reviewed By: BIT-silence

Differential Revision: D10129913

Pulled By: houseroad

fbshipit-source-id: b60ee6d53b8ee95fd22f12e628709b951a83fab6
2018-10-15 19:53:35 -07:00
a1dd608260 Reduce MAX_JOBS for pytorch rocm build to make CI more stable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12662

Differential Revision: D10393109

Pulled By: bddppq

fbshipit-source-id: e14f72ebc877b5c0f75fe5d195c8b4dbb9b111db
2018-10-15 18:12:46 -07:00
d80a3eb549 Set philox seed and offset on cuda manual_seed (#12677)
Summary:
Fixes: #12669

Thank you Changmao Cheng for reporting this on the forum with a small example!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12677

Differential Revision: D10391989

Pulled By: ezyang

fbshipit-source-id: 5aa7a705bdb8ce6511a8eb1b3a207f22741046bf
2018-10-15 17:45:59 -07:00
01a333fd7f OpenCV 4.0 Compatibility fix (#9966)
Summary:
Caffe2 compiles with the latest OpenCV 4.0 after the committed changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9966

Differential Revision: D10369130

Pulled By: ezyang

fbshipit-source-id: 9a104803edca5a22e27e140a794e4b8c878ca416
2018-10-15 17:42:04 -07:00
083e037dea minor fix (#12688)
Summary:
This seems to be a typo that never got caught - no actual functionality changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12688

Differential Revision: D10391704

Pulled By: Yangqing

fbshipit-source-id: ce633776957628c4881956c5423bfab78294d512
2018-10-15 17:25:49 -07:00
23c4dbd6d7 Fix ONNX upsample mode (#12648)
Summary:
Fixes #12647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12648

Differential Revision: D10389124

Pulled By: houseroad

fbshipit-source-id: 53bc17b592d0d7f1884b555f3a12a33dbf18b4a0
2018-10-15 17:14:44 -07:00
7a52117792 Add AdaptiveAvgPool2d and AdaptiveMaxPool2d to ONNX.symbolic (#9711)
Summary:
Add AdaptiveAvgPool2d and AdaptiveMaxPool2d to ONNX.symbolic
Due to limitations in ONNX, only output_size=1 is supported.
AdaptiveAvgPool2d -> GlobalAveragePool
AdaptiveMaxPool2d -> GlobalMaxPool
Fixes #5310
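For example, a module like this should now export (hedged sketch; the file name is illustrative):
```python
import torch

model = torch.nn.AdaptiveAvgPool2d(output_size=1)  # maps to GlobalAveragePool
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "global_pool.onnx")
```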
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9711

Differential Revision: D10363462

Pulled By: ezyang

fbshipit-source-id: ccc9f8ef036e1e54579753e50813b09a6f1890da
2018-10-15 17:02:20 -07:00
52cbf4b774 Update eigen submodule to fix CUDA arch>=5.3 build issue. (#12191)
Summary:
Discussed in #11379, #12545. Eigen submodule needs to be updated to f59336cee3 to support building with CUDA arch >= 5.3.

It seems there was a similar fix checked in from #6746, but the Eigen submodule was later switched to the current mirror in #7793, at a point where the fix was not included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12191

Differential Revision: D10362557

Pulled By: ezyang

fbshipit-source-id: 548541e2c93f412bf6680ee80b8da572846f80d2
2018-10-15 17:02:19 -07:00
e22a776890 Fix for some tests (#12575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12575

Just my guess as to why those tests are failing. Waiting on sandcastle to see if the tests resolve themselves.

Reviewed By: mlappelbaum, wesolwsk

Differential Revision: D10305051

fbshipit-source-id: 455597b12bbe27dd6c16f7d0274f2c939949d878
2018-10-15 16:53:18 -07:00
0b96e5d792 Move some files to c10/util (#12245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12245

Move these files to c10/util:
- C++17.h
- Metaprogramming.h
- TypeList.h
- TypeTraits.h
- Array.h

(including .cpp files and test cases)

Reviewed By: ezyang

Differential Revision: D10139933

fbshipit-source-id: ce7ce89392bf1a6be070ffdfc0407a8a2ce4ba6e
2018-10-15 16:25:12 -07:00
ade97afc74 Re-enable IDEEP graph rewrite test (#12661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12661

Was disabled since workspace.has_mkldnn is now set to false

Reviewed By: yinghai

Differential Revision: D10383913

fbshipit-source-id: ad6dc705f0606b3711e8b450dc384ad3ebb87686
2018-10-15 15:50:28 -07:00
ab7520eb50 Revamp and document serialization, support streams (#12421)
Summary:
This PR does three things:

1. Add support for serializing to `ostream` and deserializing from `istream`s in addition to files. This is after https://github.com/pytorch/pytorch/pull/11932 added support for streams in `torch::jit::ExportModule` and `torch::jit::load`.
2. Update the internal interface for how things get serialized into archives (e.g. use the more idiomatic `operator<<` instead of a `save` method). *The external interface does not change*.
3. Add documentation.

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12421

Reviewed By: ezyang

Differential Revision: D10248529

Pulled By: goldsborough

fbshipit-source-id: 6cde6abd0174e3fbf3579c05376a32db0b53755f
2018-10-15 15:47:59 -07:00
03429e4eaf Update Gloo submodule to resolve __CUDA_DEPRECATED warning (#12574)
Summary:
Gloo was updated to use `type` for cudaPointerAttributes, which resolves the `__CUDA_DEPRECATED` warnings in our CUDA 10 CI. This PR brings in that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12574

Differential Revision: D10342450

Pulled By: ezyang

fbshipit-source-id: d50564bfcd8623a20b82b0052fba441c8358c17b
2018-10-15 15:45:13 -07:00
ef18f74e20 Simplify typeid macros (#12654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12654

The previous diff managed to get the macros working, but they've been quite unmaintainable.
This diff improves the situation a bit.

- Before, there were three global variables for each registered type: type id, type name and a global type meta instance. Now, it's only type id and type meta, type name is gone. I also wanted to get rid of type id, but that doesn't work due to issues with static initialization ordering (type ids for types are requested during static initialization time, meh)
- Instead of repeating the whole CAFFE_KNOWN_TYPE macro for GCC and non-GCC because they need different export flags, define it only once and use a EXPORT_IF_NOT_GCC macro.
- The CAFFE_KNOWN_TYPE macro has to delegate to a _CAFFE_KNOWN_TYPE_DEFINE_TYPEMETADATA_INSTANCE macro, because of the counter. The pattern was copied for the macros for preallocated types. However, there we don't use a counter but use the preallocated id, so there's no need to delegate to a separate macro.

Reviewed By: ezyang

Differential Revision: D10379903

fbshipit-source-id: 50a32a5cb55ab85db49618a5f1ee4e8b06e0dfb2
2018-10-15 15:42:10 -07:00
bb35d085ef Dispatch backend-specific TensorOptions-based 'factory' functions via… (#12071)
Summary:
… Type.

This allows one to write a cpu/cuda split 'factory' function that uses TensorOptions.
Also move all remaining native_functions with either function or method variants that use Type to use TensorOptions.
Thus, there are no more Types in the public function / method API.

I believe there is a _lot_ of opportunity for cleanup here, as the old tensor, th_tensor, native_tensor and sparse variants can probably be removed, but let's do that in a follow-on patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12071

Reviewed By: ezyang

Differential Revision: D10041600

Pulled By: gchanan

fbshipit-source-id: 30ebc17146d344bc3e32ccec7b98b391aac5470b
2018-10-15 15:21:11 -07:00
86aa6a61e0 Dedup MethodValue and FunctionValue (#12589)
Summary:
... they are basically the same class and I didn't see it in the initial PR. I also got resolvers back onto std::functions by keeping the function_table logic local to defineMethodInModules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12589

Differential Revision: D10383103

Pulled By: zdevito

fbshipit-source-id: 1b0a85eb4f112bc28256cac44446d671d803d3a2
2018-10-15 15:00:54 -07:00
71d142604f Add upcoming features to schema parser (#12585)
Summary:
This commit adds the hooks in schema parser for futures, options,
mutable alias sets, marking writes, and named output arguments that
need to exist for other upcoming work.

This also fixes the problem where you could not declare Lists of Lists.

Implementation of most of these features is left NYI. This commit should
avoid merge conflicts for these individual features on the schema parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12585

Differential Revision: D10382229

Pulled By: zdevito

fbshipit-source-id: 41d794e58ca462cf3a389861c533c68944dc560b
2018-10-15 14:51:42 -07:00
4c21b2f2d3 split register_aten_ops.cpp into shards (#12615)
Summary:
After an analogous breakup of VariableType.cpp, the generated
register_aten_ops.cpp is now the slowest-to-compile file in a typical
incremental rebuild by a wide margin. Therefore, give it the same
treatment - the generated code is split across several files to allow
parallel compilation.

Note that the existing code takes some care to arrange that overloads
of the same op name are given in a particular order. This diff
preserves that behavior, by treating all overloads of the same name as
a single indivisible unit, and sharding based on these groups rather
than on individual constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12615

Reviewed By: ezyang

Differential Revision: D10367363

Pulled By: anderspapitto

fbshipit-source-id: 07db5f9cb79748040909716349626412a13bc86e
2018-10-15 14:12:27 -07:00
c6f0fe5f26 CircleCI: Remove --depth from git fetch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12657

Differential Revision: D10386020

Pulled By: yf225

fbshipit-source-id: 08d1c57159b323c19d5fc94180972d0c70d6aec1
2018-10-15 13:55:27 -07:00
6f339cac6b Windows local dev: install conda in user-specific directory to avoid conflict (#12663)
Summary:
Currently when developing on the shared Windows debug machine, it's very easy to accidentally wipe out someone else's working binary because the conda environment is shared. This PR fixes that by always installing conda in the user's directory instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12663

Differential Revision: D10386130

Pulled By: yf225

fbshipit-source-id: 1242ef8b2b4239c4a96459a59eb0255b44ed9628
2018-10-15 13:46:12 -07:00
bbe6ef3864 torch.finfo and torch.iinfo to mimic the numpy equivalent (#12472)
Summary:
This pull request intends to provide the functionality requested in https://github.com/pytorch/pytorch/issues/10742 by adding a new torch.finfo and torch.iinfo API.
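A quick usage sketch mirroring the numpy equivalents:
```python
import torch

print(torch.finfo(torch.float32).eps)  # machine epsilon, ~1.1921e-07
print(torch.finfo(torch.float16).max)  # 65504.0
print(torch.iinfo(torch.int32).max)    # 2147483647
```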
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12472

Differential Revision: D10250829

Pulled By: benoitsteiner

fbshipit-source-id: eb22ca55d5b0064bef381fa7f1eb75989977df30
2018-10-15 13:43:52 -07:00
e8d8ccb34a Emphasize that the /path/to/libtorch must be absolute (#12660)
Summary:
ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12660

Differential Revision: D10386952

Pulled By: goldsborough

fbshipit-source-id: efd82f2aa3a349e9acd29303984b8fd7c3208c3f
2018-10-15 13:41:18 -07:00
a74cc03aa7 Use branch of exhale that fixes overloads (#12668)
Summary:
Docs for [`torch::jit::load`](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1jit_1ace2c44fb8af5905ae17834e81086b8a3.html#exhale-function-namespacetorch-1-1jit-1ace2c44fb8af5905ae17834e81086b8a3) are currently broken. svenevs has a fix on this branch, and we need to update to it.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12668

Differential Revision: D10386949

Pulled By: goldsborough

fbshipit-source-id: 1887ba53989e5a77b178f8b2782a7b3ae52b7405
2018-10-15 13:39:01 -07:00
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
aef8cadb9a mark Storage functions as const (#12623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12623

Mark Storage functions as const so that they can be exposed outside of TensorImpl when calling storage().

Based on this discussion https://github.com/zdevito/ATen/issues/27#issuecomment-330717839

Also potentially useful in the effort to remove ShareExternalPointer

Reviewed By: ezyang

Differential Revision: D10370201

fbshipit-source-id: 43cf3803a4aa7b94fdf0c3a604d7db769ca0bdd5
2018-10-15 13:03:28 -07:00
189c1e1afb Rewrite http://pytorch.org -> https://pytorch.org throughout project (#12636)
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636

Differential Revision: D10377099

Pulled By: soumith

fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
2018-10-15 13:03:27 -07:00
a6c7cf8741 python bindings: enable generic nn operator handling
Summary: hotfix to unblock Dong Shi

Reviewed By: duc0

Differential Revision: D10385763

fbshipit-source-id: 80badd31c1039a245f32940c719e867a86ec7e47
2018-10-15 12:55:42 -07:00
0740a5d521 compute_uv for SVD (#12517)
Summary:
Adds a `compute_uv` argument that defaults to `True` for optionally computing the singular vectors during SVD.

Closes https://github.com/pytorch/pytorch/issues/12420.
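Usage sketch:
```python
import torch

a = torch.randn(5, 3)
u, s, v = torch.svd(a, compute_uv=False)
# Only `s` is actually computed; u and v come back as zero tensors.
print(s)
```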
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12517

Differential Revision: D10384554

Pulled By: SsnL

fbshipit-source-id: 704998a257afa815eda901b8ae830e8a661695be
2018-10-15 12:35:56 -07:00
d5eae90537 update onnx tests (#12619)
Summary:
Fixes #12586
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12619

Reviewed By: ezyang

Differential Revision: D10377548

Pulled By: houseroad

fbshipit-source-id: 1166e40aa8b98f1fe015fb1bdb2e90acfad3c356
2018-10-15 11:59:19 -07:00
d17b0bc679 Allow running root tasks inline (#12289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12289

When we have an all-sync net, the chaining algorithm will generate one single group, and we want to just run that in the serving thread instead of scheduling it onto the worker queue. This closely mimics the behavior of simple net and gives us the expected performance.

Reviewed By: ilia-cher

Differential Revision: D10174323

fbshipit-source-id: 1dae11a478936634f8ef1e4aa43d7884d6362e52
2018-10-15 11:14:12 -07:00
a1bbe80e21 Remove NervanaGPU operators from Caffe2 (#12564)
Summary:
Fix #12540
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12564

Reviewed By: orionr

Differential Revision: D10379775

Pulled By: soumith

fbshipit-source-id: a925b116f2687e56bf54465fc02ca2eb1e7c8eb0
2018-10-15 11:04:46 -07:00
151b28521a Fix Windows test script on local dev machine (#12073)
Summary:
We should not clean up the Miniconda environment when the user is running `win-test.sh` locally.

This would help reproduce #11527 locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12073

Differential Revision: D10053497

Pulled By: yf225

fbshipit-source-id: 11027500e7917a7cb79270c811379e11dbbb6476
2018-10-15 09:36:50 -07:00
7326739188 Remove out-of-date TODO.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12638

Differential Revision: D10376584

Pulled By: gchanan

fbshipit-source-id: 47fb0333cd9e41a66c2e215f91e129fe19dc9225
2018-10-15 08:45:59 -07:00
07d67aa17a Make TensorOptions immutable. (#12630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12630

Instead of providing mutable accessors, our "mutators" now
return new copies of TensorOptions.  Since TensorOptions is
simply two 64-bit integers, this is not a big efficiency
problem.

There may be some sites that assumed that TensorOptions was
mutable.  They need to be fixed.

Reviewed By: SsnL

Differential Revision: D10249293

fbshipit-source-id: b3d17acc37e78c0b90ea2c29515de5dd01209bd3
2018-10-15 08:30:16 -07:00
1014c8a7db 'Re-sync with internal repository' (#12652) 2018-10-15 10:57:10 -04:00
6dd71947ea remove unused Iterable, also avoid Python 3.7 deprecation warning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12639

Differential Revision: D10377094

Pulled By: soumith

fbshipit-source-id: d904c4c1bbac900e44ea0b3b5635697159aec717
2018-10-15 02:30:22 -07:00
eaf33f22c8 Revert D10123465: Set the correct engine name for position weighted pooling when fp16 is used for training
Differential Revision:
D10123465

Original commit changeset: e8d929d4153d

fbshipit-source-id: 36269e49ac79955fe695ac1a53a3c386aa2f5bec
2018-10-15 01:53:48 -07:00
02695c11db fix masked_fill_ bug on non-contiguous tensor (#12594)
Summary:
Bug fix for #12230; the following script passes after the fix.
```python
x = torch.randn(2, 2, 2)
x = x.permute((2, 0, 1))
y = x.clone()
y.masked_fill_(y > 0, 1)
x.masked_fill_(x > 0, 1)
print((x == y).all())
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12594

Differential Revision: D10377088

Pulled By: soumith

fbshipit-source-id: 88feabe1459d325bfdf9a860412ddbd28686a28b
2018-10-14 23:12:27 -07:00
0c6ab0e8f4 Delete caffe2/mkl, and references. (#12625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12625

It's obsoleted by ideep

Reviewed By: Yangqing

Differential Revision: D10372230

fbshipit-source-id: 2d6475ae72389dd654ba0bcbb57766530eb4ac1a
2018-10-13 22:02:32 -07:00
a98958d3bd dtype option for softmax (#11719)
Summary:
Add a dtype argument to the softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, and converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, memory- and perf-wise; this PR allows one to avoid that.
For most input data/dtype combinations, the input data is converted to dtype and then softmax is computed. If the input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
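Usage sketch (CUDA assumed for the half-input fast path):
```python
import torch
import torch.nn.functional as F

x = torch.randn(32, 1000, dtype=torch.half, device='cuda')
# Softmax reads the fp16 input but accumulates and returns in fp32,
# without materializing an fp32 copy of the input first.
y = F.softmax(x, dim=-1, dtype=torch.float32)
print(y.dtype)  # torch.float32
```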
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
e986f307c3 Fix math formatting of PairwiseDistance docs (#12628)
Summary:
`:math:` was being displayed in the docs for https://pytorch.org/docs/stable/nn.html#torch.nn.PairwiseDistance.

I haven't tested this locally, but I assume it works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12628

Differential Revision: D10373778

Pulled By: SsnL

fbshipit-source-id: 6eb918c521e73c17f6662d83f69e0e4b14dec860
2018-10-13 16:39:15 -07:00
a91f3338a0 Some documentation fixes (#12521)
Summary:
ezyang soumith

Partly addresses https://github.com/pytorch/cppdocs/issues/2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12521

Differential Revision: D10374244

Pulled By: goldsborough

fbshipit-source-id: 8e9fe688cbaa2d2b0b96f721e5477ee8845b8f20
2018-10-13 14:20:42 -07:00
1f94ce1f97 Fix aten::to export in ONNX
Summary: D10356994 broke ONNX export for casting; this fixes it.

Reviewed By: wanchaol

Differential Revision: D10366103

Pulled By: jamesr66a

fbshipit-source-id: 039454cce571a1186265708e7ddcb946814cc8b0
2018-10-12 21:20:01 -07:00
635cbff300 Set the correct engine name for position weighted pooling when fp16 is used for training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12225

Reviewed By: hyuen, xianjiec

Differential Revision: D10123465

fbshipit-source-id: e8d929d4153d1ee987ae3d1c37892525d7574d16
2018-10-12 20:15:13 -07:00
6bc8d303eb Update onnx to onnx/onnx@06f6d63 (#12621)
Summary:
06f6d63d55
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12621

Differential Revision: D10368472

Pulled By: bddppq

fbshipit-source-id: b62fbbc0ad5bc41c5e7221ba889b1061087c3214
2018-10-12 17:25:20 -07:00
63a220f54d Deprecate prof_dag (#11956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11956

Deprecate prof_dag and redirect it to the unified executor

Reviewed By: aazzolini

Differential Revision: D9983992

fbshipit-source-id: 16821628a99a5683dc39cbb345ddab56e9d8721c
2018-10-12 16:37:57 -07:00
53f4dbc9ac test_proper_exit: avoid truncation of info message (#12612)
Summary:
test_proper_exit in the dataloader test bucket includes
(as its docstring) a reassuring message about complaints that
may appear during the test. The message is displayed
when the tests are run in verbose mode.

But the docstring includes a line break, and the unittest
framework only prints the first line of the docstring (see
shortDesription()). As a result, the 2nd (more reassuring)
half of the message is not displayed.

Concatenate the docstring onto a single line so all of it is visible.
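The truncation is easy to reproduce in isolation (a minimal sketch):
```python
import unittest

class T(unittest.TestCase):
    def test_x(self):
        """First line of the docstring.
        Second line, which unittest never prints."""

print(T('test_x').shortDescription())  # -> 'First line of the docstring.'
```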
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12612

Differential Revision: D10368786

Pulled By: ezyang

fbshipit-source-id: 14b259a6d6a3491d4290148eae56e6ab06f2a9b6
2018-10-12 16:32:28 -07:00
17ab3bd502 implement rowwise quantization for fp16 (#12382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382

implement fp16-> (uint8 + scale and bias in fp32)

this is similar to fp32 rowwise quantization

We could have done scale and bias in fp16, but we're not too motivated since we wouldn't save much, and those values have to be converted to fp32 for processing anyway since x86 doesn't support half-float arithmetic.
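A rough sketch of the rowwise scheme described above (illustrative math only, not the actual FBGEMM kernel):
```python
import torch

def rowwise_quantize(x_fp16):
    x = x_fp16.float()                                   # process in fp32
    mins = x.min(dim=1, keepdim=True).values
    maxs = x.max(dim=1, keepdim=True).values
    scale = (maxs - mins).clamp(min=1e-8) / 255.0        # per-row fp32 scale
    q = torch.round((x - mins) / scale).to(torch.uint8)  # uint8 payload
    return q, scale, mins                                # bias = row min, in fp32
```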

Reviewed By: csummersea

Differential Revision: D10220463

fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
2018-10-12 13:57:55 -07:00
7a1b668283 Implement Tensor.__cuda_array_interface__. (#11984)
Summary:
_Implements pytorch/pytorch#11914, cc: ezyang_

Implements `__cuda_array_interface__` for non-sparse cuda tensors,
providing compatibility with numba (and other cuda projects...).

Adds `numba` installation to the `xenial-cuda9` jenkins test environments via direct installation in `.jenkins/pytorch/test.sh` and numba-oriented test suite in `test/test_numba_integration.py`.

See interface reference at:
https://numba.pydata.org/numba-doc/latest/cuda/cuda_array_interface.html
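Usage sketch (a CUDA device assumed):
```python
import torch

t = torch.arange(6, dtype=torch.float32, device='cuda')
print(t.__cuda_array_interface__)
# e.g. {'shape': (6,), 'typestr': '<f4', 'data': (<device pointer>, False), ...}
```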
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11984

Differential Revision: D10361430

Pulled By: ezyang

fbshipit-source-id: 6e7742a7ae4e8d5f534afd794ab6f54f67808b63
2018-10-12 13:41:05 -07:00
134b5d62e8 don't copy weight gradients in rnn (#12600)
Summary:
This PR gets rid of an unnecessary copy of weight gradients in cudnn RNN. It also removes an unnecessary check for input size when deciding whether to use persistent RNN, and adds a doc string explaining when persistent RNN can be used. cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12600

Differential Revision: D10359981

Pulled By: soumith

fbshipit-source-id: 0fce11b527d543fabf21e6e9213fb2879853d7fb
2018-10-12 13:34:10 -07:00
49256ddb4a split generated VariableType.cpp (#12493)
Summary:
On my devgpu, this brings the time taken for `touch torch/csrc/jit/type.h && time python setup.py rebuild develop` (debug mode, multicore build) down from 75 seconds to 62 seconds. For the `ninja install` portion of libtorch, which this affects, the reduction is from 52 seconds to 35.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12493

Reviewed By: zdevito

Differential Revision: D10315988

Pulled By: anderspapitto

fbshipit-source-id: 316dc4ab81134aaa17a568cfc07408b7ced08c2e
2018-10-12 13:14:44 -07:00
3f52a0aad7 Fix the linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12613

Differential Revision: D10364963

Pulled By: houseroad

fbshipit-source-id: f9e2a76c1ab021cce4f45f5b4e74ddcc9618c138
2018-10-12 13:12:08 -07:00
239b2ac718 make the variable declaration closer to usage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9262

Differential Revision: D10363576

Pulled By: ezyang

fbshipit-source-id: 05c8eb12f3b389caf562cca9e338cc91b0e9acc1
2018-10-12 12:07:08 -07:00
15bdb9fe61 remove duplicate BUILD_TEST flag in libtorch cmake file (#12583)
Summary:
there is already a BUILD_TEST flag in the root-level cmake file. Removing this makes sure it doesn't interfere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12583

Differential Revision: D10348620

Pulled By: anderspapitto

fbshipit-source-id: 3957783b947183e76a4479a740508c0dc1c56930
2018-10-12 11:53:07 -07:00
7da4643232 Caffe2: fix error C2398 and syntax error with Visual Studio 2015 (#10089)
Summary:
Similar fix to [pull #7024](https://github.com/pytorch/pytorch/pull/7024).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10089

Differential Revision: D10363341

Pulled By: ezyang

fbshipit-source-id: bc9160e2ea75fc77acf3afe9a4e20f327469592e
2018-10-12 11:47:34 -07:00
c1d0784dcb enable onnx integration tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12592

Reviewed By: BIT-silence, zrphercule

Differential Revision: D10363056

Pulled By: houseroad

fbshipit-source-id: 4d1dc0302a8cbe3d6ff1594f0d038330ba4efc81
2018-10-12 11:34:16 -07:00
97eec33f80 Allow tensor.device, tensor.dtype, and tensor.shape in JIT (#12363)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12364
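
A small sketch of what this allows inside TorchScript (`x.device` and `x.dtype` become accessible the same way):
```
import torch

@torch.jit.script
def first_dim(x):
    # attribute access on shape from scripted code
    return x.shape[0]

print(first_dim(torch.zeros(4, 2)))  # 4
```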
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12363

Differential Revision: D10362491

Pulled By: ezyang

fbshipit-source-id: f2716e656977370c5ec51cb15f62b6376798e617
2018-10-12 11:29:04 -07:00
5317429e82 move bceWithLogits from python to Aten (#11054)
Summary:
Fixes #10648 .
Perf comparison:
```
import torch
import torch.nn as nn
import time

def bm(testsize, repeat=100, cuda=False):
    total_time = 0.0
    pos_weight = torch.ones(testsize[1], device='cuda' if cuda else 'cpu') / testsize[1]
    # loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    loss = nn.BCEWithLogitsLoss()
    input = torch.randn(testsize, device='cuda' if cuda else 'cpu').clamp_(2.8e-2, 1 - 2.8e-2)
    target = torch.randn(testsize, device='cuda' if cuda else 'cpu').gt(0).float()
    input.requires_grad = True
    target.requires_grad = True
    for _ in range(repeat):
        start = time.time()
        l = loss(input, target)
        l.backward()
        # print(target.grad)
        end = time.time()
        total_time += end - start
    return total_time

for cuda in [False, True]:
    for testsize in [(100, 100), (1000, 1000), (2000, 2000)]:
        # print(testsize, cuda)
        print('{:.5f}'.format(bm(testsize, cuda=cuda)))
```
|    | Python CPU | Aten CPU | Python GPU | Aten GPU |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (100, 100)  | 0.15813s | 0.10890s | 0.14601s | 0.07070s |
| (1000, 1000)  | 1.74051s | 0.95038s | 0.15158s | 0.10153s |
| (2000, 2000) | 5.36515s | 2.46996s | 0.31322s | 0.200941s |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11054

Differential Revision: D9728289

Pulled By: ailzhang

fbshipit-source-id: b7c5bc50635f8cc63c317caa4321e32f7df860f8
2018-10-12 11:13:33 -07:00
6069f6f454 Try to prevent occasional timeout in test_proper_exit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12587

Differential Revision: D10361411

Pulled By: SsnL

fbshipit-source-id: 97d0ff9d40918b7729c21f4de6d8cabeb65c728a
2018-10-12 10:53:01 -07:00
12686ec656 fix _AllReduce not applying the DeviceScope guard to model.Copy operations. (#12342)
Summary:
This resolves an issue where the `model.Copy` operation would
copy to the wrong GPU, such that the below `net.Sum` operation
would use an input argument for which p2p access was not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12342

Differential Revision: D10343181

Pulled By: ezyang

fbshipit-source-id: fd2d6d0ec6c09cda2db0a9a4f8086b3560e5a3ec
2018-10-12 10:47:58 -07:00
dfad8b60ba Remove duplicate code
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12526

Differential Revision: D10342611

Pulled By: ezyang

fbshipit-source-id: 470b4a181fd9091c3fd33d3d43a2cf6d44594202
2018-10-12 09:58:44 -07:00
038d5ca943 Remove incompatibility between MSVC, CUDA and Debug (#12572)
Summary:
Experimentally this works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12572

Differential Revision: D10342468

Pulled By: ezyang

fbshipit-source-id: dc36587c32ab0910aa14b7351ca12532acd41c7d
2018-10-12 09:52:13 -07:00
63e09707a2 Use SFINAE instead of macros for 'long' hack (#12605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12605

Some compilers define 'long' as a separate type from 'int32_t' or 'int64_t', some don't.
Before, we had a cmake check setting a macro and depending on the macro, we either defined a separate type id for long or didn't.
Then, we removed the cmake check and used compiler detection macros directly. This is, however, error prone.

This new approach uses SFINAE to register a type id for 'long' only if it's a separate type.

Reviewed By: Prowindy

Differential Revision: D10359443

fbshipit-source-id: aa371cbb43658c8cd3664ba3d9b0dedbaa225c1d
2018-10-12 09:46:07 -07:00
b57fdf1db5 Properly set cmake python library and include_dirs (#12569)
Summary:
Properly set cmake python_library and include_dirs hints, so that systems with multiple versions of python can still find the correct libraries and header files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12569

Differential Revision: D10359910

Pulled By: soumith

fbshipit-source-id: 2238dcbed7aac8a818c9435e6bba46cda5f81cad
2018-10-12 08:11:21 -07:00
48bc57fa8d Introduce chain_matmul (#12380)
Summary:
- This was one of the few functions left out from the list of functions in
  NumPy's `linalg` module
- `chain_matmul` is particularly useful for DL research, for quick analysis of
  deep linear networks
- Added tests and doc string
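
A short sketch of the new function:
```
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 5)
C = torch.randn(5, 2)
# the association order is chosen to minimize the total cost of the products
out = torch.chain_matmul(A, B, C)   # same result as A @ B @ C
print(out.shape)                    # torch.Size([3, 2])
```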
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12380

Differential Revision: D10357136

Pulled By: SsnL

fbshipit-source-id: 52b44fa18d6409bdeb76cbbb164fe4e88224458e
2018-10-12 03:58:12 -07:00
0cf3c1ce66 Add copy= keyword to Tensor.to (#12571)
Summary:
Fixes: #12454
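
A minimal sketch of the new keyword:
```
import torch

x = torch.ones(3)
y = x.to(torch.float32)             # no conversion needed: may return x itself
z = x.to(torch.float32, copy=True)  # copy=True always returns a new tensor
print(y is x, z is x)               # True False
```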
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12571

Differential Revision: D10356994

Pulled By: SsnL

fbshipit-source-id: d87416078a5a8e5ffa690cd73c09fa6b4e16aa25
2018-10-12 02:10:44 -07:00
2279299c6c Implement aten::contiguous (#12541)
Summary:
Implement contiguous as `aten::contiguous` so it can be recorded during tracing. Not recording it was causing issues both with the trace checker and when a `contiguous()`-ed tensor was used downstream in a view that expected certain strides.
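
A hedged sketch of the effect on tracing (the function here is illustrative, and the exact printed graph varies by version):
```
import torch

def f(x):
    # t() makes x non-contiguous; the contiguous() call is now recorded
    # as an aten::contiguous node instead of disappearing from the trace
    return x.t().contiguous().view(-1)

traced = torch.jit.trace(f, torch.randn(2, 3))
print(traced.graph)
```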
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12541

Differential Revision: D10304028

Pulled By: jamesr66a

fbshipit-source-id: dc4c878771d052f5a0e9674f610fdec3c6782c41
2018-10-11 23:39:39 -07:00
1be8b7cc56 Delete "default" codeowners from root directories. (#12584)
Summary:
We will still have an informal notion of codeowner, but it
is not necessary to get a review from these people in particular
for these directories.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12584

Differential Revision: D10348999

Pulled By: ezyang

fbshipit-source-id: 97331ec4bab9f1aa02af82b71ad525a44ad1e7fe
2018-10-11 23:18:04 -07:00
0df4d66210 Update caffe2 docker images version in circleci (#12596)
Summary:
72b6d26950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12596

Differential Revision: D10355881

Pulled By: bddppq

fbshipit-source-id: 33c15819ec51315defc23a7fbc23caa2ddd65e75
2018-10-11 21:54:33 -07:00
fa99ed9b30 Emit warning about optimization passes only once
Reviewed By: ajtulloch

Differential Revision: D9584925

fbshipit-source-id: 191035eaefe3ab3980e46598f2ebf34b2b704a9b
2018-10-11 21:41:17 -07:00
01cb90adf1 fix the ONNX test_operator test (#12591)
Summary:
update the expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12591

Differential Revision: D10355620

Pulled By: houseroad

fbshipit-source-id: 5acdbf2406d322378025631808108a2d795be916
2018-10-11 21:41:15 -07:00
eb5fdc5fb5 Add default values in script (#12345)
Summary:
Add support for default values on script functions and Modules

Followup to #11962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12345

Reviewed By: michaelsuo

Differential Revision: D10263613

Pulled By: driazati

fbshipit-source-id: 9b380d8c3f8c4abb2d24c33b23c00ec5896ca372
2018-10-11 20:49:23 -07:00
97bee5cd80 Adds max plan number for CUDA 10 cufft plan cache array (#12553)
Summary:
SsnL As per your review in https://github.com/pytorch/pytorch/pull/12017/, I added a max plan number for the CUDA 10 path. Our internal cuFFT team couldn't suggest a number since the limit depends on host/device memory. That is, a plan allocates some buffers on the device and also creates objects for the plan on the host side. I raised this number to 4x arbitrarily per your suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12553

Differential Revision: D10320832

Pulled By: SsnL

fbshipit-source-id: 3148d45cd280dffb2039756e2f6a74fbc7aa086d
2018-10-11 19:36:25 -07:00
957142a4fe switch ROCm CI targets to white rabbit release (#12577)
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577

Differential Revision: D10350165

Pulled By: bddppq

fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
2018-10-11 18:03:11 -07:00
93a4b76114 Enable alternative LayerNorm impl in FisherGan (#12178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12178

Fisher GAN calls processor_util.add_mlp, which injects the layer norm through the
normalizer. We allow an alternative implementation of LayerNorm to be used in the normalizer.

Reviewed By: Wakeupbuddy

Differential Revision: D9235528

fbshipit-source-id: 88c126c658102926613242ef84a481f6de1676ed
2018-10-11 17:36:11 -07:00
8ac8b823c2 Allow use of substitute ops for LayerNorm (#12177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12177

as titled

Reviewed By: Wakeupbuddy

Differential Revision: D9218047

fbshipit-source-id: 8d68861472c99d587e678c3d76ac43abc9c8fe6d
2018-10-11 17:36:10 -07:00
d9eff40546 Revert D10209620: Use SFINAE instead of macros for 'long' hack
Differential Revision:
D10209620

Original commit changeset: 68f09339e279

fbshipit-source-id: e33927e92e34efc40917d97cd8ba80996a875dff
2018-10-11 16:50:09 -07:00
5973312abc Add clang 6 docker images
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12581

Differential Revision: D10349785

Pulled By: bddppq

fbshipit-source-id: 638641d369be0898dd6232737ebaa9d9a8c2e557
2018-10-11 16:48:13 -07:00
a1487bf874 Smarter differentiable subgraph slicing (#12175)
Summary:
If any inputs require_grad then the graph executor does differential subgraph slicing. The existing algorithm combines adjacent differentiable Node*.

There are two major motivations. The first is improving fusion opportunities: the graph fusion pass runs after differential subgraph slicing. This means that only nodes that are a part of the same differential subgraph may be considered for fusion. If something like the following happens,
```
y = f(x)
k = not_differentiable_op(m)
z = g(y)
```
and f and g are both fusible and differentiable operations, then they will be inserted into different differential subgraphs and not fused together.

The second is to enable JIT optimizations on backward passes for things like an (automatically) unrolled LSTM. Right now, in an unrolled LSTM, we see something like the following:
```
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
```
Each lstm_cell itself is differentiable and gets put into a separate differential subgraph. During the backwards pass, each prim::DifferentiableSubgraph has its own graph executor: these graph executors cannot talk to each other. It is better if we combined all of the lstm_cells (where applicable) into one differential subgraph so their backward passes are combined into one graph executor that can perform better optimizations than several separate graph executors.

Think about the computation graph as a DAG where edges are data dependencies and vertices are operations (the nodes). Each vertex is either black or red; a vertex is colored black if it is differentiable and red otherwise. The goal is to contract edges (merge nodes) to have the fewest black vertices remaining such that the graph is still a DAG.

The algorithm is the following:
- Take the Graph& and create a shadow "DynamicDAG" object to wrap Node* and edges. Each Vertex holds multiple Node* (but starts out holding one Node*) and each edge is a data dependency.
- Greedily contract vertices in the DynamicDAG if they are "differentiable". This operation is unrelated to the Graph&.
  - A Vertex is "differentiable" if all the nodes it holds are differentiable.
  - When contracting vertices, combine their Node* contents.
  - The DynamicDAG keeps its vertices in topological order and complains if the contraction is invalid so everything is good.
- Take the DynamicDAG: reorder the nodes in the Graph& to match the topological order in the DynamicDAG.
- Finally, go through each Vertex in the DynamicDAG: if it contains multiple Node* then merge all of them into a prim::DifferentiableGraph.
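
A toy plain-Python sketch of the greedy contraction idea (the real pass uses the DynamicDAG with an incremental topological order; here acyclicity is simply re-checked after each tentative merge, and all names are illustrative):
```
def has_cycle(nodes, edges):
    # DFS-based cycle check; 'stack' holds the current path
    seen, stack = set(), set()

    def visit(v):
        if v in stack:
            return True
        if v in seen:
            return False
        seen.add(v)
        stack.add(v)
        if any(visit(d) for (s, d) in edges if s == v):
            return True
        stack.discard(v)
        return False

    return any(visit(v) for v in nodes)

def contract_differentiable(nodes, edges, differentiable):
    # greedily merge producer->consumer pairs when both are differentiable
    # and the merge keeps the graph acyclic
    changed = True
    while changed:
        changed = False
        for (p, c) in sorted(edges):
            if not (differentiable[p] and differentiable[c]):
                continue
            merged_nodes = [n for n in nodes if n != c]
            merged_edges = {(p if s == c else s, p if d == c else d)
                            for (s, d) in edges} - {(p, p)}
            if not has_cycle(merged_nodes, merged_edges):
                nodes, edges = merged_nodes, merged_edges
                changed = True
                break
    return nodes, edges

# the motivating example: nd sits between f and g in program order but has
# no data dependency on them, so f and g can still be merged
print(contract_differentiable(['f', 'nd', 'g'], {('f', 'g')},
                              {'f': True, 'nd': False, 'g': True}))
# (['f', 'nd'], set())
```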

The DynamicDAG is based off of the dynamic top sort algorithm in [this paper](https://www.doc.ic.ac.uk/~phjk/Publications/DynamicTopoSortAlg-JEA-07.pdf) by Pearce and Kelly.

Each contractEdge(producer, consumer) call is `O(|AR| log |AR| * min(|out_edges(producer)|, |in_edges(consumer)|))` where `AR` is the "affected region" (defined as the set of nodes that, in topological order, are between producer and consumer). By only considering contractions such that `|ord(producer) - ord(consumer)| < threshold1` and `|out_edges(producer)| < threshold2` we can make each contractEdge(producer, consumer) call take constant time. The resulting algorithm is linear in the number of nodes.

Added a lot of small test cases.

Looking for suggestions on the following:
- what big computation graphs should I run this on to test how fast or slow it is?
- what things other than correctness should I be thinking about when I test this?

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12175

Differential Revision: D10302564

Pulled By: zou3519

fbshipit-source-id: 8a94d130d82f8a1713cc28483afef9a72d83d61a
2018-10-11 16:20:53 -07:00
0ee2e7c398 Relax the locking of running_mutex_ in async_scheduling net (#12544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12544

`running_mutex_` inside the async_scheduling net is used to guard access to the `running_` variable, so we don't need to acquire that lock while actually running the net. This will help us prevent a potential double-locking situation when we decide to run the root nodes inline.

Reviewed By: ilia-cher

Differential Revision: D10304745

fbshipit-source-id: 5f701b2c22b06ff5bee7f2c37ac634326748f579
2018-10-11 16:00:54 -07:00
0f9807ee61 Enable addmm fusion for ONNX export only (#12538)
Summary:
There are some action-at-a-distance issues, and not having this disables quantization in C2 for prod use cases

ref T34831022
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12538

Differential Revision: D10302931

Pulled By: jamesr66a

fbshipit-source-id: 700dc8c5c4297e942171992266ffb67b815be754
2018-10-11 13:57:50 -07:00
7b0f5d6631 Support USE_CUDNN for Windows (#12518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12495

cc peterjc123 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12518

Reviewed By: mingzhe09088

Differential Revision: D10338792

Pulled By: orionr

fbshipit-source-id: b465c42ea6d5fe9dbc2a4e1f973d952365d0af07
2018-10-11 13:53:27 -07:00
033e00cd3f Fix bug in caffe_translator tool (#10056)
Summary:
1. Fix BN translator
    IntelCaffe and NVCaffe fuse BN+Scale, so the "BatchNorm" op contains 5 params, including scale and bias

2. Fix Scale translator
   The translated outputs of Scale had the same names as those of Conv:
   all of them were named output + '_w' and output + '_b'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10056

Differential Revision: D10099205

Pulled By: yinghai

fbshipit-source-id: 73a73868e3e16c495e8b233fdb1d373d556a9537
2018-10-11 13:13:12 -07:00
666bebc7d2 adapting caffe2 operator docs generator to pytorch url
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10801

Differential Revision: D9472991

Pulled By: ezyang

fbshipit-source-id: 1b8ba77b8255b7e900b6528bd93b3b870f9ba0d4
2018-10-11 12:55:06 -07:00
eef083e477 CircleCI: add timestamp to build log, clean up unused jobs, print docker image name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12556

Differential Revision: D10343032

Pulled By: yf225

fbshipit-source-id: fd2dcba18a5cb037fdc448dba64bf9d747dc3761
2018-10-11 12:23:42 -07:00
a4120fa132 Get rid of emitApplyIdent (#12504)
Summary:
And reroute builtin/CompilationUnit function resolution through one resolution pathway
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12504

Differential Revision: D10319920

Pulled By: jamesr66a

fbshipit-source-id: 3ab9877664dd32b97136a7625d0688e1adc0c022
2018-10-11 10:53:53 -07:00
8482ea8774 Update develop install command in onnx scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12561

Differential Revision: D10340194

Pulled By: bddppq

fbshipit-source-id: 10fb7261028d56f73111e2ca39d4eb2ab930812a
2018-10-11 10:38:52 -07:00
cee19eb31c Back out "[c10][NFCI] Move jit/type, function_schema, and utils/functional to ATen/core" (#12568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12568

Second attempt at D10324615

Original commit changeset: b71eeec98dfe
Original commit changeset #2: 1af6400ae0c1

Reviewed By: bwasti

Differential Revision: D10338168

fbshipit-source-id: 04cb443a89a9cd1a174df6d5ac1a86c3d423d56b
2018-10-11 09:53:40 -07:00
7acb145893 Fixed print issue for TensorTypeId (#12402)
Summary:
Fixed the printing issue for TensorTypeId. It used to print a hex escape of the ID, e.g.
   \x1
Now it prints the ID as a string, e.g.
  1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12402

Reviewed By: ezyang

Differential Revision: D10224026

Pulled By: lauragustafson

fbshipit-source-id: a9ca841d08c546fccbb948a17f06a29fea66f3fb
2018-10-11 08:23:32 -07:00
229397b439 Revert D10324615: [pytorch][PR] Revert #12466 and #12467 to fix JIT test error on Windows CI
Differential Revision:
D10324615

Original commit changeset: 12e5fc73da42

fbshipit-source-id: 710c5f3b7a4fe56799ae31a86359b2085b7e741d
2018-10-11 03:39:14 -07:00
1c7832c854 CUDA 10 warnings fixed (#12442)
Summary:
Fixes a deprecation warning against `cudaPointerAttributes`, whose `memoryType` field has been deprecated in favor of `type`; see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#contents-end for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12442

Differential Revision: D10251239

Pulled By: zou3519

fbshipit-source-id: 500f1e02aa8e11c510475953ef5244d5fb13bf9e
2018-10-11 00:25:22 -07:00
234e6b3797 Bugfix in onnx exporter (#10607)
Summary:
Incorrect processing for int and float arguments. Possibly a typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10607

Differential Revision: D9376040

Pulled By: bddppq

fbshipit-source-id: e3665e7bbb26842d1d7eed50442993cfdbf55a80
2018-10-11 00:25:20 -07:00
1f7cbea984 Revert #12466 and #12467 to fix JIT test error on Windows CI (#12557)
Summary:
Sample error log: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/11766/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12557

Differential Revision: D10324615

Pulled By: yf225

fbshipit-source-id: 12e5fc73da42ffa22e39250aee9ea072fd2e33de
2018-10-10 23:56:56 -07:00
170d84228e Delete redundant statement of col2im (#12514)
Summary:
Hi, I found that there were two declarations of `col2im` in `im2col.h` and I think the former one
may be redundant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12514

Differential Revision: D10328721

Pulled By: ezyang

fbshipit-source-id: d225547848803511c7cc58bd9df1cc6832a537fb
2018-10-10 23:56:54 -07:00
2b033332c8 Allow linking to backwards-compatible cuDNN at runtime (#12239)
Summary:
Fixes #12193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12239

Differential Revision: D10321744

Pulled By: soumith

fbshipit-source-id: bf437f7f9b6231158a1585d2dabae8d937396478
2018-10-10 23:56:51 -07:00
8734b174ca Multinomial raise error (#12490)
Summary:
Fixes #12260 #2896

```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior was to return `0` after we ran out of positive categories. Now we raise an error, based on the discussion in the issue thread.

- Add test cases for the cpu & cuda paths; in the cuda case `n_samples=1` is a simple special case, so we test against `n_samples=2` instead.
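
A quick sketch of the new behavior (the error type shown, RuntimeError, is an assumption):
```
import torch

probs = torch.tensor([0.0, 1.0, 0.0, 0.0])
try:
    # 3 draws without replacement but only one positive category: now an error
    torch.multinomial(probs, 3, replacement=False)
except RuntimeError as err:
    print("raised:", err)
```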
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12490

Differential Revision: D10278794

Pulled By: ailzhang

fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
2018-10-10 20:39:04 -07:00
b89a3b50fb Remove StaticContext (#12547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12305

Remove StaticContext from context_base.h

Reviewed By: dzhulgakov

Differential Revision: D10073519

fbshipit-source-id: 350beec3c54365edef338318ce58229ccb825a98
2018-10-10 19:41:03 -07:00
c32839fc90 CircleCI: better credentials visibility (#12552)
Summary:
We will rotate the credentials if the new setting works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12552

Differential Revision: D10322121

Pulled By: yf225

fbshipit-source-id: 158f2f89b83a751566a912869a4400d5be6e5765
2018-10-10 18:25:09 -07:00
89010d60f9 Migrate HIP to use DeviceOption.device_id and delete DeviceOption.hip_gpu_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12546

Reviewed By: hyuen, xw285cornell

Differential Revision: D10305222

fbshipit-source-id: 955e1d2878508a25fe4e9980ae66f8f54aaf7db9
2018-10-10 18:25:06 -07:00
25bd7fe488 Add USE_FFMPEG flag for setup.py and R2Plus1D (#12543)
Summary:
Needed for https://github.com/facebookresearch/R2Plus1D/pull/46
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12543

Differential Revision: D10320147

Pulled By: orionr

fbshipit-source-id: a7dcbf7c0d4b405b9e89b28ef75a0ed1cf2a3e6a
2018-10-10 18:09:48 -07:00
da3dd9af12 No Op Optimizer (#12390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390

Introduce a no-op optimizer for when we don't want updates to happen but don't want to affect downstream processes.

Reviewed By: mlappelbaum

Differential Revision: D10209812

fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
2018-10-10 18:09:46 -07:00
8399778049 Update FP16 submodule (#12554)
Summary:
Pull a patch that fixes remaining incompatibility with Microsoft compiler on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12554

Differential Revision: D10319736

Pulled By: Maratyszcza

fbshipit-source-id: bcd88581df48f2678ef81e095f947391104f24d5
2018-10-10 17:25:17 -07:00
543048d275 Adds launch bounds for CTC loss kernel (#12379)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12379

Differential Revision: D10318361

Pulled By: ezyang

fbshipit-source-id: aec4ae8205e780b18560d639543ed9d0ef0527ce
2018-10-10 17:09:38 -07:00
7724807551 Remove ExtractDeviceOption from StaticContext (#12304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12304

- make ExtractDeviceOption a free function.
- Add a Storage(at::Device) constructor in order to preserve the device_id.

Reviewed By: dzhulgakov

Differential Revision: D10069839

fbshipit-source-id: a5f3994a39bdf1b7503b39bb42c228e438b52bfa
2018-10-10 14:12:16 -07:00
0d50c117db Introduce BUILD_ATEN_ONLY cmake option (#12443)
Summary:
Following up on the #11488 conversation with orionr
and our brief conversation at PTDC about ATen with soumith and apaszke.

This PR enables a very slim build focused on ATen in particular, without caffe2 and protobuf, among other dependencies.
With this PR NimTorch tests pass fully, including AD, convolutions, wasm, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12443

Reviewed By: mingzhe09088

Differential Revision: D10249313

Pulled By: orionr

fbshipit-source-id: 4f50503f08b79f59e7717fca2b4a1f420d908707
2018-10-10 12:54:19 -07:00
a442853f4f CircleCI: try to fix submodule not found error (#12542)
Summary:
Try to fix the "submodule not found" infra error: https://circleci.com/gh/pytorch/pytorch/48431 by switching to use the official git client (instead of CircleCI's default git client).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12542

Differential Revision: D10305027

Pulled By: yf225

fbshipit-source-id: 42db0694efb468d9460ef51d7b4b2bd90d78ff24
2018-10-10 12:54:17 -07:00
b51901f7d3 Update FP16 submodule (#12539)
Summary:
Pull a patch that makes FP16 compatible with Microsoft compiler on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12539

Reviewed By: hyuen

Differential Revision: D10303487

Pulled By: Maratyszcza

fbshipit-source-id: 4e20ece6338e4d0663cd3591914ce333f0972693
2018-10-10 11:54:06 -07:00
45db8274de CircleCI: Add credentials for pushing to perf test S3 bucket (#12523)
Summary:
This will fix the perf test baseline update in master builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12523

Reviewed By: bddppq

Differential Revision: D10289415

Pulled By: yf225

fbshipit-source-id: 408893ab2b0f93c7cffb9f8fbf74453155b850c4
2018-10-10 11:54:04 -07:00
c2a57d082d Fix windows build (#12534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12534

att

Reviewed By: orionr

Differential Revision: D10300123

fbshipit-source-id: 3079864b6979779af4a524a54b28f9b2baed8ba4
2018-10-10 09:39:06 -07:00
033e95765c Diff against master and enable bugprone-* checks (#12378)
Summary:
This PR:

1. Makes clang-tidy diff against `master` instead of `HEAD~1` in CI, which makes much more sense
2. Enables all checks in the `bugprone-*` category (see https://clang.llvm.org/extra/clang-tidy/checks/list.html) except one about parentheses in macros, because it doesn't always apply well for us.

Fixed some nice code smells.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12378

Differential Revision: D10247972

Pulled By: goldsborough

fbshipit-source-id: 97dc9e262effa6874d2854584bf41a86684eb8bd
2018-10-10 07:23:57 -07:00
727609f435 Use SFINAE instead of macros for 'long' hack (#12424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12424

Some compilers define 'long' as a separate type from 'int32_t' or 'int64_t', some don't.
Before, we had a cmake check setting a macro and depending on the macro, we either defined a separate type id for long or didn't.
Then, we removed the cmake check and used compiler detection macros directly. This is, however, error prone.

This new approach uses SFINAE to register a type id for 'long' only if it's a separate type.

Reviewed By: Yangqing, dzhulgakov

Differential Revision: D10209620

fbshipit-source-id: 68f09339e279a9a56b95caeef582c557371b518d
2018-10-10 01:11:06 -07:00
e25b8869f7 typo: Aten.h -> ATen.h in cppdocs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12519

Differential Revision: D10287901

Pulled By: goldsborough

fbshipit-source-id: 56e0c1851aade84e4154777776d14e087645a762
2018-10-09 23:40:14 -07:00
3829f86c7a Update NNPACK-related submodules (#12505)
Summary:
Update submodules below:
- NNPACK
- FP16
- pthreadpool
- cpuinfo
- psimd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12505

Reviewed By: hyuen

Differential Revision: D10286690

Pulled By: Maratyszcza

fbshipit-source-id: 279214b47c82e9e2582693191cc218173c00ea69
2018-10-09 21:54:07 -07:00
283f21d518 Caffe 2 adoption (#12116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12116

Adapt Caffe 2 to platform007 (gcc 8):
* gcc 8 + nvcc template symbol lookup (D9319742):
context_.template CopySameDevice<T> ==> this->context_.template CopySameDevice<T>
* New gcc 8 warning (error):
  * -Werror=sizeof-pointer-div
  * Unnecessary parentheses

Reviewed By: bddppq

Differential Revision: D10045844

fbshipit-source-id: 95f509fefc9593cbb82b1687793fef8930260d2f
2018-10-09 19:29:23 -07:00
16b8075acd finishRun fix (#10970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10970

Fixing a possible case where the next iteration of a net may be started prematurely.
We have to ensure that resetting the running_ flag is done *after* finalizeEvents
(e.g. waiting for the rest of the net's events to be finished).

Reviewed By: heslami

Differential Revision: D9545442

fbshipit-source-id: bc324a180b1e93054b051981817be7985f52b4cb
2018-10-09 16:09:46 -07:00
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
caf8b0777a Move function_schema to ATen/core (#12467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12467

final move of files to enable nomnigraph wrapped pytorch IR

Reviewed By: ezyang

Differential Revision: D10242930

fbshipit-source-id: 1af6400ae0c1f1e7c3be262fbca58010eb2bfa86
2018-10-09 15:38:27 -07:00
f989d4b18e Move jit/type and utils/functional to ATen/core (#12466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12466

Moves type.{h,cpp} and functional.h to ATen/core

move is necessary for IR merging -- slimmed down from this diff: D9819906

Reviewed By: ezyang

Differential Revision: D10242680

fbshipit-source-id: b71eeec98dfe9496e751a91838d538970ff05b25
2018-10-09 15:38:24 -07:00
58b247fc42 Update onnx to onnx/onnx@008e381 (#12492)
Summary:
008e381855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12492

Differential Revision: D10268646

Pulled By: bddppq

fbshipit-source-id: 39d2eae66abee898a30b71c23e54f5c51d3f9ac8
2018-10-09 15:38:22 -07:00
64f707cd26 Enable more unit tests (ROCm 255) (#12486)
Summary:
* Enable more tests that relied on CPU LAPACK at compile time.
* enabled min/max tests in test_cuda (ROCm 236)

bddppq ezyang

Tests ran as part of the ROCm CI here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/255
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12486

Differential Revision: D10262534

Pulled By: ezyang

fbshipit-source-id: 167a06fc8232af006f4b33dcc625815fd4b06d6b
2018-10-09 15:38:19 -07:00
dcd9d73d47 Expunge torch.utils.trainer.* (#12487)
Differential Revision: D10273602

Pulled By: SsnL

fbshipit-source-id: 630c1f8ee0e366f7092d4f93dbe1efa96fc860e0
2018-10-09 14:56:00 -07:00
8468b7d3f0 Fix tensor doc (#12469)
Summary:
The C++ docs for `at::Tensor` are currently broken because we moved where `Tensor.h` gets generated without updating our docs. I use `GEN_TO_SOURCE=1` when generating ATen files, so the `Tensor.h` file should end up in `aten/src/ATen/core/Tensor.h` if I understand correctly.

dzhulgakov ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12469

Differential Revision: D10248521

Pulled By: goldsborough

fbshipit-source-id: 8d8a11f0f6e2703b8d767dbc523fc34a4374f345
2018-10-09 14:09:22 -07:00
2b22c60980 Fix GPU perf tests on CircleCI (#12491)
Summary:
`COMMIT_SOURCE` is missing in the current CircleCI config, which is used in perf tests to decide whether to store the new numbers as baseline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12491

Differential Revision: D10274426

Pulled By: yf225

fbshipit-source-id: 047ef6cc61a12738062f9940d1bfd4c3bf152909
2018-10-09 13:53:45 -07:00
b572e27502 Fix types and warp sizes for ROCm (ROCm 256) (#12485)
Summary:
* Correct the warp size for current AMD GPUs
* Fix copy paste error in configure
* Correct the wrong typing explicitly

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12485

Differential Revision: D10262490

Pulled By: ezyang

fbshipit-source-id: 93467944247ed764d9ac9f7bb212a94fc250608e
2018-10-09 12:53:48 -07:00
c96afa3322 topk and sort fixes (#12337)
Summary:
* Topk part 1: fix intrinsincs for 64 wave front (#224)
64 in a wave front - intrinsics change.
* Disable in-place sorting on ROCm. (#237)
It is known to hang - use the Thrust fallback
Skip one test - fails with the fallback.
* Topk fixes (#239)
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 (bfe) and 9.7.1.20 (bfi) requires pos and len to be limited to 0...255
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 requires extracted bits to be in LSBs
* Correct logic for getLaneMaskLe. Previous logic would return 0x0 instead of 0xffffffffffffffff for lane 63
* Round up blockDim.x to prevent negative index for smem

bddppq ezyang

Note the one additional skipped test resulting from using the thrust sort fallback for all sizes. We are working on getting bitonic to work properly (and always). Until then, this needs to be skipped on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12337

Differential Revision: D10259481

Pulled By: ezyang

fbshipit-source-id: 5c8dc6596d7a3103ba7b4b550cba895f38c8148e
2018-10-09 12:08:48 -07:00
ea79f7c032 Add derivative to pow with scalar base (#12450)
Summary:
Fixes: #12426

Thank you, DriesSmit, for the report!
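
A small sketch of the now-differentiable case (using d/dx a**x = a**x * ln(a)):
```
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.pow(2.0, x).sum()   # scalar base, tensor exponent
y.backward()
print(x.grad)                 # 2**x * ln(2), elementwise
```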
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12450

Differential Revision: D10238556

Pulled By: soumith

fbshipit-source-id: 8bf71467c6734ecc5ff30f15500304d731f7e155
2018-10-09 11:38:48 -07:00
a3fb004b18 (#12474)
Summary:
Modifies the DistributedSampler logic. Each process now samples elements at a fixed
interval, instead of taking a consecutive section.

This eliminates the possibility that the DataLoader uses padded data while
dropping real data. That happens when:
  1. DistributedSampler padded the data; and
  2. DataLoader drop_last is effectively true and drops fewer samples than were padded.

In the example below, data points (10, 11, 12) are padded by duplicating samples
(1, 2, 3). The old sampler drops legitimate original data (3, 6, 9) and introduces
duplicates (10, 11) into the training set, while the new sampler logic samples the
correct data points from the data set. This example has been added to the dataloader unit test.

example:
```
  data after shuffle: 1, 2, 3, 4, 5, 6, 7, 8, 9
  padded data : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

  old sampler:       ->  DataLoader with (batch_size=2 and drop_last=True)
   p 1: 1, 2, 3          1, 2
   p 2: 4, 5, 6          4, 5
   p 3: 7, 8, 9          7, 8
   p 4:10,11,12         10,11

  new sampler:       ->
   p 1: 1, 5, 9          1, 5
   p 2: 2, 6,10          2, 6
   p 3: 3, 7,11          3, 7
   p 4: 4, 8,12          4, 8
```
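
A plain-Python sketch of the two partitioning schemes from the example above (4 replicas, 12 padded indices):
```
padded = list(range(1, 13))   # data after shuffle plus padding, as above
num_replicas = 4

old = [padded[r * 3:(r + 1) * 3] for r in range(num_replicas)]  # sections
new = [padded[r::num_replicas] for r in range(num_replicas)]    # strided
print(old)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(new)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
```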
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12474

Differential Revision: D10260410

Pulled By: SsnL

fbshipit-source-id: 710856571260f42ce25955b81a5b8008e04938cf
2018-10-09 11:23:50 -07:00
1c69d368e1 Remove New with Allocator Registry (#12111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12111

Setup allocator registry keyed by at::DeviceType, and remove New from StaticContext.

Reviewed By: ezyang

Differential Revision: D10022853

fbshipit-source-id: 3e88a181fe5df24f33f49b88be1f75284a185588
2018-10-09 10:53:52 -07:00
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance-oriented code will use AVX/AVX2, so we don't need SSE-specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this, convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
11c31aef04 Prevent hanging in data loader altogether
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11985

Differential Revision: D10202374

Pulled By: SsnL

fbshipit-source-id: 1ab1a07185f78a104f9b05930a87ef5a32f431e4
2018-10-09 09:54:19 -07:00
1a0d82e4f4 fix import for script module with control flow blocks (#12351)
Summary:
The value_info proto field was being processed in BuildGraph, but control flow blocks used buildBlocks instead. This PR moves that step to BuildBlock.

I removed DecoderBase because it was making the code confusing and we never needed it in the first place.

closes #12319
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12351

Differential Revision: D10212411

Pulled By: li-roy

fbshipit-source-id: 47f289a462a1ab7391ff57368185401673980233
2018-10-08 22:25:14 -07:00
c959be9d1d Create named functions construct (#12237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12237

This diff creates named functions and cleans up a lot of the basic block usage throughout the code

Reviewed By: duc0

Differential Revision: D10134363

fbshipit-source-id: d0c4ae0bbb726236a15251dbfd529d4fddcd9e9f
2018-10-08 22:12:18 -07:00
8414094562 cleanup controlflow (#12235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12235

SSA is actually implicitly maintained so not only was this function not implemented, it never should be implemented.

Reviewed By: duc0

Differential Revision: D10133928

fbshipit-source-id: e8e5e2386f8b57812b0be2c380af85ed07cd3152
2018-10-08 22:12:13 -07:00
d400502b1d Fix a bunch of warnings in TestNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12453

Differential Revision: D10244130

Pulled By: SsnL

fbshipit-source-id: e425c76bfb721fe118a32ddd1fa6eca3a3cd86f0
2018-10-08 17:38:23 -07:00
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.

After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
5a0d2c7138 Add clamping functionality to stats_put_ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12391

Reviewed By: mlappelbaum

Differential Revision: D10220000

fbshipit-source-id: 10fdbc8ebab931a5be31df964b5de5728048205d
2018-10-08 16:53:26 -07:00
1ee6fc4002 Delete noexcept on the move constructor of OrderedDict (#12369)
Summary:
Previously we tested if default-construction was noexcept, which
doesn't really mean that the move constructor is noexcept too.

Shuts up clang-tidy.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12369

Differential Revision: D10217348

Pulled By: ezyang

fbshipit-source-id: b46437d8ac7a8d756cf03ed0c6bf4400db7ecde7
2018-10-08 16:38:27 -07:00
dd4b9b06a4 Back out "Back out "[caffe2] Use custom CPU thread pool in async_scheduling"" (#12418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12418

Original commit changeset: 32921600925b

Reviewed By: yinghai

Differential Revision: D10231119

fbshipit-source-id: 7d09ea8de82ff2d911d9ded88d87af4226464d1b
2018-10-08 16:24:07 -07:00
c5d7494ca1 Use open-source NCCL2 in PyTorch (#12359)
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself

NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359

Reviewed By: orionr, yns88

Differential Revision: D10219665

Pulled By: teng-li

fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
2018-10-08 15:39:07 -07:00
c3987a0fc3 Fix issues with ATenOp handling methods where self is not the first arg (#12353)
Summary:
ATenOp was handling `torch.where` incorrectly. The `torch.where` overload (and `aten::` function) has arguments in the order `Tensor condition, Tensor self, Tensor other`, but ATenOp was emitting code that assumed `self` was the 0th argument, and thus tried to interpret the wrong value as the condition.
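
For reference, a small sketch of the schema's argument order (values illustrative):
```
import torch

cond = torch.tensor([True, False])
a = torch.tensor([1.0, 2.0])        # 'self' in the schema
b = torch.tensor([10.0, 20.0])      # 'other'
print(torch.where(cond, a, b))      # tensor([ 1., 20.])
```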
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12353

Differential Revision: D10218435

Pulled By: jamesr66a

fbshipit-source-id: afe31c5d4f941e5fa500e6b0ef941346659c8d95
2018-10-08 15:25:39 -07:00
d0e1dca0f5 fix expect file (#12465)
Summary:
Fix expect file that got out of sync
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12465

Differential Revision: D10244646

Pulled By: eellison

fbshipit-source-id: 66d101d4c6c0a235ce9fa47dc3cce027624c86bc
2018-10-08 13:54:24 -07:00
5bac46508a Fix TestJit.test_alexnet expect file (#12458)
Summary:
This test only runs when you have torchvision installed, which is not the case on CI builds. When I run test_jit on my local machine, this fails, so fixing up the expect file here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12458

Differential Revision: D10244344

Pulled By: jamesr66a

fbshipit-source-id: 728c5d9e6c37f807a0780066f20f6c31de84d544
2018-10-08 13:54:22 -07:00
d4b4c1fbec Add missing url links to README.md file. (#12440)
Summary:
Signed-off-by: Marcela Morales Quispe <marcela.morales.quispe@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12440

Differential Revision: D10242642

Pulled By: SsnL

fbshipit-source-id: f47d7579cf3df097c476a97b58149ca4b1eb17ab
2018-10-08 13:54:21 -07:00
a55b9f77a0 Implement 3D and 4D parallelization in Caffe2 thread pool (#12455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12455

- Mirror changes in pthreadpool

Reviewed By: harouwu

Differential Revision: D10240470

fbshipit-source-id: c1af769b5894f7865736fdaf4e0e5bf17c524614
2018-10-08 13:12:57 -07:00
d181e0f1fc Add move{Node,Edge,Subgraph} for Graph move-like semantics (#12303)
Summary:
Adding back import{Node,Edge} as move{Node,Edge} and adding a new
function moveSubgraph.  Previous diff broke OSS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12303

Differential Revision: D10182522

Pulled By: bwasti

fbshipit-source-id: 9619431d6d1a44f128613a4f6d8b7f31232ccf28
2018-10-08 12:53:25 -07:00
cf2b88fa30 Induce edges on subgraphs (#12255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12255

Simple algorithm to connect a subgraph

Reviewed By: ZolotukhinM

Differential Revision: D10141701

fbshipit-source-id: c79c5bc2be89100db602d0a5ff3d17e3dc332d8c
2018-10-08 12:24:55 -07:00
7103d0d938 Add python bindings (#12253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12253

Adding python bindings to unblock DAI development

Reviewed By: duc0

Differential Revision: D10141621

fbshipit-source-id: efac7fb8a0cc787e1c4cc94515e673812529a997
2018-10-08 12:24:53 -07:00
e7653c7561 New chaining/partitioning algorithm for async_scheduling for inference (#11957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11957

For distributed inference, we want to use the async_scheduling net to run the net, since we need its async part. However, profiling shows that async_net has a big overhead from dispatching tasks onto worker threads. This diff improves the issue by generating a smaller number of chains/tasks, grouping the sync ops that can be run in one shot. Note that it also schedules each individual async op as its own chain because, unlike gpu ops, rpc ops are not guaranteed to be linearized at the remote site. For example, if you have two rpc ops `op1->op2`, op2 won't implicitly block until op1 finishes. Therefore we need to put each async op in its own chain, as the async_scheduling net will only sync on the tail of a chain.
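
A toy plain-Python sketch of the grouping rule (op names and kinds are invented for illustration):
```
# consecutive sync ops are grouped into one chain;
# every async op becomes its own single-op chain
ops = [('a', 'sync'), ('b', 'sync'), ('c', 'async'), ('d', 'sync')]
chains, cur = [], []
for name, kind in ops:
    if kind == 'sync':
        cur.append(name)
    else:
        if cur:
            chains.append(cur)
            cur = []
        chains.append([name])
if cur:
    chains.append(cur)
print(chains)  # [['a', 'b'], ['c'], ['d']]
```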

For all-sync-op nets, this change makes us `1.5X` slower than simple_net, while without it we are `7X` slower.

The next step is to work on the executor to make task scheduling faster, and to add a fallback path that can run ops inline if it's an all-sync net.

Reviewed By: ilia-cher

Differential Revision: D9874140

fbshipit-source-id: fcd45328698c29211f2c06ee3287194acda12227
2018-10-08 12:24:52 -07:00
f1f521f71b make bench_gen.py work for 3d conv (#12433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12433

To test 3d conv, we need to pass lists in the spec argument. We also don't want to set use_cudnn=True, which is the default in brew.

Reviewed By: llyfacebook, csummersea

Differential Revision: D10234315

fbshipit-source-id: 96a39992a97e020d6e9dac103e6d64df0cc1020b
2018-10-08 12:24:43 -07:00
00aedfc0e2 constant pooling pass (#12222)
Summary:
Add a pass to move all constants to the beginning of the graph, and deduplicate.

This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc.
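
A toy plain-Python sketch of the pooling idea (the instruction encoding and `%` value names are invented for illustration):
```
def pool_constants(instrs):
    # hoist constants to the front and deduplicate them by value,
    # remapping later uses to the pooled names
    consts, remap, body = {}, {}, []
    for i, ins in enumerate(instrs):
        if ins[0] == 'const':
            value = ins[1]
            if value not in consts:
                consts[value] = '%c{}'.format(len(consts))
            remap['%{}'.format(i)] = consts[value]
        else:
            body.append(tuple(remap.get(a, a) for a in ins))
    header = [('const', v, name) for v, name in consts.items()]
    return header + body

print(pool_constants([('const', 2), ('add', 'x', '%0'),
                      ('const', 2), ('mul', '%0', '%2')]))
# [('const', 2, '%c0'), ('add', 'x', '%c0'), ('mul', '%c0', '%c0')]
```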
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222

Reviewed By: driazati

Differential Revision: D10201616

Pulled By: eellison

fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd
2018-10-08 11:55:02 -07:00
83b4dc6822 Remove Type.tensor(). (#12360)
Summary:
Use at::empty instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12360

Reviewed By: ezyang

Differential Revision: D10215119

Pulled By: gchanan

fbshipit-source-id: f9bb257dff1b1bf1ecd3a6e358c4791d81b5bd31
2018-10-08 11:39:05 -07:00
28e1571843 Add the x64 msvc toolchain into PATH (#12446)
Summary:
A possible fix for the problem stated in #12410.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12446

Differential Revision: D10238572

Pulled By: soumith

fbshipit-source-id: 17ade148c4036d2481b878e5cd7d9d67c1e3626e
2018-10-08 07:54:20 -07:00
def655ec27 fix critical section of atomic add op
Summary: When testing D10220313, I ran into this bug.

Reviewed By: aazzolini

Differential Revision: D10224295

fbshipit-source-id: f46d7333612bce437c1ae6c0b0b579fc2a639665
2018-10-08 02:20:23 -07:00
8689d8af36 Format inline code block. (#12441)
Summary:
Signed-off-by: Marcela Morales Quispe <marcela.morales.quispe@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12441

Differential Revision: D10236743

Pulled By: SsnL

fbshipit-source-id: c0e446a81a388cf6a558bf7ab8ba0e59703dc169
2018-10-08 00:51:07 -07:00
0e44db8b0d Add check for backend of arguments to bmm cpu (#12434)
Summary:
Fixes: #12406
Thank you, jcjohnson, for reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12434

Differential Revision: D10235799

Pulled By: soumith

fbshipit-source-id: 44ee35010bac3791901f604095f5b4bc66b0e7f8
2018-10-07 18:55:42 -07:00
db8d01b248 Move JIT tests to gtest (#12030)
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After #11846 lands, we will be able to delete catch.

I don't claim to use/write these tests much (though I wrote the custom operator tests) so please do scrutinize whether you will want to write tests in the way I propose. Basically:

1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp

Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12030

Differential Revision: D10207745

Pulled By: goldsborough

fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
2018-10-06 23:09:44 -07:00
6f664d3917 Improve TypeMeta (#11502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11502

TypeMeta now is only a pointer to a TypeMetaData structure, of which there is exactly one global instance per type.
This reduces the size of everything storing a TypeMeta (Tensor, Blob, ...) and potentially improves performance.

Also, this diff gets rid of the type name registry in favor of static strings.

Experiments (summary: 1-3% perf gain)
- Service Lab: https://our.intern.facebook.com/intern/servicelab/30712497/
 -> No significant results found.
- Mobile Lab c10bench.json: https://our.intern.facebook.com/intern/fblearner/details/75984908/
 -> 1-3% perf gain
- Mobile Lab c10bench default: https://our.intern.facebook.com/intern/fblearner/details/75984999/
 -> 2-3% perf gain
- adindexer canary: https://our.intern.facebook.com/intern/ads/canary/413002142824203076
 -> no significant changes (benchmark too noisy)
- adfinder canary: https://our.intern.facebook.com/intern/ads/canary/413002166737860362
 -> no significant changes (benchmark too noisy)

Reviewed By: dzhulgakov

Differential Revision: D9763422

fbshipit-source-id: fc08937f114af5ff9f3ddbe7c7e396942868cdf5
2018-10-06 14:09:28 -07:00
ac9bb8ecef Make dynamic_cast_if_rtti safer (#12408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12408

Using static_cast is better than reinterpret_cast because it will cause a compile time error in the following cases, while reinterpret_cast would run into undefined behavior and likely segfault:
- Src and Dst are not related through inheritance (say converting int* to double*)
- Src and Dst are related through virtual inheritance

This `dynamic_cast_if_rtti` is still unsafe because `dynamic_cast` and `static_cast` behave differently if the runtime type is not what you expected (i.e. dynamic_cast returns nullptr or throws whereas static_cast has undefined behavior), but it's much safer than doing reinterpret_cast.

Reviewed By: Yangqing

Differential Revision: D10227820

fbshipit-source-id: 530bebe9fe1ff88646f435096d7314b65622f31a
2018-10-06 12:56:27 -07:00
0e966fc9f9 Back out "[caffe2] Use custom CPU thread pool in async_scheduling" (#12415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12415

Original commit changeset: 95da8c938b8e

Reviewed By: ilia-cher

Differential Revision: D10229804

fbshipit-source-id: 32921600925b65edb5bb201c9afba0d03ed49426
2018-10-06 00:42:06 -07:00
695465915a Remove some Type.tensor usages and remove native_tensor without size. (#12403)
Summary:
Same as before, but with "initialTensorOptions()" instead of "TensorOptions(false)".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12403

Differential Revision: D10225427

Pulled By: gchanan

fbshipit-source-id: 60bd025a5cc15bdbbab6eafc91ea55f5f2c3117e
2018-10-05 20:55:14 -07:00
14b48a2404 Use custom CPU thread pool in async_scheduling (#12295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12295

Add ability to use custom implementations of thread pool instead of TaskThreadPool

Reviewed By: yinghai

Differential Revision: D10046685

fbshipit-source-id: 95da8c938b8e60b728484c520319b09b0c87ff11
2018-10-05 19:56:04 -07:00
92b0e7026e Add weak script mode for script functions (#11963)
Summary:
This PR is the start of weak script mode for functions

Weak scripts allow you to compile a graph from Python code at runtime by annotating with `torch.jit.weak_script` for use in the JIT without affecting eager execution. Scripts are compiled lazily on the first call in a graph to avoid long Python startup times.
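
A hedged sketch of the usage described above (the decorator name is taken from this summary; it later moved to an internal module, so the exact import path and signature may differ):
```
import torch

# hypothetical illustration: a weak-script function runs as plain Python
# when called eagerly, and is compiled lazily the first time it is called
# from inside a scripted graph
@torch.jit.weak_script
def scale(x, alpha):
    # type: (Tensor, float) -> Tensor
    return x * alpha

print(scale(torch.ones(2), 2.0))  # eager call: ordinary Python execution
```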

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11963

Differential Revision: D10183451

Pulled By: driazati

fbshipit-source-id: 128750994d5eb148a984f8aba4113525c3e248c8
2018-10-05 18:55:49 -07:00
058a31839d Warn about local_rank not being globally unique. (#12370)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC deepakn94
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12370

Differential Revision: D10220135

Pulled By: ezyang

fbshipit-source-id: 6d1a8a383951ae52753e4f75a14b8080bf02b815
2018-10-05 17:38:41 -07:00
3f04ca9a91 Remove duplicate math transpilation function (ROCm 233) (#12387)
Summary:
* Remove duplicate math transpilation function
* Modify regex to expand matches to more __device__ functions
* Try a different tack. Apply math transpilations only to .cu and .cuh files
* Undo change that's not required anymore since we're not using regex to detect device functions

This should address "overtranspilation" as observed in another PR.

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12387

Differential Revision: D10226798

Pulled By: bddppq

fbshipit-source-id: fa4aac8cd38d8f7ef641fad5129ed4714c0fada5
2018-10-05 17:16:35 -07:00
e1fe617600 Fix flipped pad buffer constructor arguments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12361

Differential Revision: D10218404

Pulled By: jamesr66a

fbshipit-source-id: f02137f97cd138155ba8181df3ab65f41d5abab7
2018-10-05 17:16:32 -07:00
99de4565dd Split reduction_front_backops.[cc|cu] into smaller units to allow build of smaller size (#12315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12315

Allows inclusion of only the needed reduce_front_back_* ops

Differential Revision: D10188611

fbshipit-source-id: e17fd955ac5aa163a039872b6a435942b1e1e164
2018-10-05 16:50:21 -07:00
b937cbb776 Fix a bug that would resize tensor storage on export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12377

Differential Revision: D10219213

Pulled By: zdevito

fbshipit-source-id: 85cfa4467c672ff5a718e58cfae7e8c8b1cfc532
2018-10-05 16:24:54 -07:00
57fcc57f31 set CMAKE_INSTALL_MESSAGE to NEVER (#12392)
Summary:
this removes a bunch of spam output from the build. This is

(1) cleaner
(2) a couple seconds faster in some cases, e.g. my slow-rendering emacs-based shell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12392

Differential Revision: D10225340

Pulled By: anderspapitto

fbshipit-source-id: 477ee76d24f8db50084b1e261db8c22733de923b
2018-10-05 15:57:44 -07:00
54d9823d00 Make caffe2::Tensor::dims() return an IntList instead of a const vector& (#12180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180

I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.

Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
  sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
  variadic overloads.  This makes implicit conversions work again,
  so I don't need to explicitly list the std::vector cases too.
  - As a knock-on effect, this causes Reset() to accept at::IntList as well as
    const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
  arbitrary-dim implementation, reducing code duplication. (It's probably
  marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
  Reset (unfortunately, no implicit conversions here, since it's templated on
  index type.)
- There are a few other places, like cudnn, where I changed functions
  that previously took const std::vector<int64_t>& to take at::IntList
  instead.

Classification of call site changes:
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
  'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
  'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
  Usually this is because we're about to mutably modify the vector
  to compute some new dimension.  However, it also very commonly occurs in the
  form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
  at::IntList directly

ArrayRef changes:
- cbegin()/cend() iterators; they operate the same as begin()/end() because
  everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
  working with ArrayRef.  I also templated it, so it now works on an
  ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
  comparison of ArrayRef with std::vector, a very common operation.
  (The non-templated version of operator== can get these automatically
  via implicit conversion, but with templates C++ refuses to do
  any implicit conversions.)

I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.

I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor.  Bad news!

Reviewed By: jerryzh168

Differential Revision: D10111759

fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
2018-10-05 15:57:41 -07:00
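As a rough sketch of the call-site idioms this commit describes (illustrative code, not from the diff; it assumes `at::IntList` is `at::ArrayRef<int64_t>` and the header location shown):

```
#include <cstdint>
#include <vector>
#include <ATen/core/ArrayRef.h>  // assumed header for at::ArrayRef / at::IntList

void example(at::IntList dims) {
  // Non-owning view: cheap to copy, but only valid while the backing data lives.
  at::IntList x_dims = dims;

  // Need to mutate? Take an explicit copy with .vec().
  std::vector<int64_t> mutable_dims = dims.vec();
  mutable_dims.push_back(1);

  // The new operator== overloads allow comparing ArrayRef against std::vector.
  bool same = (x_dims == mutable_dims);
  (void)same;
}
```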
f9fb37ca79 Guard Denormals-Are-Zero with runtime CPU check (#12386)
Summary:
Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (https://github.com/pytorch/pytorch/pull/12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.

This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12386

Differential Revision: D10222237

Pulled By: colesbury

fbshipit-source-id: 7ffe32561ab965e1e5f9eb6e679602bbf4775538
2018-10-05 14:54:54 -07:00
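A minimal sketch of the pattern (the `cpu_has_sse3()` check is a hypothetical stand-in for whatever runtime CPU-feature query the build uses; the intrinsics are the standard SSE ones):

```
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (SSE3)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE

extern bool cpu_has_sse3();  // hypothetical runtime CPUID-based check

void set_flush_denormal(bool on) {
  // Guard at runtime instead of compile time, so binaries built without
  // SSE3 codegen can still enable FTZ/DAZ on CPUs that support it.
  if (!cpu_has_sse3()) return;
  _MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
  _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON : _MM_DENORMALS_ZERO_OFF);
}
```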
bd09ab6687 Remove stages from IR, they are no longer used
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12352

Differential Revision: D10219743

Pulled By: zdevito

fbshipit-source-id: 4d9441dc3748616f9b1f0734c65ec1a7abb0d663
2018-10-05 13:58:15 -07:00
c7e8044fc8 Support additional device types (#12293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12293

Adding support for additional device types besides cuda and cpu.

Reviewed By: ezyang

Differential Revision: D10175683

fbshipit-source-id: 7a8a35c3f1b13a3b6ed84dd2d835f3902a418a6c
2018-10-05 13:15:05 -07:00
f8086845aa Fix bug in grad.py when conv bias != None (#12281)
Summary:
Obviously, the grads of the conv weight and conv input are not relevant to the bias, but the original `convXd_input` and `convXd_weight` methods receive a `bias` parameter. What's more, while the doc says `bias` should have the shape `(out_channels,)`, one gets a `RuntimeError` if bias != None and in_channels != out_channels, because the weight of a transposed conv has the shape `(in_channels, out_channels, kH, kW)` while the weight of a vanilla conv has the shape `(out_channels, in_channels, kH, kW)`:
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281

Differential Revision: D10217370

Pulled By: ezyang

fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
2018-10-05 12:55:14 -07:00
e2d2b270db Revert D10212616: [pytorch][PR] Remove some Type.tensor usages and remove native_tensor without size.
Differential Revision:
D10212616

Original commit changeset: c9cd128d1111

fbshipit-source-id: 923781ba9cd6e60e7c92789832e5601a1fd848b5
2018-10-05 11:55:45 -07:00
705d80b51e Remove some Type.tensor usages and remove native_tensor without size. (#12355)
Summary:
This is to move us along the path to removing Type from the public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12355

Reviewed By: ezyang

Differential Revision: D10212616

Pulled By: gchanan

fbshipit-source-id: c9cd128d1111ab219cb0b2f3bf5b632502ab97c0
2018-10-05 11:12:07 -07:00
9ebac3d7fe Improve type kind error message (#12344)
Summary:
Address #12326
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12344

Differential Revision: D10210681

Pulled By: driazati

fbshipit-source-id: fcc2e26b79dd2d7d5f9e7ef930e2bf434f2a7e08
2018-10-05 10:57:16 -07:00
0ebbfc25f3 Add utility function make_tensor (#12288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12288

The current implementation of Tensor takes an intrusive_ptr as an argument for storing data. Instead of requiring users to pass an intrusive_ptr explicitly, we want them to pass the constructor args for the intrusive_ptr directly; these are forwarded internally through a new helper function called make_tensor.

Reviewed By: ezyang

Differential Revision: D10152661

fbshipit-source-id: bfa72de161ace3fd1c4573427abcd1bfbd12e29e
2018-10-05 10:40:28 -07:00
dd2c487ab0 Enforce invariant that storage_ is always non-null (#12328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12328

- Delete reset() from Storage, as it makes it easy to accidentally
  create a null storage.
- Immediately reject a storage if it is null when passed in

Reviewed By: dzhulgakov

Differential Revision: D10200448

fbshipit-source-id: 14bfa45f8f59859cc350bd9e20e3ef8692e3991d
2018-10-05 09:43:34 -07:00
7788ec9dd1 Remove dangling cmake check for long typemeta (#12356)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12356

Differential Revision: D10212726

Pulled By: Yangqing

fbshipit-source-id: b9c2c778fb496278477ef323ecfefd5d19d1af3c
2018-10-05 09:43:32 -07:00
1e7050072b Make TensorOptions contain optional fields, optimize struct size (#12103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103

This defers lookup of defaults to the site where we read
out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE,
but we expect the bulk of uses of OptionsGuard don't allocate TensorOptions
inside the OptionsGuard region, and then use it outside of the region
(the situation where behavior could change.)

I also optimize the size of TensorOptions by rearranging fields, so that we
always fit in two 64-bit words.

Reviewed By: goldsborough

Differential Revision: D10052523

fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13
2018-10-05 09:24:53 -07:00
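A sketch of the packing idea (field names and types are illustrative, not the actual struct): group the byte-sized fields so they coalesce into one word, and pin the size with a static_assert.

```
#include <cstdint>

struct PackedOptions {
  int64_t device_index_;   // first 64-bit word
  int16_t dtype_id_;       // small fields share the second word
  int8_t device_type_;
  int8_t layout_;
  bool has_dtype_ : 1;     // "optional" flags; defaults are resolved at read time
  bool has_device_ : 1;
  bool has_layout_ : 1;
  bool requires_grad_ : 1;
};
static_assert(sizeof(PackedOptions) <= 16, "must fit in two 64-bit words");
```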
b3cdaee6db Update README.md of ATen Documentation (#12367)
Summary:
The changes are made to clarify how the parsing between the yaml files and the header files of THNN and THCUNN works. As issue #12320 shows, it is not easy to understand the existing code without a pointer to the important files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12367

Differential Revision: D10217459

Pulled By: ezyang

fbshipit-source-id: 9b3e64dea4f156843814840e736dc3230332060c
2018-10-05 08:39:55 -07:00
5cb2b2358c Move interned_strings and get build working (#12039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12039

Refactoring out this diff D9819906

Reviewed By: ezyang

Differential Revision: D10024844

fbshipit-source-id: 75b6c93526dc1490299f8b5e564e029146338178
2018-10-05 00:41:18 -07:00
f494f004b7 Fix unintended casting to long (and fix Half overloads)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12357

Reviewed By: Yangqing

Differential Revision: D10213037

Pulled By: li-roy

fbshipit-source-id: 98f7f5ee2b51a3fab378faf65482919caf008957
2018-10-05 00:28:00 -07:00
d4c58216d7 Stop warnings on AT_DECLARE_TENSOR_TYPE(.); (#12348)
Summary:
e.g.,
```
│../aten/src/ATen/core/TensorTypeIdRegistration.h:101:43: warning: extra ‘;’ [-Wpedantic]
│ AT_DECLARE_TENSOR_TYPE(SparseCUDATensorId);
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12348

Differential Revision: D10210072

Pulled By: SsnL

fbshipit-source-id: 90eacc97ef490148c0ac1357cf28f1326a791dfa
2018-10-04 23:16:47 -07:00
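A generic sketch of the usual fix for this class of warning (not necessarily the exact PyTorch change): make the macro body end without a semicolon, so the call site's `;` is the one the declaration needs.

```
// Before: the macro body already ends in ';', so the call-site ';' is extra.
#define DECLARE_BAD(name) extern int name##_id;

// After: the trailing ';' is left to the caller, so `DECLARE_GOOD(Foo);` is exact.
#define DECLARE_GOOD(name) extern int name##_id

DECLARE_GOOD(SparseCUDATensorId);  // no -Wpedantic "extra ';'" warning
```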
d9ba2b6894 Add Pytorch domain specifc ONNX schema for SparseNN ops (#12338)
Summary:
as the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12338

Differential Revision: D10204691

Pulled By: yinghai

fbshipit-source-id: fe6bb8c715a54372508672fc0651841bbc4b8656
2018-10-04 23:16:45 -07:00
bd8980e8c0 Enable CUDA 10 in CI. (#12343)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12343

Differential Revision: D10215274

Pulled By: ezyang

fbshipit-source-id: ab14e0cadd4100d7cfc3c7e924dd92742da3c29e
2018-10-04 23:16:42 -07:00
6544cd4590 Revert D10205876: Fix unintended casting to long
Differential Revision:
D10205876

Original commit changeset: b0678b019b19

fbshipit-source-id: ebd3acc017fd10cf293e1de281ea294da86747be
2018-10-04 21:10:52 -07:00
8e5ac43b4e Fix unintended casting to long
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12341

Reviewed By: ezyang

Differential Revision: D10205876

fbshipit-source-id: b0678b019b196ac9ee52969f80819ee9ee442bf2
2018-10-04 17:41:40 -07:00
16e21e14e3 Fix Caffe2 build on 64-bit Android (#12340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12340

`long` and `int64_t` are the same type on 64-bit Android.

Reviewed By: Yangqing

Differential Revision: D10204892

fbshipit-source-id: 2d5bf707bf87b99fc597c9292b59f032e9004620
2018-10-04 15:14:53 -07:00
f0b73ff790 Pretty printer improvements (#12179)
Summary:
* Replaces `prim::PythonOp` with the name of the function being called
* Delays printing values used in `prim::Return` nodes until the return
node itself if that is the only place the value is used to remove some
useless assigns

zdevito apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12179

Differential Revision: D10132661

Pulled By: driazati

fbshipit-source-id: cbc4ac34137ed5872049082e25d19eb1ebc71208
2018-10-04 15:14:51 -07:00
895994a7c3 Back out "[pytorch][PR] [Build] Use open-source NCCL2 in PyTorch"

fbshipit-source-id: a13075339d3a7b970e81be0b1a32a7c4c3a6c68d
2018-10-04 14:12:04 -07:00
a98489747d Enable sparse functionality and tests (#12323)
Summary:
* Enable sparse functions for ROCm

* Reenable test_sparse unit tests that are now passing in ROCm

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12323

Differential Revision: D10203540

Pulled By: bddppq

fbshipit-source-id: 33ffcfbda32875676c27b33ad1e7cd96fbadc790
2018-10-04 13:43:12 -07:00
39bd73ae51 Guard NumPy usage using USE_NUMPY (#11798)
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.

Fixes #11757

Reviewed By: Yangqing

Differential Revision: D10031862

Pulled By: SsnL

fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
2018-10-04 12:11:02 -07:00
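The guard pattern itself is simple; a minimal sketch (the `is_numpy_array` helper is illustrative, not a function from the commit):

```
#include <Python.h>
#ifdef USE_NUMPY
#include <numpy/arrayobject.h>
#endif

bool is_numpy_array(PyObject* obj) {
#ifdef USE_NUMPY
  return PyArray_Check(obj);
#else
  return false;  // built without NumPy: nothing is ever an ndarray
#endif
}
```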
c064f8a89d Fix build error mkldnn due to corruptted CMAKE_REQUIRED_LIBRARIES (#12195)
Summary:
This is to fix cmake-time compilation error.

When we changed the script to build Caffe2 with mkldnn, we ran into some cmake-time compilation support checks (like in libsleef) that failed due to an incorrect setting of CMAKE_REQUIRED_LIBRARIES. It is a global setting which can interfere with cmake compilation if it is not cleaned up properly. FindBLAS.cmake and FindLAPACK.cmake didn't clean this flag, which caused incorrect building of libsleef.so.

yinghai gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12195

Differential Revision: D10159314

Pulled By: yinghai

fbshipit-source-id: 04908738f7d005579605b9c2a58d54f035d3baf4
2018-10-04 11:56:06 -07:00
ae7a7fb398 Use open-source NCCL2 in PyTorch (#12312)
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself

NCCL2 is now in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312

Differential Revision: D10190845

Pulled By: teng-li

fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
2018-10-04 11:42:17 -07:00
6b79e16d6d revert test/expect files (#12332)
Summary:
The linter added newlines to the expect files in #12144. This reverts that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12332

Reviewed By: SsnL

Differential Revision: D10201790

Pulled By: Yangqing

fbshipit-source-id: 29f87c013c3522675a765a81a92520fbaea10057
2018-10-04 11:12:57 -07:00
83de6f0dac hip minor fix for c10 (#12329)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12329

Differential Revision: D10201437

Pulled By: Yangqing

fbshipit-source-id: 4e62f5870ad269d7a4f936393d2b3e646d0a6b2c
2018-10-04 11:12:54 -07:00
bcb62cb525 Lazily create tensors in optim_baseline (#12301)
Summary:
Tensors cannot be created globally because of static initialization order issues. So tensors for the optim_baseline test must be created lazily instead. This is fine because these functions will only be called once (in the respective test).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12301

Differential Revision: D10201008

Pulled By: goldsborough

fbshipit-source-id: 59a041f437354e7c6600e5655b3e2d0647dbde9e
2018-10-04 10:55:53 -07:00
1962646d0f Remove CAFFE2_UNIQUE_LONG_TYPEMETA (#12311)
Summary:
CAFFE2_UNIQUE_LONG_TYPEMETA has been a tricky variable defined only from cmake. This is an experiment to remove it and see exactly which compilers need it set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12311

Reviewed By: dzhulgakov

Differential Revision: D10187777

Pulled By: Yangqing

fbshipit-source-id: 03e4ede4eafc291e947e0449382bc557cb624b34
2018-10-04 10:12:13 -07:00
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
c9f7d7b506 mark unit tests as working, skip failing unit test (#12313)
Summary:
* enabled fp16 tests for test_torch

* enable fp16 tests for test_nn

* enabled multilabelmargin loss for fp16

* removed skip for test_pdist_empty_col

* Enable test_nn tests that pass with compiler fixes etc.

* Enable test_legacy_nn tests that pass with compiler fixes etc.

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12313

Differential Revision: D10189922

Pulled By: bddppq

fbshipit-source-id: a5592817c04b14e355cb062d42ebea406f0c92b6
2018-10-03 23:56:26 -07:00
8c64655460 Open source distributed code (#12254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12254

Move distributed_* code to oss folders

This unblocks adding python bindings

Reviewed By: duc0

Differential Revision: D10141400

fbshipit-source-id: 04d6654b73b6757c4dc4a1ddd9dfa2ce23c8c91d
2018-10-03 21:41:14 -07:00
15367ba9bc Deserialize offset of TreeCursor only when it is not empty (#11465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11465

In one of my testing workflow runs, deserialization of dataset_cursor failed. It fails because the offset vector is serialized only when it is non-empty, but deserialization always processes offset_blob whenever it is called. Though I'm still investigating why the offset of dataset_cursor is empty, I think it's good to remove this discrepancy.

Reviewed By: aazzolini, Tianshu-Bao

Differential Revision: D9737636

fbshipit-source-id: bb111933f534b092f29469680ff29e59617655f0
2018-10-03 20:38:59 -07:00
07bb79bd8b Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12274

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10156452

fbshipit-source-id: 52cf2bedc9dbb433cd5d03f0b76723f7df6a7361
2018-10-03 19:26:16 -07:00
faab6ea922 Split Allocator (#12105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12105

Split CUDA/OpenCL/xxx Allocator from xxxStaticContext::New and rewrite it under at::Allocator interface.

Reviewed By: dzhulgakov

Differential Revision: D10001033

fbshipit-source-id: e1ffbc04c18d1dcb1f8d4ef2cbbb321967de5ccc
2018-10-03 19:10:10 -07:00
74dc4460eb New in StaticContext returns at::DataPtr (#12029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12029

In order to remove the New() function in StaticContext (and eventually StaticContext itself) and converge on the Allocator design, we'll first change the return type of New to at::DataPtr.

Reviewed By: ezyang

Differential Revision: D9889990

fbshipit-source-id: 3257c763530b987025f428741bdd2e089d11bad4
2018-10-03 19:10:07 -07:00
bcc2a0599b Enable clang-tidy in CI (#12213)
Summary:
At long last, we will have clang-tidy enabled in CI. For a while I thought I could clean up the project enough to enable clang-tidy with all checks enabled, but I figure it's smarter to set up the minimal checks and at least have those in CI. We can fix more going forward.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12213

Differential Revision: D10183069

Pulled By: goldsborough

fbshipit-source-id: 7ecd2d368258f46efe23a2449c0a206d10f3a769
2018-10-03 17:25:06 -07:00
c9f9df002d Properly catch errors in PythonOps (#12243)
Summary:
If a PythonOp throws an error it raises an exception to the interpreter and also releases the GIL which causes [pybind to segfault](https://github.com/potassco/clingo/issues/42)

This fix catches pybind errors while the GIL is still held and throws a `python_error` to re-capture the GIL

Fixes #12118

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12243

Differential Revision: D10182787

Pulled By: driazati

fbshipit-source-id: 719d4a7c3294af201e061cf7141bec3ca0fb1f04
2018-10-03 17:25:03 -07:00
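A sketch of the fix's shape (using pybind11's `error_already_set`; the `python_error` type here is a stand-in for PyTorch's GIL-aware exception):

```
#include <exception>
#include <pybind11/pybind11.h>
namespace py = pybind11;

struct python_error : std::exception {};  // stand-in for PyTorch's python_error

void run_python_op(const py::object& fn) {
  try {
    fn();  // a raised Python exception surfaces as py::error_already_set
  } catch (py::error_already_set& e) {
    // Translate while the GIL is still held, instead of letting pybind
    // propagate the exception after the GIL has been released.
    e.restore();           // put the exception back on the Python error state
    throw python_error();  // plain C++ signal the interpreter handles safely
  }
}
```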
557015fd93 wipe cache with writes (#12279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12279

For some reason, if we don't write to the wipe buffer, it doesn't really wipe everything out of the caches on x86.
We also need to wipe the cache after initializing input blobs.

Reviewed By: Maratyszcza

Differential Revision: D10161211

fbshipit-source-id: c34414dd8b83947805010d7d57e4134d56de1430
2018-10-03 17:12:23 -07:00
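A sketch of the wipe (buffer size and stride are illustrative); the key point from the diff is that the pass must store, not just load:

```
#include <cstddef>
#include <vector>

void wipe_caches() {
  // Larger than the last-level cache, so every set gets evicted.
  static std::vector<char> wipe(64 * 1024 * 1024);
  for (size_t i = 0; i < wipe.size(); i += 64) {  // one touch per cache line
    wipe[i] += 1;  // the *write* is what reliably evicts prior data on x86
  }
}
```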
6b9afc894b pyHipify Fixes (#12292)
Summary:
This PR makes the following changes:
* stores cuda_to_hip mappings in python OrderedDicts
* Replace cudaError with cudaError_t and remove cudaError mapping

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12292

Differential Revision: D10184399

Pulled By: bddppq

fbshipit-source-id: b20a4661ba534e4fb12aa738e1ed74dba84f30fc
2018-10-03 17:12:17 -07:00
fe10f3d0c6 Fix up onnxwhile op (#12124)
Summary:
Fix the ONNX While op to support nested loops and correctly track loop-carried dependencies. Nested loops should be fully supported together with https://github.com/onnx/onnx/pull/1453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12124

Differential Revision: D10108817

Pulled By: wanchaol

fbshipit-source-id: 51b948024da857c9962833213ee792f47f054e48
2018-10-03 15:55:58 -07:00
8aa23907e8 Make if block also take control_inputs, preserve SSA (#12224)
Summary:
The If block is missing control inputs during Caffe2 net execution; this PR adds them back and removes the non-SSA semantics.

jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12224

Differential Revision: D10135408

Pulled By: wanchaol

fbshipit-source-id: 746c870bde54ed4ca627167361db1b3f36cd235c
2018-10-03 14:29:01 -07:00
b548f8320d Reduce size of TensorImpl from 160 bytes to 128 bytes (#12266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12266

- Put all byte-size fields together (booleans and TensorTypeId),
  so they can be coalesced into a single word.
- Replace std::vector<int64_t> strides with
  std::unique_ptr<int64_t[]>, saving two words.

Reviewed By: dzhulgakov

Differential Revision: D10150834

fbshipit-source-id: f54f38eec34732f3ff7e52e00b1371d7b5b210eb
2018-10-03 14:28:59 -07:00
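To see where the two words come from (a sketch; the real field set differs): `std::vector` is three pointers wide, `std::unique_ptr<int64_t[]>` is one, and adjacent byte-sized fields coalesce into a single word.

```
#include <cstdint>
#include <memory>
#include <vector>

struct Before {
  std::vector<int64_t> strides_;  // 24 bytes on a typical 64-bit ABI
  bool is_contiguous_;
  bool is_variable_;
  int16_t type_id_;
};

struct After {
  std::unique_ptr<int64_t[]> strides_;  // 8 bytes: two words saved
  bool is_contiguous_;                  // byte-sized fields kept adjacent...
  bool is_variable_;
  int16_t type_id_;                     // ...so they share one word
};

static_assert(sizeof(After) <= sizeof(Before) - 16, "two 8-byte words saved");
```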
2217c0b408 create the onnx_root in local, and link it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12294

Reviewed By: BIT-silence

Differential Revision: D10178208

Pulled By: houseroad

fbshipit-source-id: 6105b88ea5f3ce9164961cf13b356d85178c374d
2018-10-03 13:55:56 -07:00
3db9738b30 add torch factory methods (zeros/ones) to onnx symbolic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11477

Differential Revision: D9761637

Pulled By: wanchaol

fbshipit-source-id: 401f8d43a831685a444e88509bace94ce5b94e52
2018-10-03 13:55:54 -07:00
01d835c9b2 Revert D10128131: [nomnigraph] Add move{Node,Edge,Subgraph} for Graph move-like semantics
Differential Revision:
D10128131

Original commit changeset: b0e17ec2802c

fbshipit-source-id: c4a922c10ce8eddc965447b3cc4b6b01dd26dabb
2018-10-03 13:11:23 -07:00
d1ac1eba3b Add bool type to IR (#11834)
Summary:
This PR adds a bool type to `IValue` and puts it into place.

* changes conds for `prim::If` and `prim::Loop` to use `bool` type
* changes operators that take `bool`s to match their native ops
* fixes ambiguous `aten` ops `aten::std` and `aten::var`
	* fixes tests in `test_jit.py TestJitGenerated`
		```
		'test_std_dim',
		'test_std_dim_1d',
		'test_std_dim_1d_neg0',
		'test_std_dim_neg0',
		'test_var_dim',
		'test_var_dim_1d',
		'test_var_dim_1d_neg0',
		'test_var_dim_neg0'
		```
* adds `prim::BoolToTensor` and `prim::TensorToBool`

apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11834

Differential Revision: D9928570

Pulled By: driazati

fbshipit-source-id: 373c53df2f1a8ffa9e33d9a517002fbeef25f3eb
2018-10-03 12:40:03 -07:00
c029c839a1 MIOpen 1.5 group conv API integration (#12273)
Summary:
This PR contains changes for:
1. Group convolutions introduced in MIOpen 1.5
2. Checks to initialize MIOpen conv operator descriptors only when needed (inputs or weights changed)

Differential Revision: D10174611

Pulled By: bddppq

fbshipit-source-id: cd3d61fae350c4a5e540ce1a6e08012e0e2689fe
2018-10-03 12:26:58 -07:00
a839ec805a Add move{Node,Edge,Subgraph} for Graph move-like semantics
Summary: Adding back import{Node,Edge} as move{Node,Edge} and adding a new function moveSubgraph

Reviewed By: duc0, yyetim

Differential Revision: D10128131

fbshipit-source-id: b0e17ec2802cb211b6455578fdb17dab2a7a425b
2018-10-03 12:26:55 -07:00
b911ca9b0d docs: change links to https (#12258)
Summary:
Hi, I think it might be better to use https instead of http in the README.md.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12258

Differential Revision: D10162279

Pulled By: soumith

fbshipit-source-id: 4658aa75175909b4fea6972b437765d8b49c749f
2018-10-03 06:33:09 -07:00
080266e79c Document CUDAHOSTCXX environment variable (#12265)
Summary:
This variable is already being used so this just serves to document that. I think it's an important variable, too, so it should definitely be documented there somewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12265

Differential Revision: D10162261

Pulled By: soumith

fbshipit-source-id: e0d01e012c2fedea63372de9967a8eaa3745fe94
2018-10-03 06:33:06 -07:00
1fb8925efe Fix typo LMBD->LMDB in docs of setup.py (#12282)
Summary:
`setup.py` reads `USE_LMDB` rather than `USE_LMBD`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12282

Differential Revision: D10162025

Pulled By: soumith

fbshipit-source-id: 6295a777be10509ca49516ad7c10061d26b6f9c9
2018-10-03 06:14:19 -07:00
c0ed48a57e Add support to the accuracy metric (#12211)
Summary:
The code that reads a blob from input files is broken; this fixes it. Also adds a binary that converts input files to blobs that can be used by Caffe2 directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12211

Reviewed By: llyfacebook

Differential Revision: D10121845

Pulled By: sf-wind

fbshipit-source-id: 6e48bb594680bcb3186d8d43276b602041c30d3e
2018-10-03 02:10:51 -07:00
06360c3050 Back out "Deduplicate canonical_axis_index_ with maybe_wrap_dim"
Summary: Original commit changeset: 13c98fff0880

Reviewed By: ezyang

Differential Revision: D10153342

fbshipit-source-id: c74c56e61662e9c747206e812b1da22170cbf742
2018-10-02 16:40:21 -07:00
a76216b8ed Back out "[aibench] Use caffe2::int8::Int8TensorCPU when input type is uint8_t"
Summary: Original commit changeset: b63cd3a75f87

Reviewed By: bddppq

Differential Revision: D10154512

fbshipit-source-id: 039dfd295c5d1de799993a20e708915be65e9d76
2018-10-02 16:25:11 -07:00
035d04299c Update onnx to onnx/onnx@ddf8eb6 (#12267)
Summary:
ddf8eb6aa0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12267

Reviewed By: yinghai

Differential Revision: D10151536

Pulled By: bddppq

fbshipit-source-id: 4cb04fcc0377c6c39fb318c5fc7043e67c400866
2018-10-02 15:57:43 -07:00
04b0774964 Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12250

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10121216

fbshipit-source-id: b63cd3a75f87e043cc3c83de4f3520b6ffbf1d07
2018-10-02 14:57:28 -07:00
7c678746ef update the script to match the current build process
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12262

Reviewed By: BIT-silence

Differential Revision: D10148658

Pulled By: houseroad

fbshipit-source-id: c083346cc40154f7baea1be713cac799cf076cbf
2018-10-02 14:01:37 -07:00
29e5ba8a7b Fix for LibTorch download link (#12263)
Summary:
We now have a proper download link for libtorch.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12263

Differential Revision: D10149216

Pulled By: goldsborough

fbshipit-source-id: e9caefed1c7f8e25d7623d72c8548bfdb6114329
2018-10-02 12:25:25 -07:00
1d3f650ce4 Revert D10098106: [pytorch][PR] [WIP] New version of PT1 model format
Differential Revision:
D10098106

Original commit changeset: 94ec7fc57c84

fbshipit-source-id: 38f729b0970618f38359797b806cbbcd865f4715
2018-10-02 00:43:40 -07:00
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
696498d9e4 Delete stride updating logic from Caffe2, and make PyTorch error in this case. (#12236)
Summary:
Strides appear to cause a huge memory regression in some of our internal
training workflows. This diff stems the bleeding, while we figure out exactly
what happened.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12236

Reviewed By: dzhulgakov

Differential Revision: D10134319

fbshipit-source-id: 1547c89a65c05473c409c0977c19c99dcaefb89c
2018-10-01 21:25:04 -07:00
2cbcaf4544 Skip failing tests in test_sparse (#12229)
Summary:
Skip the recently introduced tests that fail on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12229

Differential Revision: D10138146

Pulled By: bddppq

fbshipit-source-id: a0f1ff97fabb71f635a468e8030dbe32d388de49
2018-10-01 18:31:45 -07:00
8af06d8114 Use DFS scheduling only within single device (#11848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11848

Avoid crossing the boundary between devices when using DFS scheduling

Reviewed By: romain-intel

Differential Revision: D9931091

fbshipit-source-id: 1f3cf52127830048ed1db50b01677b66eeed8b32
2018-10-01 18:31:43 -07:00
ecace9eb21 Move crf in caffe2 from fb to oss (#12200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12200

Moved crf_viterbi_op, and copied crf_predict and crf_viterbi_test, to OSS

Reviewed By: Yangqing

Differential Revision: D10118341

fbshipit-source-id: 51e30e57d280d6ca75fc0b488f743794f23b589f
2018-10-01 18:31:41 -07:00
26df16eb21 Clear previous device option when keep_device is set in load op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12240

Reviewed By: jerryzh168

Differential Revision: D10133933

fbshipit-source-id: 05935bd527177f936c1d08626888d43dedbf5ce4
2018-10-01 17:20:26 -07:00
23f86ad57f Back out "[caffe2][mpscnn] Enable multiple external output"
Summary: Original commit changeset: 0cea9469cea0

Differential Revision: D10135814

fbshipit-source-id: 9563361cc00f4ce5dc2e903c0fcb10643ee9af26
2018-10-01 16:55:32 -07:00
35becd1879 New version of PT1 model format (#12149)
Summary:
Considered four different existing formats: 1) static graph, 2) torch script, 3) pickle files, 4) PyTorch C++ serialize APIs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12149

Reviewed By: BIT-silence

Differential Revision: D10098106

Pulled By: houseroad

fbshipit-source-id: 94ec7fc57c842e50fae5286ddeda657a4967a07a
2018-10-01 15:57:02 -07:00
8fa7de35f2 Enable ROCM clang-7 build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12223

Differential Revision: D10133697

Pulled By: bddppq

fbshipit-source-id: c1de99afccdad415ac1beb85d3b8ab44f9b58738
2018-10-01 15:11:40 -07:00
15d28e400f remove support for c extensions (#12122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12122

We are deprecating support for C extensions. Please use cpp extensions in the future.

Reviewed By: Yangqing

Differential Revision: D10060541

fbshipit-source-id: 4f7149e06a254bd7af463fd7aa9740f65369963a
2018-10-01 13:55:28 -07:00
1b59cf8b51 Add support to use llvm 7 in CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12182

Differential Revision: D10129630

Pulled By: bddppq

fbshipit-source-id: f0217336474b807f03f84a4b8052ce92a6e3564b
2018-10-01 13:39:50 -07:00
06f535d8a0 More debug info in plan executor (#12183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12183

Adding more debug info printed from plan executor

Reviewed By: manojkris

Differential Revision: D10113104

fbshipit-source-id: dddc9aec8012c8575ab305033388412fdaaac537
2018-10-01 12:56:32 -07:00
eba1cf2145 Unify style (#11949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11949

Unify naming style

Reviewed By: yinghai

Differential Revision: D9931227

fbshipit-source-id: b6956bd98ed8625623e4747d616989f9f3a2ed46
2018-10-01 12:56:29 -07:00
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
ecb3835387 change \gamma to \Gamma (#12214)
Summary:
- revert `\gamma` changes at landed PR: https://github.com/pytorch/pytorch/pull/12126
- minor fix for docs of `torch.norm()`

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12214

Differential Revision: D10127337

Pulled By: weiyangfb

fbshipit-source-id: 15eb8abda39ec9e8b2e815e2a22096cae786995a
2018-10-01 11:31:18 -07:00
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) in Multifeed Aggregator and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
e43ffb0148 nomnigraph - easy - some code cleanup for transformations_test (#12101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12101

clean up some duplicate test code

Reviewed By: ZolotukhinM

Differential Revision: D10051914

fbshipit-source-id: 698ff144a85e8c70572116c5ddb415cd2396b4e3
2018-10-01 11:31:08 -07:00
006171fffc Back out "[pytorch][PR] Revert "Move CreateContext to global registry (#11688)"" (#12121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12055

Original commit changeset: 6ca9de65b707

Reviewed By: ezyang

Differential Revision: D10033396

fbshipit-source-id: ca9f4b2f7ef0561f619b833415d394a8b9972bf4
2018-10-01 11:10:46 -07:00
fed91f873f (Very small) allow trailing commas in assign or tuples (#11723)
Summary:
Allow trailing commas in assign statements or tuples, which also allows single element tuples.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723

Differential Revision: D10052162

Pulled By: eellison

fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8
2018-10-01 10:10:13 -07:00
f3c32a4b54 dnnlowp_16 -> dnnlowp_acc16 (#12205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12205

We're more interested in testing the performance of DNNLOWP_ACC16 engine.

Reviewed By: llyfacebook

Differential Revision: D10121080

fbshipit-source-id: 7def38be838feb7636f7dd0c8ed352c2df398ec1
2018-10-01 09:40:13 -07:00
9768b4d4ff support half float for SparseLengthsIndicesInGradientWeightedSumWithMainInputGradient (#12186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12186

Specialized implementation: preconvert embeddings to float and do everything in fp32.

Reviewed By: jspark1105

Differential Revision: D10100603

fbshipit-source-id: 3255b4addb6fda24722bd519163099f5d354d084
2018-09-30 23:56:14 -07:00
c3817e85fa Temporary fix for LibTorch download link (#12212)
Summary:
We're waiting for the libtorch links to show up on the website. I had a fake link in the docs so far which is misleading. This PR changes it to a temporary markdown file until the web people fix the site tomorrow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12212

Differential Revision: D10121872

Pulled By: goldsborough

fbshipit-source-id: f1bd1315f7333b9168e99983f3f6b679c9b0c52a
2018-09-30 15:39:51 -07:00
572132fb17 copy_(Sparse, Sparse) for sparse tensor (#9005)
Summary:
- fix #8330
- add `torch.copy_(Sparse, Sparse)` with autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9005

Differential Revision: D8987885

Pulled By: weiyangfb

fbshipit-source-id: b317a41da22ee1eae2835622a0ed28a6771a3a06
2018-09-30 11:55:09 -07:00
93ecf4d72a Remove raise_from (#12185)
Summary:
soumith

CC alsrgv

Fixes #11995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12185

Differential Revision: D10120103

Pulled By: goldsborough

fbshipit-source-id: ef7807ad83f9efc05d169675b7ec72986a5d17c3
2018-09-29 22:41:55 -07:00
5ffc915f26 fix docs (#12126)
Summary:
- fix https://github.com/pytorch/pytorch/issues/12120
- add `torch.argsort`, `torch.pdist`, `broadcast_tensors` to *.rst files
- add parameter dim to `torch.unique` doc
- fix table and args for `torch.norm`
- test plan: make html and check docs in browser

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12126

Differential Revision: D10087006

Pulled By: weiyangfb

fbshipit-source-id: 25f65c43d14e02140d0da988d8742c7ade3d8cc9
2018-09-29 22:26:45 -07:00
40aa212cd6 Support fp16 mkl engine in training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12080

Reviewed By: hyuen

Differential Revision: D10037719

fbshipit-source-id: 618ce894eccc4c87a038dc3ab836684f16843cde
2018-09-29 21:55:11 -07:00
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
878e7740fd Turns optimizations off when checking trace (#12172)
Summary:
Currently, when tracing, optimizations are performed twice. This means that optimizing passes, like the fusion pass, are also called twice. This is unnecessary, so this PR turns off optimizations when checking the trace (since the trace is independent of optimizations). This should improve performance and debugging.

apaszke who proposed this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12172

Reviewed By: ezyang

Differential Revision: D10109250

Pulled By: apaszke

fbshipit-source-id: 8b3385eae143446820f1b61ca7576d7c07f9b248
2018-09-28 19:40:10 -07:00
22ce6060ec Add caffe2_api to exported functions (#12184)
Summary:
Broke the build, sorry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12184

Differential Revision: D10114818

Pulled By: bwasti

fbshipit-source-id: 49844183a48d9383c5055a9ce06fe61fbf353050
2018-09-28 18:12:00 -07:00
ebc2643498 Enable multiple external output (#10957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10957

att

Differential Revision: D9525097

fbshipit-source-id: 0cea9469cea06cbfd3828549b168483413788269
2018-09-28 18:11:58 -07:00
0a5dfa5a52 Add support for device annotations on blobs
Summary: device annotations on blobs with Declare and Export trick

Reviewed By: yyetim

Differential Revision: D9999916

fbshipit-source-id: 0bd4d15e7beed2788f47255d52ea296f8f674295
2018-09-28 14:11:54 -07:00
08e5ca1262 Add filter<T>(NNModule) and explicit Declare/Export classes (#11955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11955

Adding a `filter<T>(NNModule)` function to easily get inputs/outputs of a DAI-style NNModule.

Reviewed By: duc0

Differential Revision: D9997696

fbshipit-source-id: 818c4f2e3093e0d02b35e6632b426e8d3189c21e
2018-09-28 14:11:53 -07:00
60061a20d9 Adding Declare and Export operators (#11954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11954

Adding an alternative to external_input and external_output for use in some distributed settings

Reviewed By: aazzolini

Differential Revision: D9997121

fbshipit-source-id: 1b5cc03fd3051368a3edc69e7bc472386f5746b5
2018-09-28 14:11:51 -07:00
7b2c0a09e4 Adds support for NaN, +inf, -inf float scalars to CPU and CUDA fusers (#12070)
Summary:
In current upstream float scalars are always written into kernels with:

`out << std::scientific << v << "f";`

When the floats are special values like NaN, +inf, or -inf this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN.

To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter.

Edit:

Forgot to mention, this partially addresses issue #12067.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070

Reviewed By: ezyang

Differential Revision: D10044704

Pulled By: soumith

fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c
2018-09-28 14:11:49 -07:00
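A sketch of the scalar-encoding fix (the macro names emitted for the special values are assumptions standing in for the definitions added to the resource strings):

```
#include <cmath>
#include <sstream>
#include <string>

std::string encodeFloatScalar(float v) {
  if (std::isnan(v)) return "NAN";  // math.h provides NAN on the CPU
  if (std::isinf(v)) return v > 0 ? "POS_INF" : "NEG_INF";  // assumed macros
  std::ostringstream out;
  out << std::scientific << v << "f";  // the original path, fine for finite v
  return out.str();
}
```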
0e779c27e1 Deduplicate canonical_axis_index_ with maybe_wrap_dim (#11891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11891

maybe_wrap_dim is a slightly more general function, which is able
to, under some circumstances, treat 0 as a "valid" dimension even
when the tensor is scalar.  canonical_axis_index_ never accepts
this behavior, so it always passes false.

Reviewed By: jerryzh168

Differential Revision: D9968320

fbshipit-source-id: 13c98fff0880d7bfcd00911a76c8aa10d37bd183
2018-09-28 14:11:48 -07:00
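Roughly, the generalized semantics look like this (a sketch, not the exact source):

```
#include <cstdint>
#include <stdexcept>

int64_t maybe_wrap_dim(int64_t dim, int64_t ndim, bool wrap_scalar = true) {
  if (ndim <= 0) {
    if (!wrap_scalar)  // canonical_axis_index_'s behavior: never allowed
      throw std::out_of_range("dimension specified for a scalar tensor");
    ndim = 1;  // treat a scalar as 1-d, so dim 0 and -1 are accepted
  }
  if (dim < -ndim || dim >= ndim)
    throw std::out_of_range("dimension out of range");
  return dim < 0 ? dim + ndim : dim;  // wrap negative dims Python-style
}
```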
ab9a5976a0 Disable inlinining of EnforceFailMessage (#12078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12078

The constructor is inlined multiple times

Reviewed By: salexspb

Differential Revision: D9358084

fbshipit-source-id: c8d4177a3fcccac574ee4f63336a6fa8bfb07d11
2018-09-28 11:24:35 -07:00
8009b6cdb5 Kill self_ty in TYPE_DERIVED_DEFINITION_NATIVE (#11903)
Summary:
This allows us to call the type argument with name other than `self_ty`. ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11903

Differential Revision: D10105029

Pulled By: SsnL

fbshipit-source-id: 0fbdc728123ebc1154d080628cb41a085ba3e6d7
2018-09-28 11:09:50 -07:00
e7e10e60e0 Introduce builtin script functions (#12141)
Summary:
This functionality replaces the Scalar-Tensor builtin operators
with builtin functions.

Builtin functions are used in place of operators where one operator
can be defined as a composition of others. This simplifies later
optimization passes by allowing us to have fewer operators.

In the future, builtin functions can be used for other purposes.
For example, we can define derivative functions as code rather than
building graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12141

Reviewed By: ezyang

Differential Revision: D10088065

Pulled By: zdevito

fbshipit-source-id: a2acb06346e649c4c8a2fe423b420871161c21cf
2018-09-28 10:55:08 -07:00
65bf181ddf Add "ai.onnx.pytorch" onnx domain (#12157)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12157

Differential Revision: D10100799

Pulled By: bddppq

fbshipit-source-id: 76fdd126e0b52c54276752b3b0174735355a7d2f
2018-09-28 09:57:06 -07:00
0aff3cc559 Fix broadcasting bug in StudentT (#12148)
Summary:
This fixes a broadcasting error with the `StudentT` distribution

- [x] added a regression test
- [x] strengthened parameter broadcasting tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12148

Differential Revision: D10099226

Pulled By: soumith

fbshipit-source-id: 0c5eb14180d158f8fff28ceb9e7cd3471c2bb803
2018-09-28 09:57:02 -07:00
b0248df72a Docs: Change cuda(async) —> cuda(non_blocking) (#12158)
Summary:
goldsborough Modify the docs to match the changes made in #4999
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12158

Differential Revision: D10103964

Pulled By: SsnL

fbshipit-source-id: 1b8692da86aca1a52e8d2e6cea76a5ad1f71e058
2018-09-28 08:39:27 -07:00
5be0baefa2 Use streams in JIT serialization, allow JIT serialization to/from buffer (#11932)
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.

`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932

Differential Revision: D10084303

Pulled By: apaszke

fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
2018-09-28 07:54:27 -07:00
d291cf7de6 Ensuring positive definite matrix before constructing (#12102)
Summary:
Ensure a positive definite matrix in the multivariate normal distribution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12102

Reviewed By: ezyang, Balandat

Differential Revision: D10052091

Pulled By: jeffreyksmithjr

fbshipit-source-id: 276cfc6995f6a217a5ad9eac299445ff1b67a65f
2018-09-28 07:27:20 -07:00
04c0971679 Special case BatchGather and BatchGatherGradient for block_size=1. (#11349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349

Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.

Reviewed By: jspark1105, ilia-cher

Differential Revision: D7218043

fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
2018-09-27 21:11:38 -07:00
f5a0c337ba Move TensorImpl IsType, meta, dim32, dim, ExtractDeviceOption to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12100

Reviewed By: jerryzh168

Differential Revision: D10051424

fbshipit-source-id: 5986e92ea54e60ec6bfe992015a05e09288c948c
2018-09-27 20:40:03 -07:00
bbae57d06e Move TensorImpl size_from_dim, size_to_dim, size_between_dim, canonical_axis_index to caffe2::Tensor (#12099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12099

- Generalize the free functions to accept IntList, not just std::vector<int64_t>

Reviewed By: jerryzh168

Differential Revision: D10051365

fbshipit-source-id: e3d571bf8fead22f6f25c3ca46f0c38c2bb065d2
2018-09-27 20:40:00 -07:00
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
149403f849 Move TensorImpl ndim, size, itemsize and nbytes to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12098

Reviewed By: jerryzh168

Differential Revision: D10051298

fbshipit-source-id: a833fad74bbda38c019ec2cb97d4bb6804e09963
2018-09-27 19:56:00 -07:00
7f35e92af2 mutable lists (#10700)
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now; in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700

Reviewed By: eellison

Differential Revision: D9547881

Pulled By: michaelsuo

fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
2018-09-27 19:25:13 -07:00
a5818047c4 Rewrite serialization to correctly handle partial reads/writes in all cases (#12143)
Summary:
Previously, doRead/doWrite were functions that could return partial reads/writes,
and we checked for this case inconsistently in the call sites of serialization.cpp.
Now, these functions do NOT return the number of bytes read/written, and instead
handle the necessary checking loop themselves.

Fixes #12042. Maybe.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143

Differential Revision: D10097027

Pulled By: ezyang

fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1
2018-09-27 19:09:53 -07:00
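The shape of the fix, sketched against a POSIX file descriptor (the actual helpers in serialization.cpp differ): the retry loop lives inside the helper, so call sites can no longer forget it.

```
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <unistd.h>

void doRead(int fd, char* buf, size_t nbytes) {
  while (nbytes > 0) {
    ssize_t r = ::read(fd, buf, nbytes);
    if (r < 0) {
      if (errno == EINTR) continue;  // interrupted: just retry
      throw std::runtime_error("read failed");
    }
    if (r == 0) throw std::runtime_error("unexpected EOF");
    buf += r;
    nbytes -= r;  // a partial read is handled here, not at every call site
  }
}
```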
a86a61b004 Implement caffe2::Tensor::raw_data() in terms of data()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12097

Reviewed By: jerryzh168

Differential Revision: D10051202

fbshipit-source-id: b4b61869363a606ab465d1500558226efae30d06
2018-09-27 18:40:37 -07:00
2021b26bcb Move TensorImpl::ShareExternalPointer helper overloads to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12096

Reviewed By: jerryzh168

Differential Revision: D10051126

fbshipit-source-id: a9b95d00512a0b4e6339d4f3f0bb180dd0c79247
2018-09-27 18:40:35 -07:00
976a9e0454 Move TensorImpl::DebugString() to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12095

Reviewed By: jerryzh168

Differential Revision: D10051078

fbshipit-source-id: f56b6fc5d1cb8ae4b636e88efe607fe65cc1d7a0
2018-09-27 18:40:33 -07:00
b0e48aa197 Move TensorImpl::Reshape(vector<int>) to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12094

Reviewed By: jerryzh168

Differential Revision: D10051079

fbshipit-source-id: 87fb91f31c33ce9b64c4654e79e0131ae391cd78
2018-09-27 18:40:30 -07:00
8c533c2c90 Fix bug where Reshape() trashes strides.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12092

Reviewed By: jerryzh168

Differential Revision: D10051005

fbshipit-source-id: c36d1c8d12fb41baf8d1a1a9f38776deeff242de
2018-09-27 18:40:28 -07:00
d02478e607 Move TensorImpl::ResizeLike to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12091

Reviewed By: jerryzh168

Differential Revision: D10051012

fbshipit-source-id: 772ecd2e377f7d4e1ae510c1f647f6c8b71e5a57
2018-09-27 18:40:25 -07:00
dd73d57643 Move TensorImpl::ShrinkTo to caffe2::Tensor (#12090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12090

This is a slight pessimization because we need to do a
full recompute of is_contiguous(), even though a modification
of dim-0 is guaranteed to preserve contiguity.

Reviewed By: jerryzh168

Differential Revision: D10050905

fbshipit-source-id: b99233e21c9f4275b0db6e76740462e5430ce152
2018-09-27 18:40:23 -07:00
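For reference, the full recompute mentioned above amounts to one backwards pass over sizes and strides (a sketch of the standard check):

```
#include <cstdint>
#include <vector>

bool compute_contiguous(const std::vector<int64_t>& sizes,
                        const std::vector<int64_t>& strides) {
  int64_t expected = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
    if (sizes[i] == 1) continue;  // size-1 dims may have any stride
    if (strides[i] != expected) return false;
    expected *= sizes[i];
  }
  return true;  // O(ndim) on every call, even when only dim 0 changed
}
```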
00c6fb16e7 Move ExtendTo to caffe2::Tensor from TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12089

Reviewed By: jerryzh168

Differential Revision: D10050859

fbshipit-source-id: 843067aacfa2a519657220bc39a0f499582a48a4
2018-09-27 18:40:21 -07:00
6a2dbc9808 Rename TensorImpl::GetDeviceType to device_type, and properly test if is_variable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12087

Reviewed By: jerryzh168

Differential Revision: D10050781

fbshipit-source-id: 0b6c9d7caf3b1000691f86fcc7f2ef203936a29f
2018-09-27 18:40:19 -07:00
c5fc2f1105 Merge UndefinedTensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11972

Reviewed By: gchanan, Yangqing, jerryzh168

Differential Revision: D9995633

fbshipit-source-id: 6b4645c9d4bb0bc4301cd4bcfa76cf85331b8379
2018-09-27 18:40:16 -07:00
e8cb6cb9d2 Fix some symbolics for ReduceSum, GE, LE (#12123)
Summary:
ReduceSum negative indices are converted to positive, since Caffe2 does not support them. The GE/LE symbolic operand order was wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12123

Reviewed By: houseroad

Differential Revision: D10095467

Pulled By: wanchaol

fbshipit-source-id: eb20248de5531c25040ee68b89bd18743498138d
2018-09-27 17:40:46 -07:00
f6abd16a9d Merge TensorImpl. (#11971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11971

- Switched TensorImpl::data<T>() to use Storage::unsafe_data<T>() to work
  around an outstanding bug in the Storage::data<T>() implementation
  where it only works on Ts which are valid ScalarType
- Qualify a bunch of identifiers which still live in caffe2:: namespace
- strides returns an IntList now
- s/update_strides/update_to_contiguous_strides/
- Correctly compute type_id_ for the Storage only constructor from Caffe2.
  This is special cased to only work for CPU and CUDA dense tensors.
- Fix some signed-unsigned comparisons in Caffe2 code (OSS build for
  ATen/core has more restrictive warning tests.)

Reviewed By: jerryzh168

Differential Revision: D9995559

fbshipit-source-id: 9c74032e011189e1c7e9a98d20f2bd1e25ad2e5c
2018-09-27 17:40:44 -07:00
1619264ca5 Make ATen-core and caffe2 mutually recursive / merge template data<T>() (#11970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11970

Adds an ATen-core-headers target, which caffe2_cpu_internal depends
on, and makes ATen-core depend on caffe2_headers.  If you link against
ATen-core, you must ALSO link against caffe2_cpu_internal; if you
link against caffe2_cpu_internal, you must ALSO link against ATen-core,
otherwise you'll have undefined symbols.

Then, we merge template data<T>() method with Caffe2 implementation,
demonstrating that includes to Caffe2 (core) from ATen/core are working

Reviewed By: jerryzh168

Differential Revision: D9967509

fbshipit-source-id: 3d220c38b2c3c646f8ff2884fdcc889fa9276c7a
2018-09-27 17:40:42 -07:00
c35f85a6d4 Export symbols for pybind and other libs after caffe2 rebase (#11975)
Summary:
Export symbols for pybind and other libs after caffe2 rebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11975

Differential Revision: D10042615

Pulled By: yinghai

fbshipit-source-id: 6de562d99403099113093716834abc51bf726e94
2018-09-27 14:40:27 -07:00
80e3081c28 Add observers for mkldnn fallback operators (#9093)
Summary:
Add observers for ideep operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9093

Reviewed By: salexspb

Differential Revision: D9952949

Pulled By: yinghai

fbshipit-source-id: 1678d1a738f8781dc75eb3cb9dfb309f7b7934fb
2018-09-27 14:11:19 -07:00
6e7e63fda3 Implementation MomentumSGD/MomentumSGDUpdate operators for mkl-dnn (#11686)
Summary:
The speed-up of a single operation is up to 6X on BDW (Broadwell).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686

Reviewed By: yinghai

Differential Revision: D9828129

Pulled By: wesolwsk

fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
2018-09-27 13:39:59 -07:00
13cf39294d Remove ATen/Error.h and use ATen/core/Error.h instead. (#12132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12132

TSIA. No code change involved.

Reviewed By: bwasti

Differential Revision: D10083237

fbshipit-source-id: bdab029015b9d0f1fa1f866c68aa5945cc68db9d
2018-09-27 10:11:17 -07:00
a72603f8f8 Fix for ppc64le jit graph difference in sigmoid backward, see #10726 (#11579)
Summary:
As reported in Issue #10726, the jit compiler, when running on ppc64le, may produce an isomorphic output but fail a diff test against the expected output file. The expected output file is created from a test that was run on x86_64. This ensures that if ppc64le test output is different, the output is instead compared to an expected output file created when the test is run on a ppc64le system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11579

Differential Revision: D10080890

Pulled By: soumith

fbshipit-source-id: 7249bf6b5dfa7c853368a3688a982bc9ed642bc9
2018-09-27 07:09:31 -07:00
9c49bb9ddf Move registry fully to c10 (#12077)
Summary:
This does 6 things:

- add c10/util/Registry.h as the unified registry util
  - cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10

Also, an important note: we used to mark the templated Registry class as EXPORT. This should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077

Reviewed By: ezyang

Differential Revision: D10050771

Pulled By: Yangqing

fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
2018-09-27 03:09:54 -07:00
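The core pattern being unified is roughly this (a hand-rolled sketch of a creator registry, not the actual c10 macros):

```
#include <functional>
#include <map>
#include <memory>
#include <string>

template <class Base>
class Registry {
 public:
  using Creator = std::function<std::unique_ptr<Base>()>;
  void Register(const std::string& key, Creator c) {
    creators_[key] = std::move(c);
  }
  std::unique_ptr<Base> Create(const std::string& key) const {
    auto it = creators_.find(key);
    return it == creators_.end() ? nullptr : it->second();
  }

 private:
  std::map<std::string, Creator> creators_;
};
// Per the note above: export the accessor for a registry *instance* from the
// library; never mark the template class itself for export.
```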
383d340e88 Small optimization for adam (#12107)
Summary:
Apply weight decay for Adam in-place instead of via copy.

Synced offline with soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107

Reviewed By: soumith

Differential Revision: D10071787

Pulled By: jma127

fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
2018-09-26 21:43:46 -07:00
5da8a8c785 Handle undefined tensor in blob correctly. (#12125)
Summary:
You can't GetDeviceType an undefined tensor, so test for this case
first.  This allows you to safely move tensors out of blobs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12125

Reviewed By: smessmer

Differential Revision: D10080075

Pulled By: ezyang

fbshipit-source-id: bb99b089b6daa9d4db99015208f939d7ce4d4a79
2018-09-26 21:43:41 -07:00
325101263a Aten: catch2gtest (#11846)
Summary:
Migrated all tests in aten to use gtest, except for basic.cpp.
Since gtest's features differ from Catch's, some of the tests have been rewritten with equivalent semantics.

The basic test has a version conflict with valgrind according to CI, so that test case still uses Catch.
It will be resolved by a different PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11846

Differential Revision: D10080860

Pulled By: zrphercule

fbshipit-source-id: 439d4cf33fb6ccbe79b797860342853c63e59081
2018-09-26 20:57:45 -07:00
0f81039eaf Better high level C++ documentation (#12079)
Summary:
I wrote some high level docs for the larger PyTorch C++ universe and the C++ frontend specifically. Happy for reviews, but let's please also land this ASAP so I can point users at something that looks more ready baked than the C++ docs landing page (https://pytorch.org/cppdocs) does right now.

ezyang soumith

CC ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12079

Differential Revision: D10080785

Pulled By: goldsborough

fbshipit-source-id: 3028de41373f307468eb1e3802aa27871c93b2e3
2018-09-26 20:57:43 -07:00
db5f8d42bb Remove TIndex typedef from core/common.h (#12032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12032

See title

Reviewed By: dinhviethoa

Differential Revision: D10023757

fbshipit-source-id: dbf0a043b2afab767f052bd4c5e8de13e0f57dcc
2018-09-26 17:02:54 -07:00
478803a75f Introduce type variables to implement generic list operators (#12040)
Summary:
We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code.

This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup.

It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040

Differential Revision: D10037098

Pulled By: zdevito

fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150
2018-09-26 17:02:51 -07:00
75b1ae1acd Update issue templates
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12114

Reviewed By: soumith

Differential Revision: D10060349

Pulled By: JoelMarcey

fbshipit-source-id: ed88bf95f78742b089adb043e88613a5db006a10
2018-09-26 16:26:00 -07:00
1b45f68397 Use atomicAdd from cuda_fp16 header when building with CUDA 10 (#12108)
Summary:
An efficient atomicAdd for halfs has been added in `cuda_fp16.h` in CUDA 10:
```__CUDA_FP16_DECL__ __half atomicAdd(__half *address, __half val);```

Through this change, PyTorch will be able to utilize efficient atomicAdd when building with CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12108

Differential Revision: D10053385

Pulled By: soumith

fbshipit-source-id: 946c90691a8f6bdcf6d6e367a507ac3c9970b750
2018-09-26 15:28:17 -07:00
6ff568df4d Add full namespace resolution in CAFFE_DURATION (#12065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12065

Had compilation issues using CAFFE_DURATION in some contexts, specifically due to namespace resolution. Since this is a macro, it should fully qualify.

Reviewed By: heslami

Differential Revision: D10036132

fbshipit-source-id: b8d55dfe5e991ca702ce5b7483f0ffc699882c85
2018-09-26 13:29:18 -07:00
d9c27f4d8d T33898723: Simple put operators for caffe2 stats (#12057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12057

Add simple put operators for various types of stats

Reviewed By: mlappelbaum

Differential Revision: D9925268

fbshipit-source-id: cec02b0027d2d0ef3d35741be4b02c429d492810
2018-09-26 12:39:37 -07:00
c2f8f5076c add narrow() support for sparse tensors re: #8853 (#11342)
Summary:
Couple questions:

1) I used the log1p implementation in #8969 as a guide, especially for testing.  I'm not sure what the ```skipIfROCM``` annotation is for, so I'm unsure if I need it for my test.

2) I implemented the branching logic in the narrow function itself; is this the right place to do so?  I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file.  Or should I implement a separate dispatch in native_functions.yaml, as with log1p?

And of course, I'm happy to make any other updates/changes that I may have missed as well.  This is my first PR to the project.
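
A hedged usage sketch of what this enables; whether the entry point is `narrow` itself or the copying variant `narrow_copy` is in the diff, so treat the method name as an assumption:

```
import torch

i = torch.tensor([[0, 1, 2], [2, 0, 1]])
v = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, (3, 3))

# Take the first two rows of the sparse matrix (method name assumed).
print(s.narrow_copy(0, 0, 2).to_dense())
```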
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11342

Differential Revision: D9978430

Pulled By: weiyangfb

fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
2018-09-26 12:24:54 -07:00
78fe149ab9 Fix ONNX bug, add symbolic for full
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12052

Differential Revision: D10044910

Pulled By: apaszke

fbshipit-source-id: 015ef372966d7594e1b450e348d457429f6ef20d
2018-09-26 11:45:25 -07:00
18f9c07b18 Enable tracing of tensor factories with an out argument
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12051

Differential Revision: D10044890

Pulled By: apaszke

fbshipit-source-id: 2d794bf408875600bc71f354f0b4961d6b715094
2018-09-26 09:40:34 -07:00
b535aecd7c Fix warnings emitted when testing distributions (#12038)
Summary:
The earlier tests had around 80 warnings, and now there are 6 warnings: these are due to the JIT.

The changes remove the wrapping of a Tensor by a Tensor constructor, which emits warnings due to the changes in https://github.com/pytorch/pytorch/pull/11061 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12038

Differential Revision: D10033392

Pulled By: apaszke

fbshipit-source-id: b1faf368e650d062d7983f9932511bee4702a893
2018-09-26 09:24:54 -07:00
02d7c88fa4 Unify versions across setup.py, libtorch, and libcaffe2 (#12053)
Summary:
This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct.

cc Yangqing ezyang soumith goldsborough pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053

Differential Revision: D10041878

Pulled By: orionr

fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0
2018-09-26 08:55:06 -07:00
c8a0b11b7f add autodiff expressions for common operations (#11832)
Summary:
This PR does a few things:

Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graphs
do not end up containing autodiff subgraphs, so autodiff never gets tested.

This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
  autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
  These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
  sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
  will test the autodiff versions of the operators instead of autograd whenever
  an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
  This has no impact on testing correctness because the assumption is
  that autograd functions are correct and are tested in test_autograd.py

This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. On a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass.

Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt

TestJitGenerated tests autodiff for all of the added operations.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832

Differential Revision: D10031256

Pulled By: zou3519

fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
2018-09-26 08:10:04 -07:00
21ed7e51b6 Blob doesn't allow access to destroyCall anymore (#11548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11548

This removes getting/setting the DestroyCall of a Blob,
paving the way to removing DestroyCall from Blob entirely and using the destructor stored in TypeMeta instead.

Use sites have been fixed in diffs stacked below this.

Reviewed By: dzhulgakov

Differential Revision: D9775191

fbshipit-source-id: 97d72d0c62843849057f295c27f391e63c99c521
2018-09-26 01:45:28 -07:00
65cbb8226b IValue can store Blob (#11414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11414

caffe2::Blob can be stored in an IValue. This is a precondition for caffe2 to switch from Blob to IValue.

Reviewed By: ezyang

Differential Revision: D9731326

fbshipit-source-id: 462a39d2d9ab6f85b99b1670848c6976a3de417c
2018-09-26 01:12:31 -07:00
b7ebc00979 Move Blob to ATen/core (#11924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11924

Previous diffs removed Blob -> caffe2 dependencies, now we can move it to ATen/core.
This is pre-work for allowing storing Blob in IValue.

Reviewed By: ezyang

Differential Revision: D9980641

fbshipit-source-id: 32082a673ec94c42c20b2298adced8bb7ca94d07
2018-09-25 23:27:52 -07:00
8ff435c8f6 Use tempfile during serialized test comparison (#12021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12021

TestPilot runs stress tests in parallel. These fail for serialized tests because the extraction (and subsequent deletion) of binary data during the process isn't thread-safe. Extract zips into a temporary directory instead to avoid this problem, as in the sketch below.

Also remove some accidentally checked-in zips of a test that we didn't end up including for now.
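
A minimal sketch of the pattern (hypothetical helper name, not the actual test code):

```
import tempfile
import zipfile

def extract_serialized_outputs(zip_path):
    # Each process extracts into its own temporary directory, so parallel
    # stress runs never race on extracting and deleting the same files.
    tmpdir = tempfile.mkdtemp()
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(tmpdir)
    return tmpdir
```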

Reviewed By: houseroad

Differential Revision: D10013682

fbshipit-source-id: 6e13b850b38dee4106d3c10a9372747d17b67c5a
2018-09-25 20:55:45 -07:00
807de9a1e3 fix segfault when grad to a hook fn is None (#12028)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/11751 by checking if a grad is a Python None object before getting cdata from it
- behaviors:

pre-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> @a0.register_hook
... def hook(grad):
...     print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print('a_list[0]', a_list[0].grad, a.grad)
('a_list[0]', None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward() # segfault
```

post-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> @a0.register_hook
... def hook(grad):
...     print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print(a_list[0].grad, a.grad)
(None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward()
None

>>> print(a_list[1].grad, a.grad)
(None, tensor([1., 1., 0., 0., 0.]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12028

Differential Revision: D10034094

Pulled By: weiyangfb

fbshipit-source-id: 3f2135325fa7d338b920f57752057e4f6a6c0b1d
2018-09-25 19:10:25 -07:00
db2f7de5c3 Fallback CreateMutex/AtomicIter operators for mkl-dnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11685

Reviewed By: pjh5

Differential Revision: D9928058

Pulled By: wesolwsk

fbshipit-source-id: 734e19c35a684481d9a4d4f0c596e4dceae51ad4
2018-09-25 17:41:08 -07:00
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod by mechanically doing the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
90bcf41291 Add safety asserts for methods on TensorImpl which don't work on Variable. (#12058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12058

Methods on TensorImpl have to be written very carefully, because
when you have a VariableImpl subclass of TensorImpl, usually the
local fields on the TensorImpl are not valid; instead, you have to
forward to the "wrapped" tensor.  Functions which are virtualized
are probably handled correctly by Variable, but functions which
are NOT cannot be handled correctly and shouldn't be called if you
have a Variable.  This diff adds checks to determine if this is
the case or not.

Reviewed By: jerryzh168

Differential Revision: D10034589

fbshipit-source-id: 650b2036ca9a044c0ab4abdf6f825521a64e1fc2
2018-09-25 17:25:47 -07:00
658386a63f Make USE_IDEEP work again (#12026)
Summary:
This PR establishes a baseline so that we can build IDEEP ops in the new workflow. From this baseline, we need to:
- Merge the CMakefile of MKLDNN from caffe2 and Pytorch
- Get rid of `USE_MKL=ON`.

Build command from now on:
```
EXTRA_CAFFE2_CMAKE_FLAGS="-DUSE_MKL=ON -DINTEL_COMPILER_DIR=/opt/IntelComposerXE/2017.0.098"  python setup.py build_deps
```

gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12026

Differential Revision: D10041199

Pulled By: yinghai

fbshipit-source-id: b7310bd84a494ac899d8e25da368b63feed4eeaf
2018-09-25 16:56:29 -07:00
b7b9e3c7e8 Fix "identifier following the 'template' keyword does not refer to a template" (#12037)
Summary:
LLVM trunk emits an error diagnostic when attempting to compile caffe2. The
identifiers following the `template` keywords are not templates, so the use of
the keyword does not make sense in this context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12037

Reviewed By: ezyang

Differential Revision: D10024531

Pulled By: modocache

fbshipit-source-id: da4b9ba405d9f7fd633ab8c1a61c77da9c1a1f89
2018-09-25 16:40:42 -07:00
1e28294487 Delete some unused variables. (#12059)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12059

Differential Revision: D10034632

Pulled By: ezyang

fbshipit-source-id: ff33da0d93734856b8e8bcfe744cefe127fffb91
2018-09-25 14:25:21 -07:00
e53e8df20b Support TypeIdentifier::name() (#12036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12036

Sometimes you have a TypeIdentifier and no way to get to the TypeMeta.
It's still nice to be able to read out the name.

This should be obsoleted by smessmer's patches.

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024554

fbshipit-source-id: 42cdceefd5c59be0441254665f66f5edc829f422
2018-09-25 14:25:19 -07:00
aa1adde80b Refactor fastGet/fastSet for clarity, removing a null pointer check. (#11902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11902

Previously, they were going through THTensor_getStoragePtr which
incurred a null pointer check on storage.  Now they use unsafe_data
method which doesn't do this check.

I don't know if this actually makes things go faster, but I get
an added bonus of reducing code duplication, so we should take
this change anyway :)

Reviewed By: SsnL

Differential Revision: D9977654

fbshipit-source-id: f45c74828213a0439480755ad0b2d7f8858cb327
2018-09-25 13:55:53 -07:00
ceadde2a7f Add some more locations to search for nccl. (#12063)
Summary:
Users generally expect ./configure to find libraries
installed in /usr/local and /usr, so search for nccl
there too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12063

Differential Revision: D10036248

Pulled By: ezyang

fbshipit-source-id: d331ddd2ccc8ac9846fb54222db284b1ec371659
2018-09-25 13:27:54 -07:00
b263078bc3 Fix CUDA division by a scalar on large arrays. (#12023)
Summary:
The gpu_unary_kernel function was not handling arrays that
cannot use 32-bit indexing. This function was only called directly
by CUDA division by a scalar. Other arithmetic operations go through
gpu_binary_kernel, which already properly handled large arrays.

This bug sometimes manifested as a crash and sometimes as an incorrect
answer.

Fixes #11788
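
A hedged repro sketch; the exact element count and dtype are assumptions, chosen so the tensor exceeds 32-bit indexing while staying around 2GB:

```
import torch

# More than 2**31 elements cannot use 32-bit indexing.
x = torch.ones(2**31 + 1, dtype=torch.uint8, device='cuda')
y = x / 2  # previously crashed or produced wrong values; now correct
print(y[-1].item())
```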
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12023

Differential Revision: D10034017

Pulled By: colesbury

fbshipit-source-id: b17300f327de54035746bf02f576766007c9b144
2018-09-25 13:10:25 -07:00
a106388187 Free MAGMA queues after use (#11882)
Summary:
This PR is a minor change; it just adds a simple `magma_queue_destroy` call to the implementation of `Gesv`.

Also, I have replaced calls for obtaining handles with those already written in ATen.
```
THCState_getCurrentSparseHandle(at::globalContext().getTHCState()) --> getCurrentCUDASparseHandle()
THCState_getCurrentBlasHandle(at::globalContext().getTHCState()) --> getCurrentCUDABlasHandle()
```

Differential Revision: D10032204

Pulled By: soumith

fbshipit-source-id: ccd11989ecdc357313f0b661a2468f75d3aecb0e
2018-09-25 12:56:57 -07:00
8f0db9bbbb Removing some dependency edges from Blob to other caffe2 (#12043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043

Re-trying D9979976, this time with all call sites fixed.

D9979976 got reverted because there was a call site that wasn't covered by sandcastle it seems.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.

Reviewed By: ezyang

Differential Revision: D10026392

fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
2018-09-25 11:40:24 -07:00
94c513cc7f Improve pybind11 message (#11640)
Summary:
Improving the message based on https://github.com/pytorch/pytorch/issues/11570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11640

Differential Revision: D10033383

Pulled By: orionr

fbshipit-source-id: 0cdcdbe0582d896283a12970aebe771efa390dd2
2018-09-25 11:26:05 -07:00
364ae10bb8 nomnigraph - easy - add some python test helper methods (#12020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12020

- make it less verbose to create random blobs in python unit tests by adding some test helper methods
- move str_compare test helper method to test_util.py

Reviewed By: ZolotukhinM

Differential Revision: D10003637

fbshipit-source-id: cb79d2ad508341f750a1bb8f564e87d055c65652
2018-09-25 10:55:19 -07:00
7122f8b3bb Disable more flaky tests on CircleCI (#11399)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11362.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11399

Differential Revision: D9736673

Pulled By: yf225

fbshipit-source-id: cad8c0e86a70a01b047e648975ca5b9926e4acb3
2018-09-25 10:25:30 -07:00
d7e11e3aae Revert "Move CreateContext to global registry (#11688)" (#12049)
Summary:
This reverts commit 3ae6ee4ebded136da30aa53fd3873d84acfbc9f0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12049

Differential Revision: D10030954

Pulled By: ezyang

fbshipit-source-id: 6ca9de65b707c5b4c68280fc6f1b8e5ad7251efc
2018-09-25 10:13:43 -07:00
3deb4791c3 Replace 'struct Tensor' with 'class Tensor'. (#12034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12034

We need ATen and Caffe2 to line up, and the rule is
that if you have any private/protected members, you
should declare it as a class.  Class we go.

(There are some other obvious candidates for this treatment,
but I've kept this patch just to Tensor)

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024467

fbshipit-source-id: 17cfe2741ba9c3f56cb87d6f5d1afd3c61a8e4fe
2018-09-25 09:54:35 -07:00
fcb3ccf23f Don't record Git version automatically via cmake (#12046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12046

This /sounds/ like a good idea in theory, but a feature
like this must be implemented very carefully, because if
you just plop the Git version in a header (that is included
by every file in your project, as macros.h is), then every
time you do a 'git pull', you will do a FULL rebuild, because
macros.h is going to regenerate to a new version and of course
you have to rebuild a source file if a header file changes.

I don't have time to implement it correctly, so I'm axing
the feature instead. If you want git versions in, e.g.,
nightly builds, please explicitly specify that when you feed
in the version.

Reviewed By: pjh5

Differential Revision: D10030556

fbshipit-source-id: 499d001c7b8ccd4ef15ce10dd6591c300c7df27d
2018-09-25 09:40:19 -07:00
0947712e5d Move Factory functions from Type to TypeExtendedInterface. (#12025)
Summary:
This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions.  In particular:
1) Removes factory functions from Type, into TypeExtendedInterface.
2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads.
3) We move from Type-based sparse_coo_tensor dispatch to function-based.

Note we still require a number of changes to get rid of tType in the public interface, in particular TensorOptions needs to support CUDA vs non-CUDA dispatch.  That is coming in a future patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025

Reviewed By: ezyang

Differential Revision: D10017205

Pulled By: gchanan

fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320
2018-09-25 09:40:17 -07:00
d4ce41c4de Rename tensor_impl_ to impl_ in Tensor (#12035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12035

This brings it in line with Caffe2's naming

Reviewed By: mingzhe09088

Differential Revision: D10024485

fbshipit-source-id: a6feef82a56b5eb3043b0821ea802ba746e542a0
2018-09-25 09:11:39 -07:00
71b99f28be Give default values to members of TensorImpl. (#12033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12033

These are sensible default values.  One key
pick is -1 for numel: this is because in Caffe2, a tensor
may be in an "un-allocated" state with no storage; this is
historically represented in Caffe2 with numel_ == -1

Reviewed By: mingzhe09088

Differential Revision: D10024439

fbshipit-source-id: a167d727a7665daac7e7a1e98c0c89d8f1da6fa6
2018-09-25 09:11:37 -07:00
2cdf98a74d Back out "Removing some dependency edges from Blob to other caffe2"
Summary: Original commit changeset: 2ea17724e223

Differential Revision:
D10026321
Ninja: stable broken

fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
2018-09-25 01:11:14 -07:00
3417a1e7e4 Prepend a "const" to a for loop in printPyObject. (#11857)
Summary:
Since pytuple should be a constant type (obj is constant), potential errors would occur without
this const qualifier, e.g., when compiling against PyPy. Although PyPy is not supported yet, it
is still useful to remove this compilation issue (one of the very few remaining) so that
hackers can play with it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11857

Differential Revision: D10024149

Pulled By: soumith

fbshipit-source-id: aa7e08e58f6369233a11477113351dccd3854ba8
2018-09-24 23:12:57 -07:00
17a65bf9b6 Removing some dependency edges from Blob to other caffe2 (#11923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923

This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or to be more correct, doesn't allow storing Tensor anymore), so this is anyhow the direction we want to go.

This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.

Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "

It is, however, not only these codemods, because regex-based refactoring was only able to match a small fraction of the call sites. To catch more, I would have needed an AST-aware tool like clangr, which I didn't figure out how to use.

Reviewed By: ezyang

Differential Revision: D9979976

fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
2018-09-24 22:57:05 -07:00
dfa03e94eb Fix mispelling of AVAILABLE. (#12016)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12016

Reviewed By: pietern

Differential Revision: D10010808

Pulled By: ezyang

fbshipit-source-id: ff6394ae9a53f7fdad2cadb4e019e09ac63bba96
2018-09-24 20:46:41 -07:00
86e025fca2 magma-cuda should reference updated versions (#12000)
Summary:
The source build doc section **LAPACK GPU** only lists magma-cuda80.

The magma-cuda version should reflect the installed version of CUDA.

- Verified on ubuntu with magma-cuda92 with build and test
- Verified 91 is available
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12000

Differential Revision: D10024158

Pulled By: soumith

fbshipit-source-id: a34c85a5e87b52657f1e6f7b21d235306ab7b2aa
2018-09-24 20:26:26 -07:00
5d4624a1d9 Fix return temporary as reference in MPI backend (#11947)
Summary:
The MPI async work class returned a temporary as a reference, which is
invalid (hat tip to colesbury for noticing it). This change fixes that:
it uses a std::exception_ptr to hold on to the exception if applicable,
and produces the reference by rethrowing the exception, like the
existing code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11947

Differential Revision: D10019928

Pulled By: pietern

fbshipit-source-id: 5a8ed0e894615a09224ca5e48c8b3104275a3019
2018-09-24 20:17:38 -07:00
9068a46dba Fix deprecated function warning in ONNX model test. (#11827)
Summary:
When running /test/onnx/test_models.py, we see deprecation warnings in the test points for `super_resolution` and `squeezenet` models. This change updates those models to use the recommended methods, instead of the deprecated ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11827

Reviewed By: houseroad

Differential Revision: D10023998

Pulled By: ezyang

fbshipit-source-id: ee4e14304678c532ebd574e7bd143e3b311995ab
2018-09-24 19:59:02 -07:00
a830964007 Eliminate no-op adds and muls in peephole pass (#11801)
Summary:
We emit a lot of these in our symbolic AD, so eliminating them is worthwhile: it brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (roughly a 12% improvement).
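
A hedged sketch of the kind of no-op arithmetic being targeted (function name made up):

```
import torch

@torch.jit.script
def noop_heavy(x):
    # x + 0 and x * 1 are exactly the adds and muls the peephole
    # pass can now eliminate from the graph.
    return (x + 0) * 1
```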
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801

Differential Revision: D9916815

Pulled By: apaszke

fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494
2018-09-24 17:48:48 -07:00
3ae6ee4ebd Move CreateContext to global registry (#11688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688

As a first step to removing the static context (merging it with the allocator), we'll create a
global registry for context constructors and remove the CreateContext function from tensor.

Reviewed By: ezyang, dzhulgakov

Differential Revision: D9779821

fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
2018-09-24 17:07:50 -07:00
b7c302da1a Make gen_jit_dispatch runnable (#12018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12018

Tried to use the file and ran into a small bug; this fixes it.

Differential Revision: D10013231

fbshipit-source-id: 4cf8c29cf9e2cedd7a28fa0cc0196e5144a54bf2
2018-09-24 16:09:48 -07:00
70e4b3ef59 Revert D10006069: Remove TIndex typedef from core/common.h
Differential Revision:
D10006069

Original commit changeset: 5e2aac993968

fbshipit-source-id: fbd8d3860635211e641ca14eaff7a64882e0d6bd
2018-09-24 15:30:25 -07:00
e05d689c49 Unify C++ API with C++ extensions (#11510)
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.

Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498

Why is this useful?

1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API.
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510

Reviewed By: ezyang

Differential Revision: D9998835

Pulled By: goldsborough

fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
2018-09-24 14:44:21 -07:00
1c09bfde1b Make promoteType(half, integer) -> half (#11941)
Summary:
Changes the result type of combining the half type with any integer type
to half (instead of float or double).

This is based on top of #11808. The first new commit is "Make promoteType(half, integer) -> half". I'll rebase on top of master once that PR lands.
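
A hedged sketch of the intended effect, using scalar mixing as the visible surface (CUDA is used since CPU half arithmetic was limited at the time):

```
import torch

h = torch.ones(3, dtype=torch.half, device='cuda')
# Combining half with an integer now yields half instead of
# promoting the result to float or double.
print((h + 1).dtype)  # torch.float16
```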
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11941

Differential Revision: D10014122

Pulled By: colesbury

fbshipit-source-id: 16a5eb3406a5712069201d872d8736d0599e9411
2018-09-24 13:55:42 -07:00
51414822f5 Stop moving constants into DifferentiableSubgraphs (#11809)
Summary:
Or even taking them as inputs. Moving them in prevented optimizations from happening
either inside the differentiable subgraphs or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809

Differential Revision: D10009680

Pulled By: apaszke

fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
2018-09-24 13:24:53 -07:00
ffbac7d0bb Miscellaneous updates for CUDA 10 (#12017)
Summary:
This PR has some updates related to CUDA 10.

- c2195e9864 ensures that the repo successfully builds on CUDA 10. Addresses https://github.com/pytorch/pytorch/issues/11888
- 423d8d3524 follows up on the cufft max plan number bug: https://github.com/pytorch/pytorch/issues/11089, which has been fixed in CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12017

Differential Revision: D10013405

Pulled By: soumith

fbshipit-source-id: 5bc6d7f71d5133f7821b407b1ac6c51bef0f6fa8
2018-09-24 11:58:32 -07:00
a6f1ae7f20 set up c10 scaffolding. Move macros proper first.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939

Reviewed By: orionr, dzhulgakov

Differential Revision: D10004629

Pulled By: Yangqing

fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c
2018-09-24 11:09:59 -07:00
1a1d79e761 Remove TIndex typedef from core/common.h (#11993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11993

See title

Reviewed By: ezyang

Differential Revision: D10006069

fbshipit-source-id: 5e2aac993968307c850e431c00052cb1a339ced2
2018-09-24 10:55:55 -07:00
a9e6a673ae Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876

Modern C++ API instead of macros; item() is aligned with the Python frontend. caffe2::Tensor::capacity_nbytes is effectively unused and confusing w.r.t. caffe2::Tensor::nbytes().

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>"

codemod -d tc           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

Reviewed By: ezyang

Differential Revision: D9948572

fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c
2018-09-24 10:40:10 -07:00
1178851280 Get rid of most usages of Type.tensor. (#12002)
Summary:
1) Most usages are replaced by at::empty.
2) native_tensor has its namespace function removed
3) Type.tensor(sizes, strides) becomes at::empty_strided(sizes, strides).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12002

Differential Revision: D10007201

Pulled By: gchanan

fbshipit-source-id: 5e5647c050ed2ecb87a33e0b5ce4928fa3186c34
2018-09-24 10:16:18 -07:00
76ab26cc3e Remove unused THNN functions due to removal of torch/legacy (#11946)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11946

Differential Revision: D9994625

Pulled By: cpuhrsch

fbshipit-source-id: fca3d48ecbdab06ce53249db2402fc4613da4d21
2018-09-22 21:54:55 -07:00
a6630e25af Remove many caffe2::TIndex and replace them with int64_t (#11943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11943

See title

Reviewed By: ezyang

Differential Revision: D9992645

fbshipit-source-id: e8f80d6ea762971513e5e8072975ceea53e1f11a
2018-09-22 18:11:04 -07:00
5d0f1c3c8f Add #include to satisfy Android NDK unified headers
Summary:
Old per-API+arch headers reside in
  /opt/android_ndk/r*/platforms/android-*/arch-*/usr/include/
New Unified headers reside in
  /opt/android_ndk/r*/sysroot/usr/include/

Unified headers are not exactly drop-in replacements for the old ones. Old headers had some nested includes that are absent in the unified versions, so we need to explicitly include them.

Reviewed By: mzlee

Differential Revision: D9952200

fbshipit-source-id: 6515e1d1ab576069db499c3fb23a69d507279c8c
2018-09-22 15:39:56 -07:00
7517e53468 Update onnx submodule to onnx/onnx@c4734c6 (#11958)
Summary:
c4734c6200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11958

Differential Revision: D10002779

Pulled By: bddppq

fbshipit-source-id: 8bd7dfc8fdaf0b699a61f5b228f7102a16b92258
2018-09-22 01:40:31 -07:00
f15474ade8 Export caffe2::Caffe2Annotation symbols (#11965)
Summary:
Some of these symbols are used by device_test.cc.

d0db23e95a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11965

Reviewed By: bwasti

Differential Revision: D10002439

Pulled By: bddppq

fbshipit-source-id: 4ae95b9c888b3c7685d0ffdbcbfa3441bcf90091
2018-09-21 22:43:48 -07:00
1c282ab99a Move GetExceptionString to Error.h (#11501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11501

This doesn't really belong to TypeMeta, moving it to the error handling header

Reviewed By: ezyang

Differential Revision: D9763424

fbshipit-source-id: 127a8246171ab3a4475f2767d2dc1cc13c486a2e
2018-09-21 21:54:33 -07:00
825181ea9d Rewrite C++ API tests in gtest (#11953)
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.

You can largely trust me to have correctly code-modded the tests, so it's not required to review every one of the 2000+ changed lines. However, additional things I changed were:

1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixing DataParallel tests which weren't being compiled because `USE_CUDA` wasn't being set correctly.
3. Updated README

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953

Differential Revision: D9998883

Pulled By: goldsborough

fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
2018-09-21 21:28:16 -07:00
d0db23e95a Add distributed annotations
Summary: Annotations for DAI

Reviewed By: duc0

Differential Revision: D9805867

fbshipit-source-id: 9ce2d9f3984817510ec8362a281f39878aad55e7
2018-09-21 19:09:59 -07:00
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression on CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
89d56ae435 Move function deletion from the stack to the heap. (#11611)
Summary:
This eliminates the need for any heuristics regarding stack size limits.

This is a re-do of #11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11611

Differential Revision: D9991198

Pulled By: resistor

fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0
2018-09-21 16:11:03 -07:00
b5f60af94c Shape prop view/reshape/as_strided through prim::ListConstructs (#11877)
Summary:
Previously, aten::view returned a Dynamic type when attr::size is a prim::ListConstruct.
See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28).
This prevented a pre-multiplied LSTM input graph from being fusible (aten::view is necessary
to do premultiplication).

If aten::view is passed an output of a prim::ListConstruct node, then shape prop should
be able to figure out its TensorType because we statically know the number of inputs to
prim::ListConstruct. This PR implements that.
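
A sketch of the now-inferable pattern: a script function whose view sizes are built by a prim::ListConstruct (function name made up):

```
import torch

@torch.jit.script
def premultiply_view(x):
    # [x.size(0) * x.size(1), x.size(2)] is a prim::ListConstruct with a
    # statically known number of inputs, so shape propagation can infer
    # the output TensorType instead of falling back to Dynamic.
    return x.view(x.size(0) * x.size(1), x.size(2))
```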
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877

Differential Revision: D9972356

Pulled By: zou3519

fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5
2018-09-21 14:20:01 -07:00
7efbf3a827 Specialize ArgumentSpecs on tuple elements too (#11863)
Summary:
This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network.

Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863

Differential Revision: D9992814

Pulled By: apaszke

fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
2018-09-21 14:19:58 -07:00
1cf5b0c7c1 Fix casting logic for 0d CPU tensors in CUDA ops (#11808)
Summary:
Previously, we didn't cast any 0-dim tensors used in CUDA operations. We
can only avoid the casts for 0-dim CPU tensors used in CUDA operations.

Fixes #11795
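
A hedged sketch of the distinction (exact dtypes are illustrative):

```
import torch

x = torch.randn(4, device='cuda')           # float32 CUDA tensor
s = torch.tensor(2.0, dtype=torch.float64)  # 0-dim CPU tensor

# A 0-dim CPU tensor participates as a scalar, so no cast is needed
# even though its dtype differs from x's.
print(x * s)

# A 0-dim CUDA tensor must still be cast to a matching dtype first.
print(x * s.to(device='cuda', dtype=x.dtype))
```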
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11808

Differential Revision: D9922406

Pulled By: colesbury

fbshipit-source-id: 940b8a8534770aa5cd70d5d09b96be0f0f8146ff
2018-09-21 14:19:56 -07:00
1ad7e0c5ec Minor JIT improvements (#11654)
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in requires grad propagation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11654

Reviewed By: colesbury

Differential Revision: D9813739

Pulled By: apaszke

fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
2018-09-21 14:19:54 -07:00
4e65fbfee5 Remove tests from EXCLUDE_SCRIPT that pass (#11916)
Summary:
Spuriously added in #11261

I had a PR to catch these automatically (#11279), but it had some issues
passing on some CI environments but not others (e.g. for
`test_nn_group_norm`), any ideas?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11916

Differential Revision: D9992065

Pulled By: driazati

fbshipit-source-id: 05cfa8ed9af939e8ffd5827847ee7bfe0be799b2
2018-09-21 14:19:50 -07:00
00fe2c5606 Use -O1 for sleef build in Debug mode (#11942)
Summary:
`-O0` is problematic for compiling sleef kernels since they consist of a bunch of vector intrinsics. In `-O0`, the compiler spills *every* intermediate value to the stack. In one example (TestEndToEndHybridFrontendModels.test_snli in test_jit.py) the function `Sleef_tanhf8_u10avx2` would spill 30kB of AVX registers onto the stack and run two orders of magnitude slower than in opt mode, causing the test to take minutes rather than seconds. I've verified that this behavior is not present with `-O1`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11942

Differential Revision: D9994658

Pulled By: jamesr66a

fbshipit-source-id: cdd9474c6ae3aa9898d5715ac19a900f5f90468a
2018-09-21 13:24:59 -07:00
775358e4c2 Add non-legacy test of bilinear (#11935)
Summary:
Fixes: #11905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11935

Differential Revision: D9991120

Pulled By: soumith

fbshipit-source-id: b00ad4f405440664ae5228b229a2ba0a5d3d92f6
2018-09-21 12:43:35 -07:00
23f5b2abbe Fixes an error with canonical url. (#11938)
Summary:
Deleted this section by mistake in the last PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11938

Reviewed By: SsnL

Differential Revision: D9993258

Pulled By: brianjo

fbshipit-source-id: 2552178cebd005a1105a22930c4d128c67247378
2018-09-21 12:21:42 -07:00
c2a2110d71 Stop tracing _out overloads (#11910)
Summary:
They aren't recognized anywhere in the JIT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11910

Differential Revision: D9979968

Pulled By: apaszke

fbshipit-source-id: bb2505a14e3b1e54d5c243f99c80a4f4d918b204
2018-09-21 11:44:10 -07:00
c6a14b1edd Revert D9985212: [pytorch][PR] [minor] remove a remaining todo line deletion in THD cmake
Differential Revision:
D9985212

Original commit changeset: 5f8e7ac94101

fbshipit-source-id: 1783cbfc91008ab3db36bad7c1bf51e16da7fb2d
2018-09-21 11:25:53 -07:00
817e83fc01 fix PR #11061 (#11815)
Summary:
- fix PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also remove warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from the source tensor, with requires_grad set based on the input args (see the sketch after this list)
- `torch.as_tensor` retains its behavior as documented
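
A short sketch of the resulting behavior:

```
import torch

a = torch.ones(3, requires_grad=True)
b = a.new_tensor(a, requires_grad=True)  # copies data, detached from a

b.sum().backward()
print(b.grad)  # tensor([1., 1., 1.])
print(a.grad)  # None: gradients do not flow back to the source tensor
```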

gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
2018-09-21 11:04:19 -07:00
6834dcab1c Align cuda multinomial without replacement to CPU behaviour (#11933)
Summary:
We do this by being more NaN-tolerant.

Fixes: #9062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11933

Differential Revision: D9991129

Pulled By: soumith

fbshipit-source-id: c99b04462c1bee90d00eeabb0c111de12f855f4d
2018-09-21 11:04:17 -07:00
784d345828 Fix docstring of torch.jit.createResolutionCallback (#11921)
Summary:
The sample code in the docstring of `torch.jit.createResolutionCallback` is not working:

`createResolutionCallback()` gets the frame of `bar`. In order to get the frame of `baz`, one needs to use `createResolutionCallback(1)`.
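
A minimal sketch of the corrected usage (the import path is assumed from the title):

```
from torch.jit import createResolutionCallback

def bar():
    # frames_up=1: resolve names in baz's frame, not in bar's own frame.
    cb = createResolutionCallback(1)
    print(cb("foo"))

def baz():
    foo = 2
    bar()

baz()  # prints 2
```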
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11921

Differential Revision: D9989123

Pulled By: soumith

fbshipit-source-id: a7166defdccbbf6979f7df4c871298e6b9a2b415
2018-09-21 09:41:57 -07:00
e655f16c35 Pop stashed IntList in resize_, warn about its usage when tracing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11909

Differential Revision: D9979595

fbshipit-source-id: 07b1027bd6bd1605a31afd4f57bcd58e307fa41e
2018-09-21 08:40:20 -07:00
4fb7e72fe5 Fix _thnn_fused_lstm_cell backward (#11872)
Summary:
There are two parts:
- Optional tensors cannot be dispatch tensors because dispatch
  tensors cannot be optional.
- While the kernel dealt with undefined grad_outs, the logistics
  around it did not fully accommodate grad_hy being undefined.

Fixes: #11800

Thank you, mttk for the reproduction!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11872

Differential Revision: D9978527

Pulled By: apaszke

fbshipit-source-id: e622c288d2eac93bd8388e141fb773f2588e2b8f
2018-09-21 08:25:00 -07:00
48c8adfe1b Turn storage on UndefinedTensorImpl into nullptr. (#11738)
Summary:
I also fix a bug that crept in while we had incorrect semantics where UndefinedTensorImpl was a CPU tensor, and thus some moves which shouldn't have been legal didn't crash: moving out the Tensor* also moved out the Tensor* in the blob, and storing an undefined tensor in a blob is not supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11738

Reviewed By: gchanan

Differential Revision: D9847859

fbshipit-source-id: db6be0f76a8e6526a89fd0e87b6a23b9cc820c8d
2018-09-21 08:24:57 -07:00
11bd2f2509 Retainable is no more (#11900)
Summary:
Stack:
  #11900 Retainable is no more  [💛](https://our.intern.facebook.com/intern/diff/D9977505/)
  #11902 Refactor fastGet/fastSet for clarity, removing a null pointer check.  [💛](https://our.intern.facebook.com/intern/diff/D9977654/)

Kill it with fire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11900

Differential Revision: D9979779

Pulled By: ezyang

fbshipit-source-id: 0a437e7a0baadb6440e7dc39a01b4a406171faa7
2018-09-21 06:58:18 -07:00
a7afd133f5 Sync FindCUDA.cmake with upstream cmake repo (#11880)
Summary:
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391/diffs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11880

Differential Revision: D9989119

Pulled By: soumith

fbshipit-source-id: 66e87367127975a5f1619fe447f74e76f101b503
2018-09-21 06:58:17 -07:00
58d28a5f12 Fix saving loaded module (#11915)
Summary:
This PR fixes #11913.

In order to test for this, the model is serialized twice in `getExportImportCopy`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11915

Differential Revision: D9984697

Pulled By: soumith

fbshipit-source-id: ae0250c179000c03db1522b99410f6ecb9681297
2018-09-21 06:58:16 -07:00
0d9be2135f remove a remaining todo line deletion in THD cmake (#11920)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11920

Differential Revision: D9985212

Pulled By: Yangqing

fbshipit-source-id: 5f8e7ac94101177740e791f44eaa8c8ec55a908c
2018-09-21 00:40:20 -07:00
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now the free functions SerializeBlob() and DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
17cd426c72 Updated docs styles (#11835)
Summary:
Updated requirements.txt and conf.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11835

Reviewed By: SsnL

Differential Revision: D9941160

Pulled By: brianjo

fbshipit-source-id: fbac91214558e6d17beff74261d990c7dc762038
2018-09-20 21:11:12 -07:00
d712a71741 Protobuf serialization (#11619)
Summary:
This PR serves two purposes:

1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general,
2. Add serialization to the ONNX/PyTorch proto format.

This is currently a rough prototype I coded up today, to get quick feedback.

For this I propose the following serialization interface within the C++ API:

```cpp
namespace torch { namespace serialize {
class Reader {
 public:
  virtual ~Reader() = default;
  virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};

class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};
}} // namespace torch::serialize
```

There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to:

1. Provide a cereal-less serialization format that we can ship and iterate on going forward,
2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft.

The user-facing API is (conceptually):

```cpp
void torch::save(const Module& module, Writer& writer);
void torch::save(const Optimizer& optimizer, Writer& writer);
void torch::read(Module& module, Reader& reader);
void torch::read(Optimizer& optimizer, Reader& reader);
```

with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader`.

ebetica ezyang zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619

Differential Revision: D9984664

Pulled By: goldsborough

fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847
2018-09-20 20:39:34 -07:00
30521a37ad codemod: caffe::float16 -> at::Half (#11785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11785

Replace each instance of float16 with Half.

Reviewed By: Yangqing

Differential Revision: D9892158

fbshipit-source-id: b9225ca7bd5c84fd1c04a9d24b026c8b6cbff120
2018-09-20 18:55:19 -07:00
a9459bf7b5 Replace float16 with at::Half in caffe2 (#11676)
Summary:
- Finishes unifying Half type in pytorch and caffe2
- As a side effect, aten_op works for fp16 now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11676

Reviewed By: weiyangfb

Differential Revision: D9829019

Pulled By: li-roy

fbshipit-source-id: b8c9663873c10fe64c90ef180dc81af2e866674e
2018-09-20 18:55:17 -07:00
9c44c60794 Bump up the frontend version (#11873)
Summary:
To update the onnx model zoo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11873

Reviewed By: BIT-silence

Differential Revision: D9953369

Pulled By: houseroad

fbshipit-source-id: 5e96a982b8029dceeb08e3bea4094bae053e1865
2018-09-20 16:20:48 -07:00
9f0d9db6e4 Improve GRU/LSTM documentation for multiple layers (#11896)
Summary:
Prompted by Alex Falcon's input on the forums. Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11896

Differential Revision: D9976831

Pulled By: SsnL

fbshipit-source-id: 460af51049c289ed4ce529b7b6ae6314e2bdaae4
2018-09-20 15:42:48 -07:00
c7751f4df0 MIOpen bug fixes and performance enhancements (#11766)
Summary:
This PR contains changes for:
1. Performance enhancements for group conv using MIOpen
2. Performance enhancements by removing unnecessary computations while running pooling through MIOpen
3. Added check for bwdData computation while running MIOpen convGradient operator
4. Fix in MIOpen poolingGradient operator to compute window size for global pooling case
5. Minor code cleanup in MIOpen spatial batch norm operator

Differential Revision: D9979050

Pulled By: bddppq

fbshipit-source-id: fabc7a44a2f9ca0307d99564d1ce8fe1de9a6fbb
2018-09-20 15:31:46 -07:00
b91b15d86e Implementing Matrix Norm for torch.norm (#11261)
Summary:
Currently, norm function only supports vector norm. This PR extends vector norm to matrix norm.
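A minimal sketch of the extended behavior, assuming the `'nuc'` (nuclear norm) order is among those this PR adds for matrices:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
print(torch.norm(A))           # Frobenius norm (the default for matrices)
print(torch.norm(A, p='nuc'))  # nuclear norm, i.e. the sum of singular values
```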
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11261

Reviewed By: li-roy

Differential Revision: D9652379

Pulled By: yya007

fbshipit-source-id: 519b3fb80b563c17c56a24675c7b0e46bf5a3a1c
2018-09-20 14:43:13 -07:00
6100c0ea14 Introduce ExtensionVersioner for C++ extensions (#11725)
Summary:
Python never closes a shared library it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.

I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as the build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library (effectively a new C++ extension) being compiled. For this, the version is appended as `_v<version>` to the extension name for all versions greater than zero.

One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.
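A minimal sketch of the resulting behavior (the extension name and source strings are made up for illustration):

```python
from torch.utils.cpp_extension import load_inline

ext = load_inline(name="my_ext",
                  cpp_sources="int answer() { return 41; }",
                  functions=["answer"])
print(ext.answer())  # 41

# Re-loading with changed source in the same process now recompiles
# (under a bumped name such as my_ext_v1) instead of returning stale code.
ext = load_inline(name="my_ext",
                  cpp_sources="int answer() { return 42; }",
                  functions=["answer"])
print(ext.answer())  # 42
```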

Fixes https://github.com/pytorch/pytorch/issues/11398

ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725

Differential Revision: D9948244

Pulled By: goldsborough

fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
2018-09-20 14:43:12 -07:00
068eac255b Jit fuse clamp (#11574)
Summary:
This patch adds fused forward and backward for clamp to the jit.
This is one item of #11118 . If it's OK, I'd be happy to also add some more of #11118 .

The patch depends on #11150 , which I merged into master as a base. I'll rebase it when that or #10981 is merged.

This is my first serious jit patch; thank you, ngimel and the others, for your guidance. All errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11574

Differential Revision: D9943090

Pulled By: apaszke

fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67
2018-09-20 14:43:10 -07:00
d8f6be686d Remove torch/legacy (#11823)
Summary:
Largely unused and hinders current development
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11823

Differential Revision: D9925094

Pulled By: cpuhrsch

fbshipit-source-id: c797f62180e2128f9a567b0c57c8347957470ea5
2018-09-20 14:00:54 -07:00
24ec813967 Defer lazyInitCUDA() until needed (#11893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11893

This is needed to run binaries compiled with CUDA support on CPU-only machines.

Reviewed By: teng-li

Differential Revision: D9972872

fbshipit-source-id: 7e4107925b3cd4d2fcf84ae532e800ab65f4b563
2018-09-20 12:12:42 -07:00
9cd0ae5e2d Remove deprecated factory functions from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11583

Reviewed By: SsnL

Differential Revision: D9792800

fbshipit-source-id: 9af46d577911ff38647790169df66aa5d0379dd9
2018-09-20 11:39:48 -07:00
87701289a3 fix link to previous versions (#11894)
Summary:
https://github.com/pytorch/pytorch.github.io/issues/68#issuecomment-423073108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11894

Differential Revision: D9973695

Pulled By: soumith

fbshipit-source-id: 1f74b12487ec39f4e88b527dcdfca0742e689c15
2018-09-20 11:10:37 -07:00
0927386890 Workaround CUDA logging on some embedded platforms (#11851)
Summary:
Fixes #11518
Upstream PR submitted at https://gitlab.kitware.com/cmake/cmake/merge_requests/2400

On some embedded platforms, the NVIDIA driver verbosely logs unexpected output to stdout.
One example is Drive PX2, where we see something like this whenever a CUDA program is run:

```
nvrm_gpu: Bug 200215060 workaround enabled.
```

This patch does a regex on the output of the architecture detection program to only capture architecture patterns.
It's more robust than before, but not fool-proof.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11851

Differential Revision: D9968362

Pulled By: soumith

fbshipit-source-id: b7952a87132ab05c724b287b76de263f1f671a0e
2018-09-20 09:26:00 -07:00
1c77f9e543 Support torch.distributed.barrier in gloo backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11844

Reviewed By: colesbury, SsnL

Differential Revision: D9929055

Pulled By: pietern

fbshipit-source-id: 3a34a179cb80f495f18aa926c0f9513924737d8e
2018-09-20 09:25:59 -07:00
8f4601fbac renable test_scalar_fusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11378

Differential Revision: D9943578

Pulled By: zou3519

fbshipit-source-id: fb9e4303e844d5e2515acce7869bcbe11526ab56
2018-09-20 07:56:25 -07:00
23dd5b4a53 Back out "Open-source ThreadSafeActivationCleaningPredictor"
Summary:
Original commit changeset: bfe253ae5fc8

Apparently the Ads push process detected a regression that normal
canaries don't show.
https://fb.facebook.com/groups/1274424122598505/permalink/2597819483592289/

Reviewed By: highker, Prowindy

Differential Revision: D9952807

fbshipit-source-id: 1a3ea249c3b1e2618220c61f3d51468824b6ef10
2018-09-19 21:26:51 -07:00
83740eae4a Avoid using PyThreadState.frame as it is not a public member. (#11855)
Summary:
The doc of PyThreadState [1] emphasizes that interp is its only public member. Use PyEval_GetFrame() instead.

[1] https://docs.python.org/3/c-api/init.html#c.PyThreadState
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11855

Differential Revision: D9954430

Pulled By: ezyang

fbshipit-source-id: 92da6781e45e2bcb5e3a37b162fa40e49d823215
2018-09-19 20:58:37 -07:00
c64331f48f Add test for verifying combine_spatial_bn values in DPM (#11710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11710

Added a test to check that output and gradient values are correctly
calculated when combine_spatial_bn is true on a data parallel model

Reviewed By: enosair

Differential Revision: D9833660

fbshipit-source-id: 14d29fbebefa9dc303ffae06f9899ea4bde23025
2018-09-19 20:17:51 -07:00
aa8cd7319a Enable build_test on windows (#11802)
Summary:
This PR enables BUILD_TEST for Caffe2 on windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11802

Reviewed By: orionr

Differential Revision: D9951223

Pulled By: mingzhe09088

fbshipit-source-id: 7cdc1626b999daadeae482bd569eebdbd53eb6d4
2018-09-19 20:17:49 -07:00
c22dcc266f Show build output in verbose mode of C++ extensions (#11724)
Summary:
Two improvements to C++ extensions:

1. In verbose mode, show the ninja build output (the exact compile commands, very useful); see the sketch below
2. When raising an error, don't show the `CalledProcessError` that shows ninja failing, only show the `RuntimeError` with the captured stdout
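A minimal usage sketch of point 1 (the extension name and source are made up for illustration):

```python
from torch.utils.cpp_extension import load_inline

ext = load_inline(
    name="verbose_demo",
    cpp_sources="int two() { return 2; }",
    functions=["two"],
    verbose=True,  # streams the ninja build output, i.e. the compile commands
)
print(ext.two())  # 2
```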

soumith fmassa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11724

Differential Revision: D9922459

Pulled By: goldsborough

fbshipit-source-id: 5b319bf24348eabfe5f4c55d6d8e799b9abe523a
2018-09-19 20:17:43 -07:00
1091c5e59f Throw error on indexing a 0 dim tensor (#11679)
Summary:
Following through on the warning that indexing a 0-dim tensor would be an
error in PyTorch 0.5 and that `item()` should be used instead
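A minimal sketch of the new behavior:

```python
import torch

t = torch.tensor(3.14)  # a 0-dim tensor
print(t.item())         # 3.14 -- the supported way to read the value
# t[0] now raises an error instead of the old deprecation warning
```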
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11679

Reviewed By: soumith

Differential Revision: D9833570

Pulled By: driazati

fbshipit-source-id: ac19f811fa7320d30b7f60cf66b596d6de684d86
2018-09-19 18:10:03 -07:00
6831d64591 Fix the symbolic for embedding_bag in ONNX_ATEN_FALLBACK (#11840)
Summary:
The ATen interface was changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11840

Reviewed By: BIT-silence

Differential Revision: D9932452

Pulled By: houseroad

fbshipit-source-id: dd2040fcaa0f6052e5856ee19823cf3064124585
2018-09-19 17:40:39 -07:00
ae1a972d78 Fix #11752: correct numerical issue with log_softmax (#11866)
Summary:
This fixes the numerical problem in log_softmax cpu code when inputs are big but their differences are small.
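A minimal sketch of the failing regime (the exact values are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1000.0, 1000.5]])  # large inputs, small differences
print(F.log_softmax(x, dim=1))        # finite, accurate values after the fix
```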
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11866

Differential Revision: D9946799

Pulled By: soumith

fbshipit-source-id: 11fe8d92b91ef6b7a66f33fbce37ec2f0f0929be
2018-09-19 17:09:45 -07:00
6302e4001a Delete unnecessary include from allocator.cc/event_cpu.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11862

Reviewed By: Yangqing

Differential Revision: D9942428

fbshipit-source-id: dea03f5ba0e621a047aa50bc4aa97acc834d2a39
2018-09-19 16:45:54 -07:00
f4d25039cb Fix Array.h when compiled with C++17 (#11816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11816

The code isn't in the std:: namespace, so is_same
must be qualified.

Reviewed By: smessmer

Differential Revision: D9923774

fbshipit-source-id: 126532e27f08b5616ca46be1293d5d837920f588
2018-09-19 16:45:53 -07:00
b06e35b568 Back out "Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h"
Summary: Original commit changeset: 0d1792804d73

Reviewed By: Yangqing

Differential Revision: D9940725

fbshipit-source-id: 540a8ac7afcfe56a6b63abc6ed297c9434320998
2018-09-19 16:45:51 -07:00
cedd12d86a Explicitly qualify references to CPU. (#11819)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11819

Differential Revision: D9928730

Pulled By: ezyang

fbshipit-source-id: 3140b6ef168586558f04fa8ee90f6f2169605d7d
2018-09-19 16:45:49 -07:00
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
  fixed in moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
  fixed in moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at a time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take `int n` as its first argument, representing the number of valid values, because there may be fewer than `step` valid values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call looks like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // elements at a time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
cf5a21e4a1 Add back proto opt disable feature that was lost during refactor (#11875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11875

Seems like the refactor to predictor_config dropped some functionality that is now blocking other teams

rFBS2b30208263c14ce7039f27c618a3b232bf11ee33 is the change that was missed

hoping to land this quickly :)

Reviewed By: jonmorton

Differential Revision: D9948324

fbshipit-source-id: 1628f7c51c06319fa7ca5dc9d59799135bb82c5f
2018-09-19 15:33:26 -07:00
c30790797f Minor data loader doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11821

Differential Revision: D9948292

Pulled By: SsnL

fbshipit-source-id: 01c21c129423c0f7844b403e665a8fe021a9c820
2018-09-19 15:33:25 -07:00
ce55767091 Add the missing header (#11864)
Summary:
Otherwise, some macro doesn't have the definition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11864

Reviewed By: BIT-silence

Differential Revision: D9943327

Pulled By: houseroad

fbshipit-source-id: 53e1bfc7a6b832f249f169b75a8fc15cdab63bf4
2018-09-19 14:40:19 -07:00
3b1a5a1b8a Refactor tests part 2 (#11811)
Summary:
Followup to the [first refactor](https://github.com/pytorch/pytorch/pull/11350). Increase coverage of tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11811

Reviewed By: houseroad

Differential Revision: D9923074

Pulled By: ajyu

fbshipit-source-id: 0f899bb9e9a75bf7ed939e06cc9b028daa7f6bd9
2018-09-19 10:09:28 -07:00
52472508e9 Add env:// rendezvous test (#11782)
Summary:
A missing environment variable raised a missing key error. Now it
raises a more descriptive error of the actual problem, for example:

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set
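A minimal sketch of a well-formed env:// initialization (the address, port, and sizes are illustrative):

```python
import os
import torch.distributed as dist

# The env:// rendezvous reads these variables; leaving one unset now
# produces the descriptive ValueError shown above.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"

dist.init_process_group(backend="gloo", init_method="env://")
```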

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11782

Differential Revision: D9888962

Pulled By: pietern

fbshipit-source-id: 5947e7a7bf7aa45f13bbd7b5e997529f26cc92d6
2018-09-19 09:56:06 -07:00
fa32317780 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fixes various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on a sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected); see the sketch below
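A minimal sketch of the out-of-place behavior described in the note above:

```python
import torch

i = torch.tensor([[0, 0], [1, 1]])  # two entries at the same coordinate (0, 1)
v = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(i, v, (2, 2))

c = s.coalesce()         # duplicates are summed in the returned tensor
print(s.is_coalesced())  # False -- the source tensor is untouched
print(c.is_coalesced())  # True
```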
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9930449

Pulled By: yf225

fbshipit-source-id: 7c62439b216a6badf7938a10741c358ff18a556d
2018-09-19 09:40:26 -07:00
8c3a94eaf2 Improve autograd profiler performance (#11773)
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run                                          | Time                    |
|----------------------------------------------|-------------------------|
| No profiler                                  | 45ms                    |
| With profiler                                | 56ms                    |
| Use `clock_gettime` instead of `std::chrono` | 48ms                    |
| Touch all pages on block allocation          | 48ms (less jitter)      |
| Use `const char*` instead of `std::string`   | 47ms (even less jitter) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773

Differential Revision: D9886858

Pulled By: apaszke

fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
2018-09-19 09:25:43 -07:00
b3a2665e0f Code-reorg to have TORCH_ARG in its own header (#11787)
Summary:
I noticed I was including `torch/nn/pimpl.h` in the optimizer library just to access `TORCH_ARG`, even though that file includes a lot of irrelevant code. Let's save some re-compilation time by refactoring this macro into a separate logical file. #small-wins

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11787

Differential Revision: D9924447

Pulled By: goldsborough

fbshipit-source-id: 5acd4ba559ffb2a3e97277e74bb731d7b1074dcf
2018-09-19 09:25:41 -07:00
32494c226e OperatorDef <==> NodeProto Conversion (#11621)
Summary:
Operator level proto conversion between (new) torch proto and (old) caffe2 proto.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11621

Reviewed By: BIT-silence

Differential Revision: D9892422

Pulled By: houseroad

fbshipit-source-id: 01a55ec0a09479876a27082d90fc970723f4d431
2018-09-19 08:41:33 -07:00
8601b33c07 fix half grad assignment (#11781)
Summary:
currently grad assignment for half type fails with a misleading RuntimeError
```
RuntimeError: torch.cuda.sparse.HalfTensor is not enabled.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11781

Differential Revision: D9931884

Pulled By: soumith

fbshipit-source-id: 03e946c3833d1339a99585c9aa2dbb670f8bf459
2018-09-18 23:00:49 -07:00
b46f1b8ca7 Open-source ThreadSafeActivationCleaningPredictor (#11779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11731

This Predictor provides a threadsafe interface and also
cleans up activations after each run, so in a multi-model setup
the activation space doesn't explode.

Reviewed By: highker

Differential Revision: D9842374

fbshipit-source-id: bfe253ae5fc813e73a347c5147ff6b58d50781ea
2018-09-18 21:56:58 -07:00
77af40c025 prioritize Accelerate over OpenBLAS (#11812)
Summary:
might fix some binary build issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11812

Reviewed By: ezyang

Differential Revision: D9927309

Pulled By: soumith

fbshipit-source-id: 9ed6c2c6fedc2a1cffbf52bc0a795135d4239800
2018-09-18 21:56:57 -07:00
53b5f14f59 Remove inclusion of caffe2 pb (#11820)
Summary:
Probably not needed, but fwiw.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11820

Reviewed By: orionr

Differential Revision: D9924953

Pulled By: Yangqing

fbshipit-source-id: 4d340e3d4f4dadc50fb68bed9572b8e1e54b5f6d
2018-09-18 21:16:19 -07:00
a26ad5a332 Remove unnecessary check on device option pointer (#11845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11845

The device pointer will be used by cudaPointerGetAttributes, which handles nullptr already. So this check is not necessary.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#group__CUDART__UNIFIED_1gd89830e17d399c064a2f3c3fa8bb4390

Reviewed By: salexspb

Differential Revision: D9929828

fbshipit-source-id: d862f7e5590998ffafe9bfc7754b0f83d2ae4af4
2018-09-18 21:16:18 -07:00
8aedc27a63 checking device types of input and weights at RNN (#10185)
Summary:
- fixes #9534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10185

Differential Revision: D9141222

Pulled By: weiyangfb

fbshipit-source-id: bb652e42cc15917019df080d6bce2926b18f3476
2018-09-18 20:26:02 -07:00
e80d1d2876 Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h
Differential Revision:
D9924348

Original commit changeset: 8d92b9e8b424

fbshipit-source-id: 0d1792804d7387023af3a9c29477f1da6f40044a
2018-09-18 18:27:00 -07:00
2c358eaf51 Caffe2: add plan name to logging (#11704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11704

Add plan name to the logging in RunPlan

Reviewed By: Tianshu-Bao

Differential Revision: D9802416

fbshipit-source-id: 45c359dba0a5d992e303b3cdcf34624881a631d8
2018-09-18 18:10:13 -07:00
1f34be47d9 Raise error when perf test result is NaN (#11588)
Summary:
Currently one of our GPU perf tests `test_gpu_speed_mnist` reports NaN after this commit (https://github.com/pytorch/pytorch/pull/8018), and we didn't have the logic in place to raise an error when this happens. This PR fixes the problem and will also update the baseline properly even if its previous value is NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11588

Differential Revision: D9831798

Pulled By: yf225

fbshipit-source-id: b95eee38d69b3b8273f48b8ac7b7e0e79cf756ed
2018-09-18 18:10:12 -07:00
a79f5d77ad Add pretty printer for JIT IR (#10319)
Summary:
Adds some pretty-printing capability to the IR graph to make debugging easier/more human readable, see `torch/csrc/jit/test_jit.cpp:925` and onwards for example outputs. Results aren't perfect yet but it's a start.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10319

Reviewed By: zdevito

Differential Revision: D9558402

Pulled By: driazati

fbshipit-source-id: 1d61c02818daa4c9bdca36d1477d1734cfc7d043
2018-09-18 17:39:44 -07:00
1c8686001f Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h (#11818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11818

To do this, I have to move the static context registry into ATen/core.
I take the opportunity to convert it into an unordered_map.

Reviewed By: Yangqing

Differential Revision: D9924348

fbshipit-source-id: 8d92b9e8b4246ce608eba24ecef7ad5f8b9b6582
2018-09-18 17:25:46 -07:00
3da8d71d7d remove protobuf inclusion in core/logging.h (#11814)
Summary:
This should not be there since logging does not depend on protobuf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11814

Reviewed By: ezyang

Differential Revision: D9923819

Pulled By: Yangqing

fbshipit-source-id: 4d4edaea1a2e317f5db6e92c35d58c85dd35c5fb
2018-09-18 17:10:02 -07:00
53cf628503 Simplify Blob move constructor/assignment (#11402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11402

- Simplify move constructor/assignment
- Make more things noexcept

Reviewed By: ezyang

Differential Revision: D9728631

fbshipit-source-id: 92562e30ea1e4d05ca857665a02b0ca66b0739e3
2018-09-18 15:09:40 -07:00
e585f2fb48 Polish CPP docs, Minor Python Docs Fixes (#11722)
Differential Revision: D9919120

Pulled By: goldsborough

fbshipit-source-id: bf14cbe4ab79524495957cb749828046af864aab
2018-09-18 14:55:57 -07:00
8ad846fda5 Don't build Detectron ops with NO_CAFFE2_OPS=1 (#11799)
Summary:
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11799

Differential Revision: D9922745

Pulled By: orionr

fbshipit-source-id: b88724b7c2919aabc00d98658e8e563233e01c85
2018-09-18 14:09:33 -07:00
d4e1fa45d0 allow no-alpha add/sub in onnx symbolic (#10972)
Summary:
The PR fixes #10873

The context is that the aten::add and aten::sub ST overloads don't have alpha, so the onnx symbolic does not match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10972

Reviewed By: jamesr66a

Differential Revision: D9724224

Pulled By: wanchaol

fbshipit-source-id: eb5d1b09fa8f1604b288f4a62b8d1f0bc66611af
2018-09-18 13:55:39 -07:00
7d25fa3c72 Emit Undefined type for value when it is Dynamic type (#11810)
Summary:
For example, outputs of control blocks often have Dynamic type, and when we try to export them to ONNX we get an invalid proto, since `elem_type` is not populated on the TypeInfoProto. This change at least lets us get past the checker, since a dynamically typed output from a control block should still be semantically valid
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11810

Differential Revision: D9922754

Pulled By: jamesr66a

fbshipit-source-id: 5c66113cc302a9d9b8b9f5a8605473d3c6ad5af1
2018-09-18 13:55:36 -07:00
1d399a80a0 Handle pollution of MAX, MIN and CHECK macros. (#11805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11805

Some of our headers in Caffe2 pollute the macro namespace with things like MAX,
MIN, CHECK, so I renamed these in places where this is a problem.

This patch courtesy of gchanan, extracted out of #11721

Reviewed By: Yangqing

Differential Revision: D9917757

fbshipit-source-id: 17fc692ca04b208dcb8ae00731ed60e393284f7c
2018-09-18 13:18:31 -07:00
9eb72889b4 Add successor/predecessor functions
Summary: More functionality to prep nomnigraph for scheduler implementations

Reviewed By: duc0

Differential Revision: D9794686

fbshipit-source-id: b460859d8ff965d0049b2a696bd8d2f5c97f3f86
2018-09-18 12:27:06 -07:00
47956ddf7e Revert D9755189: [pytorch][PR] [API CHANGE] Add empty tensor tests to test_sparse
Differential Revision:
D9755189

Original commit changeset: e9d36f437db1

fbshipit-source-id: 8b99edf626418a953a8bd786847a6e0174a3a14d
2018-09-18 11:26:10 -07:00
540ef9b1fc Add distributed get_backend (#11715)
Summary:
I have no idea how to run distributed tests locally so I'll let CI do this. Hopefully everything still works with `IntEnum`.

cc mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11715

Reviewed By: pietern

Differential Revision: D9889646

Pulled By: SsnL

fbshipit-source-id: 1e2a487cb6fe0bd4cc67501c9d72a295c35693e2
2018-09-18 10:56:24 -07:00
2732c8bae1 improve aten/convolution error message (#11768)
Summary:
fixes https://github.com/pytorch/pytorch/issues/11762
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11768

Differential Revision: D9884185

Pulled By: soumith

fbshipit-source-id: 2a0c3e1f5a4fb4833ae6e9fc791abcf45f7fbea2
2018-09-18 10:56:22 -07:00
98aebed88e Refactor tests part 1 (#11350)
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)

Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.

I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply explicitly do the conversion to dtype=object.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...):
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350

Reviewed By: houseroad

Differential Revision: D9693857

Pulled By: ajyu

fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
2018-09-18 10:42:10 -07:00
6073f3073e Document torch::nn::init (#11778)
Summary:
Doc fixes and documentation for `torch::nn::init`.

ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11778

Differential Revision: D9886648

Pulled By: goldsborough

fbshipit-source-id: 22eb78add1dc32b92cc32253683ab3d746505a64
2018-09-18 10:26:21 -07:00
c8fbeb3aa2 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fixes various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on a sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9755189

Pulled By: yf225

fbshipit-source-id: e9d36f437db1a132c423d3a282ff405a084ae7cc
2018-09-18 10:26:18 -07:00
e00fb69b25 Use CATCH prefix to avoid name conflicts with Caffe2.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11780

Differential Revision: D9889925

Pulled By: gchanan

fbshipit-source-id: 5eca849c36ced00b8ae7482b7945b445a3e1687e
2018-09-18 08:12:45 -07:00
4ee0a78ee6 varargs for meshgrid (#11600)
Summary:
Adds vararg support for meshgrid and adds checks for all the tensor arguments to have the same dtype and device.
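A minimal sketch of the vararg form:

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])
gx, gy = torch.meshgrid(x, y)  # tensors passed directly, no list needed
print(gx.shape, gy.shape)      # torch.Size([3, 2]) torch.Size([3, 2])
```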

Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446

The earlier pull request closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.

Differential Revision: D9892876

Pulled By: ezyang

fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
2018-09-18 07:41:31 -07:00
e2bc95e1bd add ModuleList.insert (#11664)
Summary:
fixes #11652
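A minimal sketch of the new method:

```python
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
layers.insert(1, nn.Dropout(0.5))          # insert before the ReLU
print([type(m).__name__ for m in layers])  # ['Linear', 'Dropout', 'ReLU']
```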
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11664

Differential Revision: D9892845

Pulled By: ezyang

fbshipit-source-id: 2c910d6bc0b28a999e25beca6e398fd0f35535c5
2018-09-18 07:41:28 -07:00
91b6458e2d Container __getitem__ slicing for subclasses (#11694)
Summary:
Simple change to allow a ModuleList subclass's `__getitem__(slice)` to return the subclass type rather than ModuleList
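A minimal sketch (the subclass name is made up for illustration):

```python
import torch.nn as nn

class Backbone(nn.ModuleList):
    pass

m = Backbone([nn.Linear(2, 2) for _ in range(4)])
print(type(m[1:3]).__name__)  # 'Backbone', not 'ModuleList'
```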
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11694

Differential Revision: D9892824

Pulled By: ezyang

fbshipit-source-id: b75e9c196487f55cb93f0dab6c20d850e8e759ff
2018-09-18 01:26:18 -07:00
e734c94fa2 Quick update to embedding_bag doc (#11784)
Summary:
Related to #11624 adding maxes to the function def of embedding_bag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784

Differential Revision: D9892598

Pulled By: ezyang

fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
2018-09-17 23:56:05 -07:00
407a9fee0c make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor) (#11061)
Summary:
- fix https://github.com/pytorch/pytorch/issues/10876
- the cause of the bug is that the copy constructor cannot distinguish between the default value of requires_grad and requires_grad=False, so it makes a copy from the source tensor along with its grad_fn if requires_grad=True at the source
- with this fix, the behavior becomes
```
>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11061

Differential Revision: D9569714

Pulled By: weiyangfb

fbshipit-source-id: ea368688bdc0f1ce5997870e164e42835b64b4a1
2018-09-17 23:29:09 -07:00
63c811b3a6 Include some JIT things in C++ docs (#11712)
Summary:
Since we're making parts of the JIT public as part of loading script modules, they should be on the cppdocs website.

Orthogonal: We decided not to export things like `IValue` into the `torch` namespace, so `RegisterOperators` shouldn't be there either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11712

Differential Revision: D9837578

Pulled By: goldsborough

fbshipit-source-id: 4c06d2fa9dd4b4216951f27424c2ce795febab9c
2018-09-17 23:29:04 -07:00
bd43d64dd5 Add strides to Tensor (#11763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11763

baseline-std vector
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.74us  148.26K
TensorShareData                                              5.89us  169.78K
TensorShareExternalPointer                                   1.01us  994.35K
TensorReallocation                                           2.46us  405.78K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                7.50us  133.27K
TensorShareData                                              7.07us  141.38K
TensorShareExternalPointer                                   1.05us  955.19K
TensorReallocation                                           2.55us  391.62K
============================================================================

```

baseline-smallvector
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.56us  152.34K
TensorShareData                                              5.84us  171.32K
TensorShareExternalPointer                                 962.49ns    1.04M
TensorReallocation                                           2.32us  431.73K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.29us  159.04K
TensorShareData                                              5.73us  174.39K
TensorShareExternalPointer                                 914.90ns    1.09M
TensorReallocation                                           2.29us  435.80K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D9694097

fbshipit-source-id: c462e770a4b40e640d8c9d38e0ae7036a4e6e84a
2018-09-17 22:09:40 -07:00
a02685e109 Fix test_torch's test_potri (#11770)
Summary:
tset_potri -> test_potri, even though it has been like this for a long time

More a curiosity than grave functionality...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11770

Reviewed By: ezyang

Differential Revision: D9884767

Pulled By: soumith

fbshipit-source-id: 9bedde2e94ade281ab1ecc2293ca3cb1a0107387
2018-09-17 21:58:18 -07:00
3cbec5453b Reorder statements for readability (#11764)
Summary:
I read this a couple of times before figuring out that it's also the entry point for MPI_COMM_WORLD.

Reordered statements and added a comment to clarify.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11764

Differential Revision: D9882834

Pulled By: pietern

fbshipit-source-id: a9282d55368815925fd695a2541354e5aec599da
2018-09-17 21:58:15 -07:00
a7cbcb1bb9 Enable build_python on windows (#11385)
Summary:
The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after the removal of FULL_CAFFE2 on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385

Reviewed By: orionr

Differential Revision: D9884906

Pulled By: mingzhe09088

fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6
2018-09-17 21:40:03 -07:00
63e384a381 SNNTest with Data Preproc Service (#11707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11707

Trigger SNN offline training test with data preproc service.

Reviewed By: xsh6528

Differential Revision: D9826978

fbshipit-source-id: f98405ca1e61a7662bf0d9313aaba42436025a83
2018-09-17 21:25:49 -07:00
7f0dd2487d Move AT_HOST_DEVICE macro to Macros.h (#10945)
Summary:
```
I'm using AT_HOST_DEVICE outside of Half.h in an upcoming PR. Since this
changes code without making any semantic changes, I wanted to make this
change in a separate PR.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10945

Differential Revision: D9539821

Pulled By: colesbury

fbshipit-source-id: 0daae40ea78b077a543f7bfeec06b225634540de
2018-09-17 18:25:51 -07:00
e8ecbcdf01 Move IValue to ATen/core (#11610)
Summary:
unblocks D9202320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11610

Differential Revision: D9774853

Pulled By: bwasti

fbshipit-source-id: 4798223f6de680a7152283e8cad8814da7f90209
2018-09-17 18:25:50 -07:00
d4dde0bcaf Detect number of amd gpus in ROCM CI (#11771)
Summary:
We now have CI machines with different number of amd gpus.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11771

Differential Revision: D9889837

Pulled By: bddppq

fbshipit-source-id: dacf728a282f209e3f2419da186e59528a08ca6a
2018-09-17 18:11:09 -07:00
24a8c13f36 Add barrier to fix distributed test flakiness (#11775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11775

This should fix #11582.

Reviewed By: ezyang

Differential Revision: D9885546

fbshipit-source-id: 3544f42ebe8b595cdf6941859c67484d3ea9b3f8
2018-09-17 17:31:45 -07:00
7d0657f13c Migrate test in cpp/api/ to use gtest (#11556)
Summary:
The second part of T32009899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11556

Differential Revision: D9888224

Pulled By: zrphercule

fbshipit-source-id: cb0d0ba5d9c7ad601ee3bce0d932ce9cbbc40908
2018-09-17 17:31:43 -07:00
3819d25418 Clean up converter and accept less-valid networks
Summary: Cleaning up converter.cc and allowing networks that have "pass through" inputs (that are also outputs but aren't actually consumed by the network)

Reviewed By: duc0

Differential Revision: D9759435

fbshipit-source-id: 1ddfcc60a1b865a06682e4022230dfecc4b89ec3
2018-09-17 17:31:41 -07:00
ca5def1b8f Expose annotations (#11649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11649

Putting annotations in python interface

Reviewed By: duc0

Differential Revision: D9784750

fbshipit-source-id: d877c886ac52559ca3f009a1fd848dd1779b7d04
2018-09-17 16:39:37 -07:00
3ce17bf8f6 Generate ATen/core to source if env GEN_TO_SOURCE is set. (#11759)
Summary:
It is currently tedious to change code generation because it takes two steps: change the code gen, then gen.py fails because of file mismatch. Just add an environment option to generate directly to source.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11759

Differential Revision: D9867259

Pulled By: gchanan

fbshipit-source-id: 3cf8024d9e302f382cf8b8a44cb843fb086f8597
2018-09-17 15:25:33 -07:00
7df6650e9c Fix empty embedding bag on cuda (#11740)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11739
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11740

Differential Revision: D9881392

Pulled By: SsnL

fbshipit-source-id: 2964d314f199dd9b4bb69e36592b67efdf5e0760
2018-09-17 14:40:03 -07:00
7671f4ab1c Add math to scope when using inf in tests (#11302)
Summary:
This fixes #8515 which was mostly issues in the test themselves. As long
as `math` is imported in the scope in which the script runs it resolves
to a `prim::Constant` with value `inf` correctly. This PR adds this to
the `test_jit.py` tests involving `inf` and adds a test to demonstrate
`inf` in a non-generated test.
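A minimal sketch of the behavior described above, assuming `math.inf` resolves inside a scripted function the way the commit says it does:

```python
import math
import torch

@torch.jit.script
def clamp_finite(x):
    # `inf` resolves to a prim::Constant because `math` is in scope
    return torch.clamp(x, min=-math.inf, max=math.inf)

print(clamp_finite(torch.randn(3)))
```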
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11302

Differential Revision: D9684336

Pulled By: driazati

fbshipit-source-id: 73df2848dfdb45ab50690a7c88df8fda269a64eb
2018-09-17 14:08:32 -07:00
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For avx512, we need to align at a multiple of 64B, not 32B.
Regardless of avx512, it's in general a good idea to be cache-line aligned.

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
336323f53c return aten::gt to the list of fusable operations, add expected graphs (#11150)
Summary:
Fixes one of #11118 issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11150

Differential Revision: D9861372

Pulled By: apaszke

fbshipit-source-id: 98b196b89e991d3936360b30568360367fd32e8b
2018-09-17 13:40:41 -07:00
73738ec570 bump version to 1.0 (#11717)
Summary:
I'm just doing the honors and bumping the version to 1.0.0.

1.0 preview and RC releases will have the 1.0.0.dev{date} tag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717

Reviewed By: SsnL

Differential Revision: D9840857

Pulled By: soumith

fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace
2018-09-17 12:13:48 -07:00
47d65ed34f Fix issue 10492 (#11634)
Summary:
- pass infos vector by reference
- checkErrors takes infos vector by reference
- modified gesv tests to not cause infs or nans sporadically
- also clean up error messages

Reviewed By: ezyang

Differential Revision: D9818550

Pulled By: soumith

fbshipit-source-id: 00215205ff88767d6a5e921322394c5fd915d6d8
2018-09-17 12:13:45 -07:00
39520ffec1 remove Type/Tensor/TensorMethods include order dependencies. (#11720)
Summary:
Previously, it was a necessity to include TensorMethods.h after Tensor.h in order to get the tensor method definitions.
We abstracted this away from users by making sure ATen.h did this correctly; but we don't have any equivalent for ATen/core.

In order to solve this dependency issue, we now forward declare Tensor in the Type declaration, which breaks the dependency cycle.
Type.h now includes Tensor.h (for backwards compatibility) and Tensor.h now includes TensorMethods.h, so there is no longer include dependency restrictions.

We could get rid of TensorMethods.h completely now, but that would involve coordinating a code generation change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11720

Reviewed By: ezyang

Differential Revision: D9841488

Pulled By: gchanan

fbshipit-source-id: 1668199095e096c1790e646b5dc9f61ec1b33c0a
2018-09-17 11:10:32 -07:00
e125e61824 Fix flake8
Summary: Fix flake8

Reviewed By: ezyang

Differential Revision: D9873872

fbshipit-source-id: 26e81238f22caaeccd2c8b4f39cedb6cfb5520dd
2018-09-17 11:10:29 -07:00
cdefc27795 Support lr adaption for SparseAdam and RowWiseSparseAdam (#11162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11162

as title, fix pr test failure

Reviewed By: chocjy

Differential Revision: D9619308

fbshipit-source-id: 0a2228841ed8fadb15f07e94d3575aa701b10146
2018-09-17 10:29:03 -07:00
7949250295 Fixes for Torch Script C++ API (#11682)
Summary:
A couple fixes I deem necessary to the TorchScript C++ API after writing the tutorial:

1. When I was creating the custom op API, I created `torch/op.h` as the one-stop header for creating custom ops. I now notice that there is no good header for the TorchScript C++ story altogether, i.e. when you just want to load a script module in C++ without any custom ops necessarily. The `torch/op.h` header suits that purpose just as well of course, but I think we should rename it to `torch/script.h`, which seems like a great name for this feature.

2. The current API for the CMake we provided was that we defined a bunch of variables like `TORCH_LIBRARY_DIRS` and `TORCH_INCLUDES` and then expected users to add those variables to their targets. We also had a CMake function that did that for you automatically. I now realized a much smarter way of doing this is to create an `IMPORTED` target for the libtorch library in CMake, and then add all this stuff to the link interface of that target. Then all downstream users have to do is `target_link_libraries(my_target torch)` and they get all the proper includes, libraries and compiler flags added to their target. This means we can get rid of the CMake function and all that stuff. orionr  AFAIK this is a much, much better way of doing all of this, no?

3. Since we distribute libtorch with `D_GLIBCXX_USE_CXX11_ABI=0`, dependent libraries must set this flag too. I now add this to the interface compile options of this imported target.

4. Fixes to JIT docs.

These could likely be 4 different PRs but given the release I wouldn't mind landing them all asap.

zdevito dzhulgakov soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11682

Differential Revision: D9839431

Pulled By: goldsborough

fbshipit-source-id: fdc47b95f83f22d53e1995aa683e09613b4bfe65
2018-09-17 09:54:50 -07:00
a7e3cd09e0 Fix ctc gradient handling (#11753)
Summary:
Fixes: #11750

Also fix cuda ctc with double to enable gradient check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11753

Differential Revision: D9861318

Pulled By: ezyang

fbshipit-source-id: 2e7afea2b60dbbd891bb5d0bda61ee75fe01d933
2018-09-17 09:54:47 -07:00
07fd4450ab Revert D9831398: [pytorch][PR] Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0)
Differential Revision:
D9831398

Original commit changeset: db119d3f9c26

fbshipit-source-id: 4f183c9c178c159473bdaaa6299d4d5eb8afe549
2018-09-17 09:39:23 -07:00
f6a6d7fae1 Switch at::TensorImpl to store TypeMeta rather than ScalarType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11702

Reviewed By: cpuhrsch

Differential Revision: D9831384

fbshipit-source-id: 1b1233a70ed70b47a3dab4a5797b6cfcb7a2c265
2018-09-17 09:09:35 -07:00
6660a128a5 Cache and use TypeMeta in TensorImpl (#11706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11706

This is necessary to handle use-cases where Storage is not set (because the
tensor in question doesn't have a notion of storage).

Reviewed By: orionr

Differential Revision: D9833361

fbshipit-source-id: e90a384019f44f57682b687d129b54e85b6fabb9
2018-09-17 08:58:13 -07:00
2baba7f835 Add storage_offset to Caffe2 (#11701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11701

There's one extra multiply from TypeMeta::itemsize() which needs
to be characterized.  For all existing Caffe2 uses, storage_offset
is zero.

Reviewed By: li-roy

Differential Revision: D9831230

fbshipit-source-id: 353678edf76d2ccc297a73475a34f6ab2a20d1e1
2018-09-17 08:58:11 -07:00
35518b3dc7 Back out "Back out "Refactor Tensor/TensorImpl constructors."" E2: Confirm problem with old patch (#11744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11744

Original commit changeset: 093e4c47d557

Restores D9813742

Reviewed By: dzhulgakov

Differential Revision: D9847835

fbshipit-source-id: f3f467891e01c923dd9d3352d892cf59e10402f1
2018-09-17 08:58:09 -07:00
0d345cfa18 Remove Type method defaults in ATen. (#11675)
Summary:
This will allow us to break the dependency cycle between Tensor and Type, because currently Type has defaulted Tensor (reference)  arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11675

Reviewed By: ezyang

Differential Revision: D9819720

Pulled By: gchanan

fbshipit-source-id: a9577ac34a358120075129ab0654e7862d1dace6
2018-09-17 08:58:07 -07:00
5bfd8f583c Moving copy of Caffe2 protos back to build_pytorch_libs.sh (#11726)
Summary:
This way it shows up in all current and future setup.py commands, as otherwise we'd have to override every one to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, because setup.py bdist does not go through setup.py install or setup.py develop
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726

Reviewed By: orionr

Differential Revision: D9844075

Pulled By: pjh5

fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
2018-09-17 08:58:05 -07:00
a8b1755de6 Check device argument makes sense for legacy tensor constructors. (#11669)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/11427.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11669

Differential Revision: D9817881

Pulled By: gchanan

fbshipit-source-id: 77dc5b0e6bc9884d2616210b96c07e4734058bb6
2018-09-17 08:24:25 -07:00
d63bb72d89 Remove symbol export annotations in THC/generic/*.cu (#11367)
Summary:
We use these annotations during function declarations, not definitions. See the description of compiler error [C2491](https://msdn.microsoft.com/en-us/library/62688esh.aspx) for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11367

Reviewed By: ezyang

Differential Revision: D9697923

Pulled By: orionr

fbshipit-source-id: 1e539c02957851386f887e6d0510ce83117a1695
2018-09-17 08:24:23 -07:00
f5bc2aef07 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#11563)
Summary:
Fix the link OpenMP link error for AppleClang 9.0 compiler.

Built with the following command:
python setup.py build develop

The error message:

```
Undefined symbols for architecture x86_64:
  "___kmpc_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_end_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_for_static_fini", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_4", referenced from:
      _.omp_outlined. in init.cpp.o
      _.omp_outlined..35 in init.cpp.o
      _.omp_outlined..36 in init.cpp.o
      _.omp_outlined..37 in init.cpp.o
      _.omp_outlined..49 in init.cpp.o
      _.omp_outlined..52 in init.cpp.o
      _.omp_outlined..220 in init.cpp.o
      ...
  "___kmpc_for_static_init_8", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_8u", referenced from:
      _.omp_outlined..203 in init.cpp.o
      _.omp_outlined..207 in init.cpp.o
      _.omp_outlined..209 in init.cpp.o
      _.omp_outlined..210 in init.cpp.o
  "___kmpc_fork_call", referenced from:
      at::native::embedding_dense_backward_cpu(at::Tensor const&, at::Tensor const&, long long, long long, bool) in Embedding.cpp.o
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::grid_sampler_2d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_2d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      ...
  "___kmpc_global_thread_num", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_push_num_threads", referenced from:
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "___kmpc_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "_omp_get_max_threads", referenced from:
      _THGetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "_omp_get_num_procs", referenced from:
      _THGetNumCores in THGeneral.cpp.o
  "_omp_get_num_threads", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_get_thread_num", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_in_parallel", referenced from:
      _THFloatTensor_copy in THTensorCopy.cpp.o
      _THDoubleTensor_copy in THTensorCopy.cpp.o
      _THByteTensor_copy in THTensorCopy.cpp.o
      _THCharTensor_copy in THTensorCopy.cpp.o
      _THShortTensor_copy in THTensorCopy.cpp.o
      _THIntTensor_copy in THTensorCopy.cpp.o
      _THLongTensor_copy in THTensorCopy.cpp.o
      ...
  "_omp_set_num_threads", referenced from:
      _THSetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
ld: symbol(s) not found for architecture x86_64
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11563

Differential Revision: D9831398

Pulled By: ezyang

fbshipit-source-id: db119d3f9c26a71180335ad955f2f62c5369f9ed
2018-09-17 08:24:20 -07:00
6f6b03566b Vectorize grid sample 2d CPU kernels (#10980)
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Specifically,

 1. add `.data()` in `TensorAccessor`
 2. support non-void return value for declaring CPU kernel stub
 3. add `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`
1. The following vectorized CPU primitives are added:

    + `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
    + `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
    + comparison ops
    + binary logical ops
    + `min(a, b)`
    + `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
    + `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
    + ctor with multiple values (i.e., `setr`)
    + `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
    + `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
    + `interleave2(a, b)` & `deinterleave2(x, y)`: interleave or deinterleaves two vectors. E.g., for `interleave`:
        ```
        inputs:
          {a0, a1, a2, a3, a4, a5, a6, a7}
          {b0, b1, b2, b3, b4, b5, b6, b7}
        outputs:
          {a0, b0, a1, b1, a2, b2, a3, b3}
          {a4, b4, a5, b5, a6, b6, a7, b7}
        ```

  2. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`):

  ```
   NOTE [ Grid Sample CPU Kernels ]

   Implementation of vectorized grid sample CPU kernels is divided into three
   parts:

   1. `ComputeLocation` struct
      Transforms grid values into interpolation locations of the input tensor
      for a particular spatial dimension, based on the size of that dimension
      in the input tensor and the padding mode.
```
```cpp
      template<typename scalar_t, GridSamplerPadding padding>
      struct ComputeLocation {
        using Vec = Vec256<scalar_t>;

        // ctor
        ComputeLocation(int64_t size);

        // Given grid values `in`, return the interpolation locations after
        // un-normalization and padding mechanism (elementwise).
        Vec apply(const Vec &in) const;

        // Similar to `apply`, but also returns `d apply(in) / d in`
        // (elementwise).
        // This is often used in gradient computation.
        std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
      };
```
```
   2. `ApplyGridSample` struct
      Owns N `ComputeLocation` structs, where N is the number of spatial
      dimensions. Given N input grid vectors (one for each spatial dimension)
      and a spatial offset, it gets the interpolation locations from
      `ComputeLocation`s, applies the interpolation procedure, and then writes to
      the output (or grad_input & grad_grid in backward).
```
```cpp
      template<typename scalar_t, int spatial_dim,
               GridSamplerInterpolation interp,
               GridSamplerPadding padding>
      struct ApplyGridSample {

        // ctor
        ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);

        // Applies grid sampling (forward) procedure:
        //   1. computes interpolation locations from grid values `grid_x` and
        //      `grid_y`,
        //   2. interpolates output values using the locations and input data
        //      in `inp_slice`, and
        //   3. writes the first `len` values in the interpolated vector to
        //      `out_slice` with spatial offset being `offset`.
        //
        // This assumes that `grid_x` and `grid_y` all contain valid grid
        // values \in [-1, 1], even at indices greater than `len`.
        //
        // The `*_slice` arguments denote samples within a batch (i.e.,
        // with the batch dimension sliced out).
        void forward(TensorAccessor<scalar_t, 3>& out_slice,
                     const TensorAccessor<scalar_t, 3>& inp_slice,
                     int64_t offset, const Vec& grid_x, const Vec& grid_y,
                     int64_t len) const;

        // Applies grid sampling (backward) procedure. Arguments semantics
        // and strategy are similar to those of `forward`.
        void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
                      TensorAccessor<scalar_t, 3>& gGrid_slice,
                      const TensorAccessor<scalar_t, 3>& gOut_slice,
                      const TensorAccessor<scalar_t, 3>& inp_slice,
                      int64_t offset, const Vec& grid_x, const Vec& grid_y,
                      int64_t len) const;
      };
```
```
   3. `grid_sample_2d_grid_slice_iterator` function
      Among the tensors we work with, we know that the output tensors are
      contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
      backward), we need to read `input` randomly anyway, and `grad_output`
      usually comes from autograd and is often contiguous. So we base our
      iteration strategy on the geometry of the grid.
      The `grid_sample_2d_grid_slice_iterator` function provides an abstraction
      for efficiently iterating through a `grid` slice (without the batch
      dimension). See that function's comments for the specific cases and
      strategies used.
```
```cpp
      template<typename scalar_t, typename ApplyFn>
      void grid_sample_2d_grid_slice_iterator(
        const TensorAccessor<scalar_t, 3>& grid_slice,
        const ApplyFn &apply_fn);

      // `apply_fn` is a function/lambda that can be called as if it has
      // declaration:
      //   void apply_fn(const Vec256<scalar_t>& grid_x,
      //                 const Vec256<scalar_t>& grid_y,
      //                 int64_t spatial_offset, int64_t len);
```
```
      `apply_fn` will be called multiple times, and the calls together cover
      the entire output spatial space. Therefore, e.g., to implement forward 2d grid
      sample, we can do
```
```cpp
      ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);

      for (int n = 0; n < input_accessor.size(0); n++) {
        grid_sample_2d_grid_slice_iterator(
          grid_accessor[n],
          [&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
              int64_t spatial_offset, int64_t len) {
            grid_sample.forward(out_accessor[n], input_accessor[n],
                                spatial_offset, grid_x, grid_y, len);
          });
      }
   ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980

Differential Revision: D9564867

Pulled By: SsnL

fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
2018-09-16 20:41:10 -07:00
10c29c8970 Fix CUDA 8 build on Windows (#11729)
Summary:
Tested via https://github.com/pytorch/pytorch/pull/11374.
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11729

Differential Revision: D9847807

Pulled By: orionr

fbshipit-source-id: 69af3e6c5bba0abcbc8830495e867a0b1b399c22
2018-09-16 08:09:24 -07:00
ca6f08f359 Set correct dtype for fp16 op inference function (#11693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11693

as desc.

Reviewed By: hyuen

Differential Revision: D9829061

fbshipit-source-id: 0f4c8a9d2b95d4cf5fa20a2aefd5671f273a8e76
2018-09-15 23:40:41 -07:00
b3e726042c Do not use FixedDivisor in ROCM order switch op (#11697)
Summary:
Fix the recent order_switch_test failure in ROCM CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11697

Reviewed By: BIT-silence

Differential Revision: D9831039

Pulled By: bddppq

fbshipit-source-id: 2368fd1ac7b1bab335ff3377071246cfd3392f3f
2018-09-15 18:24:51 -07:00
eb3c47bdd5 max -> fmaxf in cross_entropy kernel (#11733)
Summary:
Changing `max` to `fmaxf` in `LabelCrossEntropy` kernel for hip to work correctly.

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11733

Differential Revision: D9846783

Pulled By: bddppq

fbshipit-source-id: c1b394d2ba7ee0e819f7bf3b36b53d1962de5522
2018-09-15 18:13:42 -07:00
f09054f8d0 Remove deprecate warning for Upsampling (#11568)
Summary:
Fixes #11452 .

Based on the discussion with SsnL and soumith, we want to bring back Upsample as a module instead of introducing a new nn.interpolate module for now. Anyone who wants to downsample should use `nn.functional.interpolate` instead.
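As a hedged illustration (the shapes, scale factors, and mode below are arbitrary choices of this sketch, not from the PR), the two paths look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Upsampling: the Upsample module stays, without a deprecation warning.
up = nn.Upsample(scale_factor=2, mode='nearest')
y = up(x)                                                # (1, 3, 16, 16)

# Downsampling: use the functional interface instead.
z = F.interpolate(x, scale_factor=0.5, mode='nearest')   # (1, 3, 4, 4)
```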
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11568

Differential Revision: D9804359

Pulled By: ailzhang

fbshipit-source-id: 2b232d55fc83c2b581bf336f1ee8d1cf1c1159ca
2018-09-14 17:54:48 -07:00
bb6f18c44f Simplify IValue::toTensor() (#11355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11355

There is no reason to implement refcounting manually in this case.
Given the correct NullType, toIntrusivePtr() and moveToIntrusivePtr() will do the right thing.

Reviewed By: ezyang

Differential Revision: D9694918

fbshipit-source-id: 8aae4d66aec32ca5f85c438d66339bd80b72b656
2018-09-14 16:57:15 -07:00
690c999bba Simplify union payload copying (#11353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11353

Before, there was one extra member in the union that had to be at least as large as the largest other member, because it was used for copying.

Now, this isn't needed anymore and we copy the union directly.

Reviewed By: ezyang

Differential Revision: D9694326

fbshipit-source-id: 42b2f7d51ac5d4ea5ebafea3a598b018e10fed68
2018-09-14 16:57:14 -07:00
270fb22bd8 Remove intrusive_ptr::reclaim() in Storage (2/2) (#11547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11547

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9778042

fbshipit-source-id: c9572edc440c5ce5ea1b2355b5c54f87078ea28e
2018-09-14 16:57:12 -07:00
f4d9fe395d Remove intrusive_ptr::reclaim() in Storage (#11352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11352

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9694327

fbshipit-source-id: befdbcac199225383a93520472ee7c6511a0e9cd
2018-09-14 16:57:10 -07:00
2c8a1b957e Back out "Refactor Tensor/TensorImpl constructors."
Summary: Original commit changeset: 7501b54fe5f3

Reviewed By: gchanan

Differential Revision: D9838097

fbshipit-source-id: 093e4c47d5574ce99f706b0683ef369a89b62b38
2018-09-14 16:39:31 -07:00
8e76dcf173 Prevent raising KeyboardInterrupt in worker (#11718)
Summary:
Current behavior is that each process (main and workers) will print a trace from `KeyboardInterrupt`, and the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCHLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718

Differential Revision: D9840844

Pulled By: SsnL

fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
2018-09-14 16:09:35 -07:00
d24bcfd930 Suppress hiprand "duplicate-decl-specifier" warning (#11698)
Summary:
Otherwise each build produces 65MB of warnings log, which makes the CI hard to debug.

iotamudelta Jorghi12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11698

Differential Revision: D9840356

Pulled By: bddppq

fbshipit-source-id: b69bf6a5c38a97b188221f9c084c608ffc9b37c8
2018-09-14 15:51:43 -07:00
8e3f8c52e8 Document the Sequential module (#11648)
Summary:
1. Document the Sequential module in the C++ API, at both a high level (why does this exist?) and a low level (how to use it)
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648

Differential Revision: D9834526

Pulled By: goldsborough

fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1
2018-09-14 15:51:41 -07:00
96d3f968eb Splits CPU and CUDA fusion compilers (#10981)
Summary:
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:

- A fusion interface is created that allows "fusion handles" to be requested
- The CPU and CUDA fusers implement this interface, with dispatch determined by device
- The fusion compilers, fusion function specializations and resource strings are split
- CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
- Common classes like TensorDesc and the base fusion function class are in jit/fusers/common
- There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
- Updates the build system to remove the dummy interface on Windows and minimize the use of macros

This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10981

Reviewed By: soumith

Differential Revision: D9701999

Pulled By: apaszke

fbshipit-source-id: 3b6bec7b97e0444b2a93caa38d9b897f2e68c1b3
2018-09-14 14:05:34 -07:00
70e68e755a Casting for binary ops (#11708)
Summary:
Fixes #11663

`TensorIterator` was replacing the operand tensors with type-cast tensors,
which ended up producing side effects in binary ops like `a.float() * b`
where `a` and `b` are `LongTensor`s.
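A minimal sketch of the side effect being guarded against (the tensors are illustrative, not from the linked issue):

```python
import torch

a = torch.arange(3)   # LongTensor
b = torch.arange(3)   # LongTensor

c = a.float() * b     # mixed-dtype binary op, handled by TensorIterator

# The cast must stay internal to the op: `a` keeps its original dtype.
assert a.dtype == torch.int64
```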

colesbury ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11708

Differential Revision: D9834016

Pulled By: driazati

fbshipit-source-id: 4082eb9710b31dfc741161a0fbdb9a8eba8fe39d
2018-09-14 13:40:21 -07:00
224e62bbec respect USE_CUDA_STATIC_LINK in build_libtorch.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11713

Differential Revision: D9835972

Pulled By: anderspapitto

fbshipit-source-id: 046363b132e5487c05ef7e6e6d88b508196386a1
2018-09-14 12:25:08 -07:00
0c2648830f Augment emit_nvtx to help connect backward-pass Function apply calls with their corresponding forward pass ops (#10881)
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script.  If the op is in the forward pass, that's easy:  ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward.  If the op is in the backward pass, it's much more difficult.  From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information.  Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line, and know exactly where in the backward pass you are.

This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.

The method is simple and surgical.  During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number.  During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.
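A hedged usage sketch (the model and input are placeholders; the range tagging is as described above, though exact range names may differ):

```python
import torch

model = torch.nn.Linear(64, 64).cuda()
x = torch.randn(32, 64, device='cuda', requires_grad=True)

# Run under nvprof/nvvp to see the nvtx ranges.
with torch.autograd.profiler.emit_nvtx():
    # Forward: each VariableType function's range is tagged with the
    # current sequence number.
    loss = model(x).sum()
    # Backward: each Function's operator() range is tagged with its
    # stashed sequence number, matching a forward-pass range.
    loss.backward()
```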

Double-backward is not a problem.  If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward:  The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.

Minor caveats:

- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name).  So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1.  However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp.  I did it because it was easy and clean.  If that's too big a hammer, let's figure out something more lightweight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881

Differential Revision: D9833371

Pulled By: apaszke

fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
2018-09-14 11:56:55 -07:00
b90872c00e Get rid of default arguments for TH/THC factory functions. (#11673)
Summary:
This is causing codegen problems in caffe2, when we try to remove the circular Tensor/Type declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11673

Differential Revision: D9819341

Pulled By: gchanan

fbshipit-source-id: f2c2cd96e8a16f6de6aa4889e71b8a78e12e9256
2018-09-14 10:55:38 -07:00
7535d98ec4 Add message tag parameter to send/recv
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11490

Reviewed By: teng-li

Differential Revision: D9828116

Pulled By: pietern

fbshipit-source-id: 98be1ae84b6763ffb329e63c030c5e3ec0e748b7
2018-09-14 10:55:37 -07:00
3258fc11a7 Delete torch/csrc/api/README.md (#11703)
Summary:
We'll have separate docs for the C++ frontend; right now this file is just misleading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11703

Differential Revision: D9832847

Pulled By: goldsborough

fbshipit-source-id: 2e8b30ccf6b5cba9d0526e6261160f7c6211a35c
2018-09-14 10:55:35 -07:00
278e304c18 Implement elif in string frontend (#11667)
Summary:
Closes #11625
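A tiny sketch of the newly supported syntax via the string frontend (the function itself is made up for illustration):

```python
import torch

cu = torch.jit.CompilationUnit('''
def sign(x):
    if bool(x > 0):
        return 1
    elif bool(x < 0):
        return -1
    else:
        return 0
''')

print(cu.sign(torch.tensor(-2.0)))  # -1
```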
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11667

Differential Revision: D9828145

Pulled By: jamesr66a

fbshipit-source-id: c72dc41cb310a4211b4e4c6b33f7e2c1fb3581a0
2018-09-14 10:09:46 -07:00
115b13ffab clean up some old Half stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11687

Differential Revision: D9829027

Pulled By: li-roy

fbshipit-source-id: f35dcdf93ea57ba4fa775e36e9d6378bed46a710
2018-09-14 09:54:45 -07:00
eb039dc92c Add CHECKs into GetTensorInfo and ExtractDeviceOption (#11597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11597

We should always CHECK pointers which we plan to dereference
if they are inputs to the function. Nobody knows how the function will
be called in the future.

Reviewed By: yinghai

Differential Revision: D9800002

fbshipit-source-id: 7fd05f4717f2256d1b09a9e75475b12de6685b03
2018-09-14 09:40:27 -07:00
0d9b9100f9 Fix gesv and gels docs (#11699)
Summary: Closes #9935 and closes #5431.

Differential Revision: D9830448

Pulled By: soumith

fbshipit-source-id: 4e5320a1d0c1d4c8253a5b26f4842cea76530514
2018-09-14 09:24:45 -07:00
72822ee6b2 Fix #11430 (CPU only builds raise opaque error message when calling .… (#11533)
Summary:
…cuda())

While I was at it, I audited all other ways I know how we might get a CUDA
type from PyTorch and fixed more constructors which don't work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533

Differential Revision: D9775786

Pulled By: ezyang

fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
2018-09-14 09:10:08 -07:00
2631da0822 Move some Tensor method definitions from Type.h to TensorMethods.h. (#11650)
Summary:
There's no reason they need to be in Type.h and this moves us along the path of not having circular dependencies (so we can get rid of TensorMethods.h).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11650

Reviewed By: ezyang

Differential Revision: D9812271

Pulled By: gchanan

fbshipit-source-id: 8b70db9a5eb0a332398ab2e8998eeaf7d2eea6d7
2018-09-14 08:56:02 -07:00
6c3792b9ec Implement UndefinedType::typeMeta.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11666

Differential Revision: D9816212

Pulled By: gchanan

fbshipit-source-id: 079899590150009bc2e2a3bbdc78a98de9380e37
2018-09-14 08:40:26 -07:00
cda71e2600 Disallow scalar parameters in Dirichlet and Categorical (#11589)
Summary:
This adds a small check in `Dirichlet` and `Categorical` `__init__` methods to ensure that scalar parameters are not admissible.

**Motivation**
Currently, `Dirichlet` throws no error when provided with a scalar parameter, but if we `expand` a scalar instance, it inherits the empty event shape from the original instance and gives unexpected results.

The alternative to this check is to promote `event_shape` to be `torch.Size((1,))` if the original instance was a scalar, but that seems to add a bit more complexity (and changes the behavior of `expand` in that it would affect the `event_shape` as well as the `batch_shape` now). Does this seem reasonable? cc. alicanb, fritzo.

```python
In [4]: d = dist.Dirichlet(torch.tensor(1.))

In [5]: d.sample()
Out[5]: tensor(1.0000)

In [6]: d.log_prob(d.sample())
Out[6]: tensor(0.)

In [7]: e = d.expand([3])

In [8]: e.sample()
Out[8]: tensor([0.3953, 0.1797, 0.4250])  # interpreted as events

In [9]: e.log_prob(e.sample())
Out[9]: tensor(0.6931)  # wrongly summed out

In [10]: e.batch_shape
Out[10]: torch.Size([3])

In [11]: e.event_shape
Out[11]: torch.Size([])  # cannot be empty
```

Additionally, based on review comments, this removes `real_vector` constraint. This was only being used in `MultivariateNormal`, but I am happy to revert this if we want to keep it around for backwards compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11589

Differential Revision: D9818271

Pulled By: soumith

fbshipit-source-id: f9bbba90ed6f04e0b5bdfa169e70ca20b280fc74
2018-09-14 07:55:35 -07:00
c391c20063 Adding .expand method for TransformedDistribution (#11607)
Summary:
This PR:
 - adds a `.expand` method for `TransformedDistribution` along the lines of #11341.
 - uses this method to simplify `.expand` in distribution classes that subclass `TransformedDistribution`.
 - restores testing of `TransformedDistribution` fixtures.
 - fixes some bugs wherein we were not setting certain attributes in the expanded instances, and adds tests for `.mean` and `.variance` which use these attributes.

There are many cases where users use `TransformedDistribution` directly rather than subclassing it. In such cases, it seems rather inconvenient to have to write a separate class just to define a `.expand` method. The default implementation should suffice in these cases.
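A hedged sketch of relying on the default `.expand` (the base distribution, transform, and shapes are arbitrary choices of this example):

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import ExpTransform

base = Normal(torch.zeros(2), torch.ones(2))
log_normal = TransformedDistribution(base, [ExpTransform()])

expanded = log_normal.expand(torch.Size([3, 2]))  # no subclass needed
print(expanded.batch_shape)                       # torch.Size([3, 2])
print(expanded.sample().shape)                    # torch.Size([3, 2])
```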

cc. fritzo, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11607

Differential Revision: D9818225

Pulled By: soumith

fbshipit-source-id: 2c4b3812b9a03e6985278cfce0f9a127ce536f23
2018-09-14 07:55:33 -07:00
74197c7115 Restore support for dim=None on WeightNorm. (#11661)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
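A hedged usage sketch (the wrapped module is an arbitrary choice): with `dim=None`, a single norm is computed over the entire weight tensor rather than one per slice.

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

# dim=None: one magnitude parameter for the whole weight tensor.
m = weight_norm(nn.Linear(20, 40), name='weight', dim=None)
print(m.weight_g)  # a single magnitude, not one per output row
```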
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11661

Reviewed By: veenix

Differential Revision: D9826799

Pulled By: ezyang

fbshipit-source-id: 9eec57bb27a365406669e412f6eb88741b22ed3d
2018-09-14 07:39:43 -07:00
19065f91fc Centralize TypeExtendedInterface casts. (#11576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11576

Previously, they were spattered throughout the codebase.
We now follow this convention:

- LegacyTypeDispatch gives you Type
- Context gives you TypeExtendedInterface
- Tensor::type() gives you Type
- at::getType() gives you TypeExtendedInterface

I change some sites to use getType() over type().

Reviewed By: SsnL

Differential Revision: D9790187

fbshipit-source-id: 5e2577cb590a5bbf5df530f3763d3b3c0b4625ca
2018-09-14 07:39:41 -07:00
c5f7da3f4a Support FP16 sparse lookup (#11674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11658

Reviewed By: hyuen

Differential Revision: D9676950

fbshipit-source-id: 89a115b9664b84e4e4436b7da033e5a428c2246d
2018-09-14 02:40:08 -07:00
1637729620 Fix ci by skipping some tests (#11668)
Summary:
scalar_tensor_test skipped
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11668

Differential Revision: D9825819

Pulled By: zrphercule

fbshipit-source-id: 6e62a001bcde49be8f7af1501b303bd93d09d005
2018-09-13 20:25:14 -07:00
e6fe8d9cf5 Try to delete codeowners for ATen/core (#10693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10693

Reviewed By: soumith

Differential Revision: D9772210

Pulled By: ezyang

fbshipit-source-id: 14560eaf77441980e9784536acd0ffe20b15c5b8
2018-09-13 20:25:11 -07:00
2431eac7c0 Ensure most Distribution methods are jittable (#11560)
Summary:
This adds tests in tests/test_distributions.py to ensure that all methods of `Distribution` objects are jittable.

I've replaced a few samplers with jittable versions (a sketch of the exponential case follows this list):
- `.uniform_()` -> `torch.rand()`
- `.exponential_()` -> `-(-torch.rand()).log1p()`
- `.normal_()` -> `torch.normal(torch.zeros(...), torch.ones(...), ...)`
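
For instance, the exponential replacement uses the inverse-CDF trick; a minimal sketch (the function name and rate handling are assumptions of this example, not code from the PR):

```python
import torch

def jittable_exponential_sample(rate, shape):
    # Exp(1) via inverse CDF on u ~ U(0, 1): -log(1 - u), written with
    # log1p for numerical stability, then scaled by 1/rate.
    u = torch.rand(shape)
    return -(-u).log1p() / rate
```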

Some jit failures remain, and are marked in test_distributions.py
- `Cauchy` and `HalfCauchy` do not support sampling due to missing `.cauchy_()`
- `Binomial` does not support `.enumerate_support()` due to `arange` ignoring its first arg.
- `MultivariateNormal`, `LowRankMultivariateNormal` do not support `.mean`, `.entropy`

- [x] Currently some tests fail (I've skipped those) due to unavailability of `aten::uniform` and `aten::cauchy` in the jit. Can someone suggest how to add these? I tried to add declarations to `torch/csrc/ir.cpp` and `torch/csrc/passes/shape_analysis.cpp`, but that resulted in "Couldn't find operator" errors.
- [x] There are still lots of `TracerWarning`s that something doesn't match something. I'm not sure whether these are real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11560

Differential Revision: D9816327

Pulled By: apaszke

fbshipit-source-id: 72ec998ea13fc4c76d1ed003d9502e0fbaf728b8
2018-09-13 19:55:01 -07:00
99c0b96f68 optimize norm on ATen CPU backend (#11565)
Summary:
Currently, torch.norm() runs sequentially on CPU. This PR parallelizes and vectorizes torch.norm() on the ATen CPU path, providing roughly a two-order-of-magnitude performance boost.

Performance was benchmarked on a Xeon Skylake 8180 (2×28 cores @ 2.5GHz) using the following script:
```python
import torch
from time import time

count = 1000
size = 1000*1000

def test_norm(p=2):
    a = torch.randn(size)
    tstart = time()
    for i in range(count):
        torch.norm(a, p)
    tend = time()
    print("norm on size %d tensor p = %d: %f s" % (size, p, (tend-tstart)))

for p in range(4):
    test_norm(p)
```

without this optimization,
```
(intel-pytorch) [mingfeim@mlt-skx065 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 1.071235 s
norm on size 1000000 tensor p = 1: 1.069149 s
norm on size 1000000 tensor p = 2: 1.068212 s
norm on size 1000000 tensor p = 3: 69.735312 s
```

and with this optimization,
```
(pytorch-tf) [mingfeim@mlt-skx053 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 0.127507 s
norm on size 1000000 tensor p = 1: 0.011867 s
norm on size 1000000 tensor p = 2: 0.011907 s
norm on size 1000000 tensor p = 3: 0.014470 s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11565

Differential Revision: D9804484

Pulled By: ezyang

fbshipit-source-id: 52899f30ac26139d00684d07edfb47cb9b25d871
2018-09-13 19:40:43 -07:00
98e04db955 Implement requires_grad propagation in the JIT (#11586)
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.

I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586

Reviewed By: ezyang

Differential Revision: D9813648

Pulled By: apaszke

fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
2018-09-13 19:25:26 -07:00
513fd3dd36 Improve doc of torch.nn.functional.pad (#11623)
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
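The recurring point of confusion is the pad order; as a hedged reminder (the example values are mine, not from the doc change), the pad tuple starts from the last dimension:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 4, 5)    # (N, C, H, W)

# (left, right, top, bottom): last dimension first, then second-to-last.
y = F.pad(x, (1, 2, 3, 4))
print(y.shape)                 # torch.Size([1, 3, 11, 8])
```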
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623

Differential Revision: D9818255

Pulled By: soumith

fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
2018-09-13 19:25:24 -07:00
760679352e Move Pixel Shuffle to ATen (#9721)
Summary:
<del>#9692 </del>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9721

Differential Revision: D8955829

Pulled By: SsnL

fbshipit-source-id: 4f4d1c7720b6f757fbef9a10f70209ae76f61399
2018-09-13 18:25:48 -07:00
e1cd220b90 Reimplement swap() using default move constructor. (#11659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11659

This is less error-prone and less code.

Reviewed By: smessmer

Differential Revision: D9814536

fbshipit-source-id: 028510e31e2fa7a9fa11c1398b0743c5cd085dd5
2018-09-13 16:32:55 -07:00
02980d7f8c Refactor Tensor/TensorImpl constructors. (#11657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11657

Previously, we had a constructor in TensorImpl for every constructor in Tensor.
This was unnecessary and wordy: Tensor is the user-visible class, so it deserves
the constructors, but TensorImpl is internal and doesn't need it.  So
I replaced TensorImpl with a single, Storage accepting constructor, and then
rewrote Tensor to use that constructor.

Reviewed By: jerryzh168

Differential Revision: D9813742

fbshipit-source-id: 7501b54fe5f39180f1bc07573fd7c1640b0f4e89
2018-09-13 16:32:53 -07:00
7607b49538 s/GetDevicetype/device_type/ (#11656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11656

The mis-capitalization really sticks in my craw.  I know why (we
already have a static function named GetDeviceType), but let's
name it differently.

```
codemod -d . --extensions cc,cpp,cu,cuh,h,py,hpp,TARGETS GetDevicetype device_type
```

Reviewed By: jerryzh168

Differential Revision: D9813544

fbshipit-source-id: fe462f4bc40b03e74921f8cf5ebd9cfc52e7e636
2018-09-13 16:32:51 -07:00
c18510463b Reduce includes in tensor_impl.h (#11643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11643

- Reduce the tensor_impl.h includes to the bare
  minimum necessary
- Explicitly namespace std::

Reviewed By: jerryzh168

Differential Revision: D9811028

fbshipit-source-id: 44e32720962b35c12a7b2c93605721b9f6c5b254
2018-09-13 16:32:49 -07:00
8402fde279 Revert D9778043: Pass Storage by value
Differential Revision:
D9778043

Original commit changeset: b1381cd60a82

fbshipit-source-id: 40f1de67e939cb41605978d632105a48a91e7629
2018-09-13 16:32:48 -07:00
85ff72348d Only involve tensor device in CUDA -> CPU copy, not current device. (#11592)
Summary:
This also unifies the device usage between the async and sync cases.

Fixes https://github.com/pytorch/pytorch/issues/10832.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11592

Differential Revision: D9797355

Pulled By: gchanan

fbshipit-source-id: e496cd371111cfaf9a6c664167967b395e3d72e9
2018-09-13 16:32:46 -07:00
4672280b55 Pass Storage by value (#11546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11546

-

Reviewed By: ezyang

Differential Revision: D9778043

fbshipit-source-id: b1381cd60a826055ce8771d6c67eac4cc375b3b4
2018-09-13 15:26:05 -07:00
05e06f7de2 migrating deprecated calls without abc module for containers (#11515)
Summary:
Implementing #10540.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515

Reviewed By: apaszke

Differential Revision: D9771045

Pulled By: jeffreyksmithjr

fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6
2018-09-13 15:09:22 -07:00
29e29ca6ee Use MPI_Isend/MPI_Irecv to back send/recv (#11630)
Summary:
The isCompleted function is changed to be non-const to accommodate
setting some internal status on the work object in the case of
completion. Previously, it was only checking a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.
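From Python this backs the non-blocking point-to-point API; a hedged sketch (ranks and tensor contents are placeholders, and it assumes a process group initialized with the MPI backend):

```python
import torch
import torch.distributed as dist

# Assumes e.g. dist.init_process_group('mpi') has already run.
t = torch.zeros(4)
if dist.get_rank() == 0:
    work = dist.isend(t, dst=1)   # backed by MPI_Isend
else:
    work = dist.irecv(t, src=0)   # backed by MPI_Irecv

while not work.is_completed():    # polls MPI_Test on the MPI backend
    pass
```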
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11630

Reviewed By: SsnL

Differential Revision: D9808008

Pulled By: pietern

fbshipit-source-id: 18b70825b1fb4d561a552fa75e9475a522852cd4
2018-09-13 15:01:24 -07:00
f129da1a47 Add max to the ValueError for EmbeddingBag mode check (#11655)
Summary:
Related to #11624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11655

Differential Revision: D9815454

Pulled By: SsnL

fbshipit-source-id: 8dd82e0c0aa68362e12b301e095a85af7d7fd71a
2018-09-13 14:39:40 -07:00
90537289a0 Constexpr std::move / std::forward for C++11 (#11396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11396

std::move and std::forward in C++11 aren't constexpr (they are in C++14).
This caused a build issue orionr was working on.
It should be fixed by this diff.

Reviewed By: orionr

Differential Revision: D9724805

fbshipit-source-id: 0d9047dce611385d659cc71a6c04cc7a6a40a5ae
2018-09-13 12:56:17 -07:00
0f1ca569ce End-to-end dynamic slicing with ONNX DynamicSlice experimental operator (#11255)
Summary:
Requires https://github.com/onnx/onnx/pull/1377

This PR makes it so that slices with dynamic boundary values can be exported from pytorch and run in caffe2 via ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11255

Differential Revision: D9790216

Pulled By: jamesr66a

fbshipit-source-id: 6adfcddc5788df4d34d7ca98341077140402a3e2
2018-09-13 12:39:52 -07:00
acb6f18bab fix generate_code.py caching (#11644)
Summary:
Previously, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running on every build, regardless of whether its inputs had changed.

This updated logic fixes the input caching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644

Reviewed By: orionr

Differential Revision: D9814348

Pulled By: soumith

fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
2018-09-13 12:39:48 -07:00
75f49befeb move instance_norm to aten (#10792)
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792

Differential Revision: D9800643

Pulled By: li-roy

fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
2018-09-13 12:26:22 -07:00
912d3626c8 Split tensor.h into tensor_impl.h and tensor.h (#11642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11642

This is just a preparatory change to help with future
refactoring:

- I want to reduce the number of includes that tensor_impl.h
  depends on, but
- I need to keep tensor.h providing all Caffe2 headers, because
  users may be relying on tensor.h transitively providing those
  headers.

Introducing a level of indirection lets me do both at the same time.

Reviewed By: jerryzh168

Differential Revision: D9810823

fbshipit-source-id: 8dfaac4b8768051a22898be8fcaf787ecc57eb13
2018-09-13 12:26:20 -07:00
45e9ee096e Fix test_mnist_training_leaks_no_memory_cuda warning (#11639)
Summary:
Before this PR it would warn that "dropout is non deterministic and can
cause problems when checking trace", so I disabled the trace checking.

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11639

Differential Revision: D9812493

Pulled By: zou3519

fbshipit-source-id: fab86928a5fba8b218b47543533aaf7c82a10b4a
2018-09-13 12:09:20 -07:00
9abc666745 stop allowing extra positional args in arg parser (#10499)
Summary:
Arg parser allowed additional positional args to be parsed into keyword-only params.

Fixes a couple cases:
- The positional argument happens to be of the right type, and it just works silently. Now, we fail as expected.
- The positional argument fails later down the line. Now, we fail at the appropriate time and get a better error message.

Pre-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
tensor([6, 0], device='cuda:1')
```
Post-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new() received an invalid combination of arguments - got (tuple, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)
```

Pre-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros(): argument 'dtype' (position 2) must be torch.dtype, not int
```

Post-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros() takes 1 positional argument but 2 were given
```

fixes #8351
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10499

Differential Revision: D9811093

Pulled By: li-roy

fbshipit-source-id: ce946270fd11b264ff1b09765db3300879491f76
2018-09-13 11:56:12 -07:00
6f53b4efea Remove implicit bool casts (#11503)
Summary:
In order to comply with Python's rules on implicit casting of
non-booleans to booleans, this PR removes implicit casting in favor of
explicit casts via `bool()`.
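
In script code this means conditions on tensor values now need an explicit cast; a minimal sketch (the function is illustrative, and `x` is assumed to be a 0-dim tensor):

```python
import torch

@torch.jit.script
def relu_scalar(x):
    # Before this PR, `if x > 0:` implicitly cast the comparison result;
    # now the cast has to be written explicitly:
    if bool(x > 0):
        return x
    return torch.zeros_like(x)
```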

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11503

Differential Revision: D9780869

Pulled By: driazati

fbshipit-source-id: c753acaca27f4e79dddf424c6b04674f44a6aad9
2018-09-13 11:26:45 -07:00
ab3a2d25fb Improve error messages when trying to use nested lists.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11606

Differential Revision: D9806949

Pulled By: zdevito

fbshipit-source-id: c38abc4ce745a63d26a64f6aa1b41350e4b1acd5
2018-09-13 11:10:38 -07:00
5bc90b8554 support conversion and dispatch of complex numbers (#11603)
Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`

Differential Revision: D9804061

Pulled By: ezyang

fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
2018-09-13 11:10:37 -07:00
a861573e36 fix tensor export bug in IR export (#11613)
Differential Revision: D9811094

Pulled By: li-roy

fbshipit-source-id: 012792dbedc70bd3fa242fdf2e39da0b21ce158d
2018-09-13 11:10:35 -07:00
d278344e36 Automatic update of fbcode/onnx to 39dd0d4fec5913aa517b71bcfcbf638a427894eb (#11622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11622

Previous import was bff0b8835870c7df7762ef43498d000d2d8ffb52

Included changes:
- **[39dd0d4](https://github.com/onnx/onnx/commit/39dd0d4)**: [build] Add ONNX_API for protos in all cases (#1407) <Orion Reblitz-Richardson>
- **[944db4f](https://github.com/onnx/onnx/commit/944db4f)**: cmake (#1401) <zrphercule>
- **[8ccc8dd](https://github.com/onnx/onnx/commit/8ccc8dd)**: Remove ONNXIFI_CHECK_RESULT from onnxRelease* functions (#1397) <Marat Dukhan>
- **[df14e74](https://github.com/onnx/onnx/commit/df14e74)**: Change onnxifi test driver classname (#1396) <zrphercule>
- **[0c885cc](https://github.com/onnx/onnx/commit/0c885cc)**: ONNXIFI cpp test driver (#1290) <zrphercule>
- **[a557848](https://github.com/onnx/onnx/commit/a557848)**: Coverage Report Tools for Backend Scoreboard (#1301) <Akshay Chalana>
- **[31fd87f](https://github.com/onnx/onnx/commit/31fd87f)**: fix AvgPool doc. add default value for count_include_pad (#1391) <Wenhao Hu>
- **[8ff08c2](https://github.com/onnx/onnx/commit/8ff08c2)**: Do not export onnx symbols in the python extension (#1388) <bddppq>

Reviewed By: orionr

Differential Revision: D9806635

fbshipit-source-id: f61c052b6bd14e0c80ace19c1a5f0ba659030c6f
2018-09-13 10:40:48 -07:00
1f49b879d1 Add missing include for __half (#11638)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11638

Differential Revision: D9811063

Pulled By: ezyang

fbshipit-source-id: dd103bb152485bcdbb0108b4d3de2443c30d5572
2018-09-13 10:33:09 -07:00
d4d72b87e3 Sphinx is case sensitive
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11646

Differential Revision: D9811355

Pulled By: SsnL

fbshipit-source-id: d484561baa2ac5b3113870b4ee06fa3560b686e4
2018-09-13 10:33:06 -07:00
57f149a861 Only join pin_memory_thread after it started (#11599)
Summary:
Same reason as in #11432.

Example error:
```
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa06963cf28>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 405, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 401, in _shutdown_workers
    self.pin_memory_thread.join()
AttributeError: '_DataLoaderIter' object has no attribute 'pin_memory_thread'
```
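
A minimal sketch of the guard this implies (a hypothetical simplification, not the PR's exact code):
```python
class _DataLoaderIterSketch:
    """Hypothetical stand-in for _DataLoaderIter's shutdown path."""

    def _shutdown_workers(self):
        # pin_memory_thread is only set once the thread has actually been
        # started, so guard the join instead of assuming the attribute exists.
        pin_memory_thread = getattr(self, 'pin_memory_thread', None)
        if pin_memory_thread is not None:
            pin_memory_thread.join()
```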
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11599

Differential Revision: D9801143

Pulled By: SsnL

fbshipit-source-id: 520590a21f56fa381fcac621457a7544d3fba47e
2018-09-13 09:40:49 -07:00
36fc1a0a58 Merge caffe2::/at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11637

Reviewed By: gchanan

Differential Revision: D9806425

Pulled By: ezyang

fbshipit-source-id: e20ec93bff6dc7fb22ca9b7e7348d060b3876b67
2018-09-13 09:40:48 -07:00
77f6998e54 Guard against inputting or returning sparse tensors (#11550)
Summary:
Add guards against using sparse tensors by checking the conversions from IValue -> PyObject & PyObject -> IValue.

This diff also changes the behavior in constant propagation to not run python ops even if all ops are constant because of possible mutation to global state. This came up in trying to run get_sparse(), and I'm including it here to make it easier to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11550

Differential Revision: D9804712

Pulled By: eellison

fbshipit-source-id: 9fe7daf721c6d6e48df4925c0f9c775873bcdc77
2018-09-13 08:58:29 -07:00
cac11a4ac3 Merge caffe2::/at::StorageImpl (#11543)
Summary:
Merges caffe2::StorageImpl methods with at::StorageImpl methods and defines caffe2::StorageImpl as at::StorageImpl.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11543

Differential Revision: D9795228

Pulled By: cpuhrsch

fbshipit-source-id: fbd6fa3cbf6c9099a4803337286c30e00652f95c
2018-09-13 01:25:50 -07:00
44b2b6b150 clean up jit generated tests (#11403)
Summary:
Clean up some generated tests now that we have nice new features like var args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11403

Differential Revision: D9800545

Pulled By: wanchaol

fbshipit-source-id: e9973b113f78dc38cf99a81b6ede3fa3485f1cfa
2018-09-12 22:55:03 -07:00
e998038bc0 Use TypeMeta instead of TypeIdentifier within at::StorageImpl (#11236)
Summary:
Further aligns at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11236

Differential Revision: D9776286

Pulled By: cpuhrsch

fbshipit-source-id: f2c53995fcece013b77b3a1f709ab0f9df8ab23e
2018-09-12 22:26:00 -07:00
6f05b5ee54 Pin Sphinx to 1.7.9 (#11620)
Summary:
Sphinx 1.8.0 breaks us.  Upgrading is tracked in #11618.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11620

Differential Revision: D9806440

Pulled By: ezyang

fbshipit-source-id: 7a8d849c78e697a8775d00cd3a463a7bdbcddabe
2018-09-12 21:55:21 -07:00
17637f2b03 enable_mkl support for resnet18+lstm model
Summary:
* Many ops in the LSTM part of the model don't have implementations in ideep/mkl, and it doesn't make sense to copy back and forth for the few available ops because the majority of the RNN will run on CPU
* Thus the strategy is to enable MKL only for the resnet18 part of the model, then switch to the default CPU engine for the LSTM part

* The net may contain some external_inputs falsely added during the ONNX->Caffe2 conversion. A canary in the service shows their existence could lead to a service crash (presumably because these blobs somehow get shared between threads). They are now manually removed, which seems to be enough to avoid the crash.

Reviewed By: viswanathgs

Differential Revision: D8888763

fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
2018-09-12 18:56:46 -07:00
0a6931cfee Only reference ONNX through onnx_pb.h (#11609)
Summary:
I think this is needed to land https://github.com/onnx/onnx/pull/1407 without CI errors.

cc mingzhe09088 houseroad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11609

Reviewed By: houseroad

Differential Revision: D9803490

Pulled By: orionr

fbshipit-source-id: 26193f38ab0a2eef9ad7d0da9a0310dc40ef0f2d
2018-09-12 18:25:58 -07:00
5da0b31bee More native docs on TensorOptions. (#11558)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11558

Differential Revision: D9783655

Pulled By: ezyang

fbshipit-source-id: 17c749c9ef99fd9dfd0ff365ebfe22102fb891d7
2018-09-12 17:39:39 -07:00
f00f99ebcc use at::Half in THC (#11322)
Summary:
- use Half instead of half in THC
- clean up TH_float2half, TH_half2float, etc. conversions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11322

Differential Revision: D9799553

Pulled By: li-roy

fbshipit-source-id: 9aa3e003bff73d9df6224a393f3ec0624b1f44ed
2018-09-12 17:39:37 -07:00
daa379ffd7 Disable flaky test ObserverTest.TestMultipleNetBase (#11596)
Summary:
Tracked in https://github.com/pytorch/pytorch/issues/9137

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11596

Differential Revision: D9803256

Pulled By: ezyang

fbshipit-source-id: 973393203ed8343a3a0feef36d34e561d9f653c4
2018-09-12 17:39:36 -07:00
e2cd627cce Temporarily disable docs build. (#11608)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11608

Differential Revision: D9803369

Pulled By: ezyang

fbshipit-source-id: a206d6137e8e729f702189c926ec898444d1dc53
2018-09-12 17:39:34 -07:00
7f7cda99cd Optimize order_switch_ops on GPU (#11404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11404

Optimize order_switch_ops on GPU

Reviewed By: houseroad

Differential Revision: D9728642

fbshipit-source-id: 74ff62268856fb1613fa61eb214bed6ec6716632
2018-09-12 16:56:15 -07:00
776a9992e1 topk test fix, hgemm integration (#11593)
Summary:
After discussions in #11584 , new PR for just the test skip and hgemm integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11593

Differential Revision: D9798527

Pulled By: ezyang

fbshipit-source-id: e2ef5609676571caef2f8e6844909fe3a11d8b3e
2018-09-12 16:56:13 -07:00
def44c96fd Revert D9779866: [pytorch][PR] Move function deletion from the stack to the heap.
Differential Revision:
D9779866

Original commit changeset: 96753eead790

fbshipit-source-id: 959deeb63318d48f4c563e10e70ef6ec7fabd3b4
2018-09-12 16:56:11 -07:00
5b2efcf425 Document the Conv module (#11566)
Summary:
Document the C++ API conv module. No code changes.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11566

Differential Revision: D9793665

Pulled By: goldsborough

fbshipit-source-id: 5f7f0605f952fadc62ffbcb8eca4183d4142c451
2018-09-12 16:56:09 -07:00
130d55a5f4 Allow building the C++ API without cereal (#11498)
Summary:
I am working on unifying the C++ extensions and C++ API, and one constraint for this is that we will want to be able to build the C++ API without cereal, since we won't want to ship it with the Python `torch` package.

For this I introduce a `TORCH_WITH_CEREAL` option to CMake. If on, the C++ API will be built with cereal and thus serialization support. If off, serialization functions will throw exceptions, but the library will otherwise still compile the same. __This option is on by default, so for regular C++ API users nothing will change__. However, from C++ extensions, we'll be able to turn it off. This effectively means we won't be searching for any cereal headers from C++ API headers, which wouldn't be installed in the Python package.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11498

Differential Revision: D9784803

Pulled By: goldsborough

fbshipit-source-id: 5d0a1f2501993012d28cf3d730f45932b483abc4
2018-09-12 16:56:07 -07:00
12efef166a Split out copy_op from utility_ops (#11470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11470

In order to reduce build sizes, we are identifying files that can be split up into smaller units, allowing us to only include the ops we need.

Reviewed By: orionr, ajtulloch

Differential Revision: D9725819

fbshipit-source-id: def1074a33dffe99bd6a7e6e48aa9e5be3d04a6a
2018-09-12 16:25:48 -07:00
316c167940 Add checking of nullptrs in GetTensorInfo (#11587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11587

To help debug the issue in T33295362, we add some checks in the function.

Possible crash sites in `GetTensorInfo`:
1. tc is nullptr, which is checked.
2. tc->capacity_nbytes() hits a nullptr. This is unlikely because storage is not a pointer and computing capacity_nbytes doesn't involve pointers; it's numel * itemsize().
3. tc->ExtractDeviceOption hits a nullptr. One possibility is that raw_data() is nullptr, since tc->ExtractDeviceOption uses it. This is checked.
4. The Tensor itself, which is not a reference. This is also checked.

Reviewed By: salexspb

Differential Revision: D9793484

fbshipit-source-id: 3fc72746fc310a23ae45553bbe0d269a4b9edb72
2018-09-12 16:25:46 -07:00
eb7a298489 Add resnext model to OSS (#11468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11468

Add the resnext model to the OSS Caffe2 repo.

Reviewed By: orionr, kuttas

Differential Revision: D9506000

fbshipit-source-id: 236005d5d7dbeb8c2864014b1eea03810618d8e8
2018-09-12 15:59:20 -07:00
c81406c514 Document Any (#11580)
Summary:
Documents the `AnyModule` class in the C++ API.

Also changed the API to be friendlier by default. Calling `AnyModule::forward` used to return an `AnyModule::Value` which you had to call `.get<T>()` on to cast to a concrete type. I changed the name of that `forward` method to `any_forward` and instead made `forward` templated on a `ReturnType` template parameter which you can supply to do the `.get<T>` cast for you automatically. I default this parameter to `torch::Tensor` so that it can often be omitted. So where you used to have to write

```cpp
any_module.forward(...).get<int>();
any_module.forward(...).get<torch::Tensor>();
```

you now write

```cpp
any_module.forward<int>(...);
any_module.forward(...);
```

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11580

Differential Revision: D9798626

Pulled By: goldsborough

fbshipit-source-id: 060b4ea28facaffc417f53b80b846a9dff9acb73
2018-09-12 15:59:19 -07:00
ac94889939 Add jit doc entry to sidebar (#11598)
Summary:
cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11598

Differential Revision: D9801230

Pulled By: SsnL

fbshipit-source-id: f0c8d2468b64a50c3c437667d462722dcd2682d1
2018-09-12 15:29:23 -07:00
b663b7ce7e Update ROCm Docker image with latest AMD debians (#11507)
Summary:
Building at https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/194/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11507

Differential Revision: D9772474

Pulled By: ezyang

fbshipit-source-id: ab00f05744547dc7ec9f97511e2c8495ac282fac
2018-09-12 15:29:21 -07:00
02c4cd3c8a Skip flaky distributed tests (#11594)
Summary:
context: https://github.com/pytorch/pytorch/issues/11582

cc pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11594

Differential Revision: D9798871

Pulled By: SsnL

fbshipit-source-id: 9f9e1871c7fd9505ca898865eb8068fab4d3416d
2018-09-12 14:57:57 -07:00
d4e05f4e1e Move function deletion from the stack to the heap. (#11534)
Summary:
This eliminates the need for any heuristics regarding stack size limits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11534

Differential Revision: D9779866

Pulled By: resistor

fbshipit-source-id: 96753eead7904bbdc2869fb01f7bd42141032347
2018-09-12 14:39:59 -07:00
958ba4e913 Aibench for asr decoder
Summary: as title

Reviewed By: sf-wind

Differential Revision: D9738021

fbshipit-source-id: 98f570484bca6486ad99207732efd534ec7e3251
2018-09-12 14:25:19 -07:00
f0a440007e Explicitly set locale on docs build. (#11595)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11595

Differential Revision: D9798567

Pulled By: ezyang

fbshipit-source-id: ac05458347e181960a07cacae1dfc68d2837451f
2018-09-12 14:11:24 -07:00
504126e705 Documentation for debugging JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11540

Differential Revision: D9798647

Pulled By: jamesr66a

fbshipit-source-id: 968a4af22c735a848fa27cbadaed9b7023ba8276
2018-09-12 14:11:22 -07:00
a3036b3bb3 Fused weightnorm for ATen (#10842)
Summary:
This PR contains a C++ implementation of weight norm.  The user-side exposure of weight norm through torch.nn.utils.weight_norm is unchanged.
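
For context, a hedged sketch of that unchanged user-side API:
```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Reparametrize the layer's weight as w = g * v / ||v||, norm over dim 0.
layer = weight_norm(nn.Linear(20, 40), name='weight', dim=0)
out = layer(torch.randn(8, 20))  # on GPU this can hit the fused kernels
```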

If running on the GPU, and the norm is requested over the first or last dimension of the weight tensor, the forward pass is carried out using the fused kernels I wrote for our Fairseq GTC hero run, which offer superior performance to primitive ops and superior numerical stability when running in FP16.  In the common case that the backward pass is not itself constructing a graph (ie not attempting to set up double backward) the backward pass will be carried out using another fused kernel.  If the backward pass is constructing a graph, an alternate code path is taken, which does the math using differentiable primitive ops. In this way, the implementation allows double backward, even if the fused kernel was used in forward (although in this case, you don't benefit from the performance and stability of the fused backward kernel).

If running on the CPU, or if norming over an interior dim, the forward pass is carried out using double-differentiable primitive ops.

Figuring out how to generate all the right plumbing for this was tricky, but it was a fun experience learning how the autogenerator works and how the graph is constructed.  Thanks to colesbury for useful guidance on this front.

I do have a few lingering questions:

- Should I unify my return statements (ie by default-constructing Tensors outside if blocks and using operator= within)?
- What is the significance of `non_blocking` when calling e.g. `auto norms = saved_norms.to(saved_g.type().scalarType(), non_blocking=True/False);`?  I am currently omitting `non_blocking`, so it defaults to False, but I didn't see any associated synchronizes on the timeline, so I'm wondering what it means.
- Is there an "official" mapping from at::ScalarTypes to corresponding accumulate types, as there are for the PODs + Half in [AccumulateType.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h)?  I looked for an equivalent mapping for ScalarTypes, didn't find one, and ended up rigging it myself (`  at::ScalarType AccType = g.type().scalarType() == at::ScalarType::Half ? at::ScalarType::Float : g.type().scalarType();`).
- Are sparse tensors a concern?  Should I include another check for sparse tensors in the `_weight_norm` entry point, and send those along the fallback CPU path as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10842

Differential Revision: D9735531

Pulled By: ezyang

fbshipit-source-id: 24431d46532cf5503876b3bd450d5ca775b3eaee
2018-09-12 13:55:27 -07:00
9a7c196040 Move Type, Tensor, TensorMethods to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11519

Reviewed By: yf225

Differential Revision: D9771684

Pulled By: gchanan

fbshipit-source-id: a57ee2072af99ce856f895c688b09d750a8606e0
2018-09-12 13:10:54 -07:00
739e6af869 Add remainder % to the jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11557

Reviewed By: apaszke

Differential Revision: D9784642

Pulled By: wanchaol

fbshipit-source-id: b7c60c3e9534555c9d7db83769965b3f2f277cdf
2018-09-12 12:40:38 -07:00
ad7936e108 Fix reloading modules back into python (#11552)
Summary:
This changes the way module import works so that when a module
is reloaded in python it becomes a ScriptModule and not a _C.ScriptModule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11552

Differential Revision: D9782751

Pulled By: zdevito

fbshipit-source-id: 9576850b75494b228ce3def94c0d371a4a44b11d
2018-09-12 12:25:15 -07:00
17e76e26c8 Add trigonometry functions to docs/source/onnx.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11581

Differential Revision: D9794449

Pulled By: soumith

fbshipit-source-id: 1218fcf8969a10ffbfefd3ced7fee9fe7df296f1
2018-09-12 12:10:01 -07:00
13b05c8c78 Add EndToEndHybridModel CUDA tests (#11544)
Summary:
Also adds two additional tests that check for memory leaks while the relevant graph executors are alive:
- (minimal test): Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
- (large test) Do MNIST training with a traced MNIST module and test that no memory is leaked while the traced module (with graph executor) is alive

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11544

Reviewed By: apaszke

Differential Revision: D9778479

Pulled By: zou3519

fbshipit-source-id: 2d6cdea81dd1264f2c0396b662f70fdafecb3647
2018-09-12 11:25:18 -07:00
23d55883c0 minor formatting error log (#11528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11528

as title

Reviewed By: chocjy

Differential Revision: D9773214

fbshipit-source-id: b7dd4c19ab83a18f344de8e71ce5b3bf74d1af72
2018-09-12 11:25:17 -07:00
6398d626f4 Warn that export+import module always load onto the CPU (#11485)
Summary:
Test Plan
`cd docs && make html`
![image](https://user-images.githubusercontent.com/5652049/45325074-ed04e480-b51d-11e8-9d2d-685dbe8a08e9.png)

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11485

Differential Revision: D9772119

Pulled By: zou3519

fbshipit-source-id: 3dcb16c9edc2e8deebef17accf91a1c7d4dc9063
2018-09-12 10:55:39 -07:00
12f4c46eea caffe2::StorageImpl use at::DataPtr (#11282)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11282

Reviewed By: ezyang

Differential Revision: D9658503

Pulled By: cpuhrsch

fbshipit-source-id: 42fa73c979692cb1069c0345744a85d12150745c
2018-09-12 09:39:23 -07:00
e5dd77c7ad Sync all libnccl soversions, not just libnccl.so.1 (#11575)
Summary:
Fixes:

```
/bin/ld: warning: libnccl.so.1, needed by /data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so, not found (try using -rp
ath or -rpath-link)
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllReduce'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclBcast'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommInitAll'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGetErrorString'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduceScatter'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllGather'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduce'
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11575

Differential Revision: D9789956

Pulled By: ezyang

fbshipit-source-id: 63e48763cc233be9d137cec721b239159b511a24
2018-09-12 09:24:51 -07:00
f0a284502a Document BatchNorm and update default behavior (#11484)
Summary:
This PR:

1. Documents `BatchNorm`,
2. Makes a number of API changes after reconsidering some quirks:
    1. The default value for the `stateful` parameter used to be `false`, but the most common usage of `BatchNorm` out of the wild is certainly stateful, and the default in Python is also statefulness. So we change the default to stateful.
    2. The `pure_forward` function used to use the internal running mean and variance variables instead of the ones supplied to that function call when `stateful` was true, which certainly seems odd. When you call `pure_forward` you would certainly expect the values you pass explicitly to be used. This is now fixed.
3. Adds tests for `BatchNorm`, finally.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11484

Reviewed By: pjh5

Differential Revision: D9779618

Pulled By: goldsborough

fbshipit-source-id: 59ba760e085c01454b75644b24b22317b688e459
2018-09-12 09:09:53 -07:00
6fc18a7541 Typo fix in randomness.rst (#11571)
Summary:
"need to be" -> "need not be"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11571

Differential Revision: D9786001

Pulled By: soumith

fbshipit-source-id: 7cc408f5c8bfcc56d4b5c153646f30e1cec37539
2018-09-12 08:25:46 -07:00
efc0f6784a Move some bmm/baddbmm to ATen (#11292)
Summary:
- Incorporates MKL addition by mingfeima. Thank you! (but all errors are my own)
- Native CPU implementation: defer to matrix multiplication for
  small batches and parallelize over batch dimension for large
  batches.
- Add bmm test for CUDA just to be sure.

This is a partial fix for #10661, getting down to a factor ~5.
Considerable overhead is incurred for the setup in einsum. It might be more efficient to eventually define optimized contraction functions for arbitrary and multiple dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11292

Differential Revision: D9784941

Pulled By: ezyang

fbshipit-source-id: f6dded2c6f5e8f0461fb38f31f9a824992a58358
2018-09-12 07:09:55 -07:00
76070fe73c Make c10d test work on CPU only build (#11567)
Summary:
Make the test work with a CPU-only build; also fixed tests that had been failing for a long time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11567

Differential Revision: D9785740

Pulled By: teng-li

fbshipit-source-id: 61c43b758c1ee53117e30de8074583e6faea863a
2018-09-12 01:39:44 -07:00
6597779847 Clean up some C++ cruftiness in the script lexer.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11408

Differential Revision: D9772843

Pulled By: resistor

fbshipit-source-id: 07f16bf7eaf4f1d8700e46e91a485de4b2d9ed83
2018-09-11 23:55:31 -07:00
3e3d8caecd Allow setting deletion constant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11529

Differential Revision: D9775398

Pulled By: goldsborough

fbshipit-source-id: 8593d1afcf8be3150dcc4a58433f53307e3ae665
2018-09-11 23:11:46 -07:00
6dcdbd3a1d Make C10d support CPU only build (#11513)
Summary:
This makes torch.distributed work for CPU-only builds.

Also added one more CI test case to cover MPI CPU build.
All CI tests should cover this change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513

Differential Revision: D9784546

Pulled By: teng-li

fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
2018-09-11 22:10:34 -07:00
90e31f4896 Improve tracer warnings (#11545)
Summary:
Also, fix a performance bug in `ensureUnique`. Previously it formatted the warning string even though we weren't tracing, so all that work would *always* happen in the hot path and be for nothing.

A sample of what the new warnings look like:
```
tmp.py:4: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(x)
tmp.py:5: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor([1.])
tmp.py:6: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator add_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.split(y, 2, dim=1)[0].add_(2)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11545

Differential Revision: D9782975

Pulled By: apaszke

fbshipit-source-id: 5b3abd31366e59c69e0b7ff278042b5563deb5a9
2018-09-11 22:10:32 -07:00
62c9d4ac96 Make .to() methods native functions (to fix JIT tracing)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11491

Differential Revision: D9771121

Pulled By: apaszke

fbshipit-source-id: 08d11101fb12093f8cf913b06359adddf3af9da7
2018-09-11 21:55:42 -07:00
a00fa2c614 Release GIL when calling into JIT interpreter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11541

Differential Revision: D9777909

Pulled By: apaszke

fbshipit-source-id: d0217e203721262f3f131b54ea78f898df0b54ec
2018-09-11 21:55:40 -07:00
1a246c9c7e guard spurious cudnn.h include (#11562)
Summary:
This fixes the build when CuDNN was not found on the system.

From the `git blame`, it looks like the bug has been around for 2 years :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11562

Differential Revision: D9784589

Pulled By: soumith

fbshipit-source-id: b33153436dced0a503c9833cdf52f7093f3394b4
2018-09-11 21:09:54 -07:00
a11ebfa195 Add explicit "this->" for nvcc. (#11196)
Summary:
Fix #11195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11196

Differential Revision: D9737625

Pulled By: ezyang

fbshipit-source-id: fb62076f005bd619eba53c0ed3f07683633f6d91
2018-09-11 21:09:52 -07:00
8aa8ad8b01 WIP: Reproducibility note (#11329)
Summary:
This adds a Note on making experiments reproducible.

It also adds Instructions for building the Documentation to `README.md`. Please ping if I missed any requirements.

I'm not sure what to do about the submodule changes. Please advise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11329

Differential Revision: D9784939

Pulled By: ezyang

fbshipit-source-id: 5c5acbe343d1fffb15bdcb84c6d8d925c2ffcc5e
2018-09-11 21:09:51 -07:00
b75c32ded9 link against TORCH_CUDA_LIBRARIES
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11475

Differential Revision: D9784616

Pulled By: anderspapitto

fbshipit-source-id: bb8b443bcb308bbbe9707d265f21e5d00d717d65
2018-09-11 20:39:53 -07:00
f4d9f39a94 Test libtorch on cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11554

Differential Revision: D9784772

Pulled By: goldsborough

fbshipit-source-id: c3e071695f56c1f427984f427b1f7722722947d3
2018-09-11 20:39:51 -07:00
35348dab10 WIP: Include note on cudnn determinism in each function backed by cudnn (#11434)
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`) none of my changes are actually showing; could you point out what I'm doing wrong?

Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434

Differential Revision: D9751208

Pulled By: ezyang

fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
2018-09-11 20:27:09 -07:00
54107ae8cf convert output_device at data_parallel from torch.device to index (#10189)
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189

Differential Revision: D9545390

Pulled By: weiyangfb

fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
2018-09-11 20:27:07 -07:00
045f862574 Use torch::nn::init::xavier_normal_
Summary: The PyTorch C++ API has `torch.nn.init` equivalents that the RNNG can use to initialize the state of its StackRNNs. This gets rid of the `fanInOut_` methods on `Parser` and tidies up `xavierInitialState` a little.

Reviewed By: wowitsmrinal

Differential Revision: D9472595

fbshipit-source-id: c202116f32383d3b4bba064c2c0d2656311e1170
2018-09-11 20:27:06 -07:00
d95fedb436 Use ATen dropout implementation in Dropout module and add FeatureDropout (#11458)
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.

I also replaced the implementation of `dropout3d`  with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).

ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458

Differential Revision: D9756603

Pulled By: goldsborough

fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
2018-09-11 20:16:12 -07:00
3121c8f526 Update gtest and remove the macro guide on gtest from #11321 (#11417)
Summary:
Last PR seems to have test failures, re-issuing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11417

Reviewed By: orionr

Differential Revision: D9784706

Pulled By: Yangqing

fbshipit-source-id: 9e5f347e19fa2700ff69d2cd69ea7a9e01a91609
2018-09-11 20:16:08 -07:00
92fd69f256 Split Type into TypeExtendedInterface and Type (#11520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11520

Previously, we had Type which was a catch all interface for all
functions and methods we could possibly want to do dynamic dispatch
on. However, we want to check in a non-autogenerated Tensor class
to ATen/core, and to do this, we must also check in a non-autogenerated
Type class which we can do dispatch on. In principle, we could
put the full Type interface in ATen/core, but this would be
a bad developer experience, since any time you add a new free
function, you'd have to regenerate the checked in Type header.

For a better dev experience, we split Type into a two parts,
Type, which will be checked in (though not in this diff), and
TypeExtendedInterface, which will NOT be checked in. Type contains
just enough methods to let Tensor be defined, and leaves the
rest to TypeExtendedInterface.

Some complications:

- We (very unfortunately) have overloaded virtual methods. Because
of C++'s rules, we cannot move one overload without doing some
extra work to make sure that overload in a superclass and an
overload in a subclass resolve together. I've chosen to resolve
this problem simply by moving ALL overloads of a method which
occurs in Tensor to Type.

- There are some places where we take a type() object and call
a method on it, which is not a Tensor base method. I've eliminated
some where possible, but in other cases calling the method on type
is the ONLY way to invoke it; in that case, I've just inserted
a cast. Further refactoring is necessary.

Reviewed By: gchanan

Differential Revision: D9771708

fbshipit-source-id: c59d39fe919cd6f42be6dca699d474346ea3c614
2018-09-11 20:16:04 -07:00
35d52dbb0e re-enable USE_MPI (#11416)
Summary:
The previous error was caused by mpi_test not depending on MPI_CXX_LIBRARIES. This might solve the problem.

Not tested locally - waiting for CI test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11416

Reviewed By: mingzhe09088

Differential Revision: D9771694

Pulled By: Yangqing

fbshipit-source-id: 53e7b4f64eadc88313bc4dd9b8e3f7931cda6e91
2018-09-11 18:26:12 -07:00
bbf54ea37c Ensure .enumerate_support() methods are jittable (#11542)
Summary:
This works around #11535 by avoiding `arange(n, out=x)` and `eye(n, out=x)` in `torch.distributions`. I've confirmed that the `.enumerate_support()` methods are now jittable.
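
A hedged sketch of the kind of rewrite this implies (hypothetical helper; the PR's exact code may differ):
```python
import torch

def jittable_arange(n, like):
    # Instead of torch.arange(n, out=like.new(n)), which the JIT cannot
    # handle, build the tensor directly with the desired dtype and device.
    return torch.arange(n, dtype=like.dtype, device=like.device)
```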
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11542

Differential Revision: D9777805

Pulled By: apaszke

fbshipit-source-id: fa38f2f1acfc0a289f725fd8c92478573cfdbefb
2018-09-11 18:26:09 -07:00
cda74ac476 fix nested no_grad decorator and with-statement (#11479)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10858
- allow `no_grad` decorator to apply `with torch.no_grad()` at the correct context
- current behavior:
```
import torch

@torch.no_grad()
def nothing(x):
    return x

testin = torch.Tensor([0])
with torch.no_grad():
    print(torch.is_grad_enabled()) # False
    testout = nothing(testin)
    print(torch.is_grad_enabled()) # False
```
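
For contrast, the intended post-fix behavior (a hedged illustration) is that the decorator disables grad only for the duration of the decorated call:
```python
import torch

@torch.no_grad()
def doubler(x):
    assert not torch.is_grad_enabled()  # disabled inside the call
    return x * 2

x = torch.ones(2, requires_grad=True)
y = doubler(x)
print(torch.is_grad_enabled())  # True again after the call returns
```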
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11479

Differential Revision: D9758691

Pulled By: weiyangfb

fbshipit-source-id: 87de2219c6c45f65a2c0406ae152c3ad760be8f2
2018-09-11 17:56:40 -07:00
8b196d671b Allow tracing random functions (only when using default generators) (#11539)
Summary:
Fixes #11504.

zdevito, neerajprad, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11539

Differential Revision: D9777897

Pulled By: apaszke

fbshipit-source-id: 56983260f5b93da7d5540a6242769ea7bd50eb06
2018-09-11 17:56:39 -07:00
b6b0b5222d fix missing libnccl.so.1 error (#11553)
Summary:
what it says on the tin.

I broke the build in https://github.com/pytorch/pytorch/pull/11487 but contbuild didn't end up catching it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11553

Differential Revision: D9781557

Pulled By: soumith

fbshipit-source-id: 2a1fa314af4b85b5491d74110bfee3d80599aa95
2018-09-11 17:25:58 -07:00
3a39006d38 Fix some more doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11531

Differential Revision: D9776541

Pulled By: SsnL

fbshipit-source-id: 8725485639ea6e9479b6ea95a49f5b75a9457db7
2018-09-11 16:26:55 -07:00
3a8e39b215 Support load and store between Py_complex and std::complex (#11493)
Summary: Printing for complex numbers requires loading and storing between `Py_complex` and `std::complex`. This patch aims to support this for the plugin.

Differential Revision: D9771808

Pulled By: ezyang

fbshipit-source-id: 024865f1945d63ddb5efc775a35438c8ea06408e
2018-09-11 15:55:11 -07:00
289a8c9b7d Allow train/eval, and non-Tensor arguments to python functions (#11505)
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.

This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505

Differential Revision: D9765466

Pulled By: zdevito

fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
2018-09-11 15:05:09 -07:00
17776db2ee Add gtest dependency on aten tests. (#11429)
Summary:
ezyang delivering my promise to you :)

Basically, now aten tests can use gtest as part of our test harness unification effort. I also converted one test (atest.cpp) to show how one can do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11429

Reviewed By: ezyang

Differential Revision: D9762934

Pulled By: Yangqing

fbshipit-source-id: 68ec3a748403c6bd88399b1e756200985a4e07e3
2018-09-11 13:39:51 -07:00
4db21a1d8e Optimize LengthsTileOp on GPU to run a kernel instead of a sequence of memcopies (#11413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413

LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark QPS improved from 13k QPS to 20k QPS as a result.

Reviewed By: manojkris, xianjiec

Differential Revision: D9724988

fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
2018-09-11 13:25:35 -07:00
c1dce21fd5 Cuda TensorAccessor (#11373)
Summary:
Provide a TensorAccessor-Like interface for CUDA as discussed in #8366.

Compared to TensorAccessor
- the CUDATensorAccessor copies the sizes and strides while on the host (I didn't implement a host indexing function, though) to enable transfer to the device; on the device, `[]` works like for TensorAccessors,
- instantiation is from TensorAccessors in order to allow using `.accessor<..>`. The drawback is that you cannot use `auto` for the variable declaration, but the alternative would be a cuda-specific `.accessor`-like function,
- there is a PtrTraits argument to enable `__restrict__`,

Example for the intended use:
```
...
template <typename scalar_t>
__global__ void
apply_homography_2d_kernel(cuda::CUDATensorAccessor<scalar_t, 4> dest_a,
			   cuda::CUDATensorAccessor<scalar_t, 4> src_a,
			   cuda::CUDATensorAccessor<float, 2> transform) {
...
}

template <typename scalar_t>
Tensor apply_homography_2d_template(Tensor& res, const Tensor& image, const Tensor& transform) {
  ...
  cuda::CUDATensorAccessor<scalar_t, 4> image_a(image.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<scalar_t, 4> res_a(res.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<float, 2> transform_a(transform.accessor<float, 2>());
  auto stream = at::cuda::getCurrentCUDAStream();

  apply_homography_2d_kernel<scalar_t>
    <<<grid, block, 0, stream>>>(res_a, image_a, transform_a);
  return res;
}

...
```

I could use a hint on where to put a test for this (e.g. doing a plain vanilla matrix multiplication with a custom kernel and comparing with the ATen mm).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11373

Differential Revision: D9735573

Pulled By: ezyang

fbshipit-source-id: 482b218a0d514e19a8b692bbc77c0e37082cfded
2018-09-11 13:09:33 -07:00
c56a7cfc37 More use of AT_CHECK and AT_ERROR (#11457)
Summary: Considering these increase the size of the message stack, I didn't touch the code outside `ATen/native`

Differential Revision: D9754283

Pulled By: soumith

fbshipit-source-id: 04198ec4fd0c4abae09eeba92c493a783408537a
2018-09-11 12:55:09 -07:00
5952acc041 Add "merge to master" step before build in CircleCI (#11443)
Summary:
This PR adds the "merge to master" step before the build step in CircleCI, so that all PR commits are built against master instead of against the PR's branch. Note that all PRs still need to rebase to master to pick up this new config, so it won't apply to old PR branches retroactively.

To check in CI: make sure it's performing the git merge to master appropriately in "Merge Onto Master" step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11443

Differential Revision: D9775628

Pulled By: yf225

fbshipit-source-id: 8083db6b098d234a44ae4481f40a486e9906f6f8
2018-09-11 12:39:37 -07:00
fbc17321fd Update pybind11 to fix Python 3.7 support for script (#11473)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11419

In particular pulling in https://github.com/pybind/pybind11/pull/1454
as well as pending bugfix in https://github.com/pybind/pybind11/pull/1517 (documenting in comment)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11473

Differential Revision: D9776003

Pulled By: jamesr66a

fbshipit-source-id: a225dcfb66c06bcae98fd2508d9e690c24be551a
2018-09-11 12:39:36 -07:00
781737f84c Remove time prefix from rsync (#11525)
Summary:
This fails with zsh saying "time: command not found".

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11525

Differential Revision: D9772522

Pulled By: apaszke

fbshipit-source-id: b80d108fa6b174d68ada08a9fdbf7260ee37e08f
2018-09-11 12:10:24 -07:00
a566bc2f11 Disable all CircleCI jobs (#11523)
Summary:
Disable all CircleCI jobs until we are ready to move forward with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11523

Differential Revision: D9774462

Pulled By: yf225

fbshipit-source-id: c5724e71eb68bac4df958b4f7bcc380050668b3c
2018-09-11 11:25:17 -07:00
d09041bd81 Add an option to statically link cuda (#10596)
Summary:
Need to link CUDA statically for benchmarking purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10596

Reviewed By: llyfacebook

Differential Revision: D9370738

Pulled By: sf-wind

fbshipit-source-id: 4464d62473e95fe8db65b0bd3b301f262bf269bf
2018-09-11 11:09:29 -07:00
727a4453aa New Serialization Proto
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11166

Reviewed By: mingzhe09088

Differential Revision: D9623522

Pulled By: houseroad

fbshipit-source-id: f21153034a398de7959404321d8534234cd58a40
2018-09-11 10:55:43 -07:00
f80f15866b Get rid of manual dispatch on Type. (#11486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11486

I discovered these by narrowing the interface on Type, and then
fixing call sites outside of core plumbing code which depended
on these methods being provided.

Reviewed By: cpuhrsch

Differential Revision: D9757935

fbshipit-source-id: 3abda0c98919a448a326a757671d438964f6909f
2018-09-11 10:40:22 -07:00
01c7542f43 Use -isystem for system includes in C++ extensions (#11459)
Summary:
I noticed warnings from within pybind11 being shown when building C++ extensions. This can be avoided by including non-user-supplied headers with `-isystem` instead of `-I`

I hope this works on Windows.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11459

Differential Revision: D9764444

Pulled By: goldsborough

fbshipit-source-id: b288572106078f347f0342f158f9e2b63a58c235
2018-09-11 10:40:20 -07:00
d32b41003a Copy protos on install same as develop (#11517)
Summary:
This is a potential fix for https://github.com/pytorch/pytorch/issues/11453 and https://github.com/pytorch/pytorch/issues/11074, worked through with pjh5. Turns out we had some protos copy code in the .sh file that was removed. Better to have it in setup.py, though, same as for develop.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11517

Differential Revision: D9771911

Pulled By: orionr

fbshipit-source-id: 76975d8f71f38d951eaaed0b50dd3ec36dd177a9
2018-09-11 10:09:56 -07:00
deac304b6b Bugfix for basic slicing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11428

Differential Revision: D9753999

Pulled By: jamesr66a

fbshipit-source-id: cfc4163a5a06b41beb808a4e24650d71f5d91f4f
2018-09-11 09:39:29 -07:00
4e8d9a4a58 Introducing python setup.py rebuild develop (#11487)
Summary:
This speeds up incremental builds by doing the following changes:

- Uses `rsync` instead of `cp` (when `rsync` is found) which is a bit smarter in doing "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
   *Note: `rebuild` should only be used if you dont add / remove files to the build, as `cmake` is not rerun*

Current no-op rebuild speedup:
- 1m 15s -> 20s

There are some lingering bugs. No-op rebuilds rerun `cmake` for two rebuilds (likely because the cmake logic depends on the install folder, hence kicking off a rebuild).

So what you see

```
python setup.py rebuild develop    # first time - ~5 mins
python setup.py rebuild develop    # second time - ~3 mins
python setup.py rebuild develop    # third time - ~2 mins
python setup.py rebuild develop    # fourth time - ~20 seconds
python setup.py rebuild develop    # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487

Differential Revision: D9769087

Pulled By: soumith

fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
2018-09-11 08:56:25 -07:00
31850163ac Remove separate ATen build target (#11488)
Summary:
ATen has had a separate build target in the past, but with our move to a root-level CMakeLists.txt file this makes less sense and is harder to maintain. Also, as we blend code between Caffe2 and ATen this will become even less maintainable.

Talked to ezyang about this, but also cc zdevito, Yangqing, and soumith. If this is too difficult, I will revert, but want to see if we can simplify for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11488

Differential Revision: D9770266

Pulled By: orionr

fbshipit-source-id: c7ba52a1676d84e2d052dad4c042b666f49451cd
2018-09-11 08:56:23 -07:00
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
86ab92b0a9 Move TensorImpl / UndefinedTensor(Impl) to core (#11441)
Summary:
Moves TensorImpl to core.
Renames UndefinedTensor to UndefinedTensorImpl and moves to core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11441

Differential Revision: D9736620

Pulled By: gchanan

fbshipit-source-id: 0322ae3b903e338de253b35a0d74a9d3e219204b
2018-09-11 07:45:56 -07:00
80fa8e1007 Add .expand() method to distribution classes (#11341)
Summary:
This adds a `.expand` method for distributions that is akin to the `torch.Tensor.expand` method for tensors. It returns a new distribution instance with batch dimensions expanded to the desired `batch_shape`. Since this calls `torch.Tensor.expand` on the distribution's parameters, it does not allocate new memory for the expanded distribution instance's parameters.

e.g.
```python
>>> d = dist.Normal(torch.zeros(100, 1), torch.ones(100, 1))
>>> d.sample().shape
  torch.Size([100, 1])
>>> d.expand([100, 10]).sample().shape
  torch.Size([100, 10])
```

We have already been using the `.expand` method in Pyro in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py#L10) of `torch.distributions`. We use this in our models to enable dynamic broadcasting. This has also been requested by a few users on the distributions slack, and we believe will be useful to the larger community.

Note that currently, there is no convenient and efficient way to expand distribution instances:
 - Many distributions use `TransformedDistribution` (or wrap over another distribution instance. e.g. `OneHotCategorical` uses a `Categorical` instance) under the hood, or have lazy parameters. This makes it difficult to collect all the relevant parameters, broadcast them and construct new instances.
 - In the few cases where this is even possible, the resulting implementation would be inefficient since we will go through a lot of broadcasting and args validation logic in `__init__.py` that can be avoided.

The `.expand` method allows for a safe and efficient way to expand distribution instances. Additionally, this bypasses `__init__.py` (using `__new__` and populating relevant attributes) since we do not need to do any broadcasting or args validation (which was already done when the instance was first created). This can result in significant savings as compared to constructing new instances via `__init__` (that said, the `sample` and `log_prob` methods will probably be the rate determining steps in many applications).

e.g.
```python
>>> a = dist.Bernoulli(torch.ones([10000, 1]), validate_args=True)

>>> %timeit a.expand([10000, 100])
15.2 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit dist.Bernoulli(torch.ones([10000, 100]), validate_args=True)
11.8 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

cc. fritzo, apaszke, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11341

Differential Revision: D9728485

Pulled By: soumith

fbshipit-source-id: 3b94c23bc6a43ee704389e6287aa83d1e278d52f
2018-09-11 06:56:18 -07:00
120d769432 Add support for tracing strings (#11506)
Summary:
This enables `torch.einsum` both in tracing and in script mode. It's used all over Pyro at the moment, and is needed for any use of the JIT there.

Fixes #11157.

zdevito fritzo neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11506

Differential Revision: D9764787

Pulled By: apaszke

fbshipit-source-id: 9b5251b9e7c5897034602bd07ff67b425d33326c
2018-09-11 06:02:41 -07:00
0ddbe668cd Improve shape analysis to cover all most commonly used ops (#11358)
Summary:
[Here's a list](https://gist.github.com/apaszke/f0821840bdcc67a977832dc58acc1b85) of ops that are in `register_aten_ops.cpp`, but aren't supported in shape prop. Everything else should work now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11358

Differential Revision: D9753693

Pulled By: apaszke

fbshipit-source-id: efeae0126ce16cb56b8797fc5246405588bcae3c
2018-09-11 06:02:39 -07:00
f84693efa9 nomnigraph - Improvements to subgraph matching APIs (#11418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11418

Several improvements that aim to make the APIs more straightforward to use

- Get rid of helper methods subgraph and nonTerminal . Users now should create a NNMatchGraph directly via graph's createNode and createEdge API

- Get rid of operatorSubgraph helper method

- invertGraphTraversal flag applies to both the match graph and the scanned graph. This allows users to create the match graph in the same direction as the scanned graph, thus reducing confusion.

- additional parameters of matchNode (count, includeInSubgraph, nonTerminal) are removed from the constructors and moved into setter methods. (We no longer enforce that MatchNode is immutable but this helps improve code clarity).

- Tests are updated to reflect the changes

Follow up changes:
- Possibly clean up the tests further. This change aims to minimally modify the unit tests.
- Add a validity check that enforces the current limitation of the match graph (single source node) and throws if the match graph does not satisfy the criteria.
- Have the single source node be detected automatically, so callers just need to pass in the matchGraph instead of the source node reference.

Differential Revision: D9732565

fbshipit-source-id: ae8320e2bc89b867f6bb4b1c1aad635f4b219fa1
2018-09-11 04:39:27 -07:00
3d5fd12488 Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450)
Summary:
This is the new documentation for c10d release, and it also deprecates the old torch.distributed document.

This PR depends on https://github.com/pytorch/pytorch/pull/11405

and should only be landed after https://github.com/pytorch/pytorch/pull/11405 is landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11450

Differential Revision: D9765504

Pulled By: teng-li

fbshipit-source-id: 48f38b27b8c270baf389f8e478ea226b9ecc63db
2018-09-11 02:10:28 -07:00
0988bbad2d C10d release to torch.distributed for PT1 (#11405)
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`

Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405

Reviewed By: pietern

Differential Revision: D9733733

Pulled By: teng-li

fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
2018-09-10 23:27:22 -07:00
b14a80553d Ignore functional doc error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11508

Differential Revision: D9764380

Pulled By: goldsborough

fbshipit-source-id: 3abb9c04f46137be833ea26d67734741e14f8010
2018-09-10 20:55:48 -07:00
f9d12eeb27 Give copy an optional device argument.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11497

Differential Revision: D9762014

Pulled By: gchanan

fbshipit-source-id: 996419cc5e86d000af953d030ff361adafb921ad
2018-09-10 20:40:03 -07:00
dd8defeb3f Document the Functional module (#11460)
Summary:
Document the `Functional` module in the C++  API.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11460

Differential Revision: D9757555

Pulled By: goldsborough

fbshipit-source-id: 15f8bf6d60bd26f3f4e69fb8e414e186e3c220ee
2018-09-10 19:58:38 -07:00
9cfdf0d677 Document the Embedding module (#11469)
Summary:
ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11469

Differential Revision: D9757547

Pulled By: goldsborough

fbshipit-source-id: a95673abe949bb81d716dbc03c5c3e2a11cc15d3
2018-09-10 18:25:08 -07:00
a175282776 Flags for LMDB, LevelDB, and Caffe2 ops (#11462)
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with

```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```

Also add a flag to build Caffe2 ops, which is default `ON`. Disable with

```
NO_CAFFE2_OPS=1 python setup.py build_deps
```

cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462

Reviewed By: soumith

Differential Revision: D9758156

Pulled By: orionr

fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
2018-09-10 17:27:50 -07:00
e1e69446f6 Lockdown NO_TEST=1 for tests even more (#11415)
Summary:
Skip torch tests as well when NO_TEST=1 environment variable is set. Also remove the separate ATen code path for not being built with Caffe2, since it will always be built with Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11415

Reviewed By: soumith

Differential Revision: D9758179

Pulled By: orionr

fbshipit-source-id: e3e3327364fccdc57a703aeaad8c4f30452973fb
2018-09-10 17:27:48 -07:00
3e49a69466 Resolve ambiguity when including both caffe2 and aten registries (#11411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11411

Simple fix

Reviewed By: goldsborough

Differential Revision: D9730371

fbshipit-source-id: f841327c01faa13cfb6b7fc6e279b8fc50fad1db
2018-09-10 17:27:46 -07:00
3ad67c60f0 Traceable explicit Variable instantiation (#11463)
Summary:
There's a bunch of legacy code where people are explicitly instantiating Variable, and these call-sites have thus far been untraceable (appearing as prim::Constant nodes with the tensor value at the time of tracing). This makes it so that the new variable inherits the traced Value* from the tensor it's being constructed from
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11463

Differential Revision: D9756529

Pulled By: jamesr66a

fbshipit-source-id: da99c6a7621957a305f2699ec9cb9def69b1b2d7
2018-09-10 17:03:24 -07:00
f2f43ad2da Add new LengthsSplit operator (#10974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291

This new operator will do the following:

Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:

1. Each length in the input vector is split into n_splits values (thus the output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in the output should be evenly split, and if the length is not divisible by n_splits, order the new values in descending order (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s (e.g. n_splits = 3, length = 2 -> 1 1 0); see the sketch below
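
A minimal Python sketch of the splitting rule described above (illustration only, not the operator's C++ implementation):

```python
def lengths_split(lengths, n_splits):
    # Each input length L becomes n_splits values that sum to L, in
    # descending order: the first (L % n_splits) entries get one extra.
    out = []
    for L in lengths:
        base, rem = divmod(L, n_splits)
        out.extend([base + 1] * rem + [base] * (n_splits - rem))
    return out

assert lengths_split([5], 3) == [2, 2, 1]
assert lengths_split([2], 3) == [1, 1, 0]
```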

Reviewed By: bddppq, chocjy

Differential Revision: D9013119

fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
2018-09-10 15:40:28 -07:00
0b78ae86c5 Cleanup byte swapping utilities to generate optimal code on the platforms we care about. (#11394)
Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, #3]
>         ldrb    w9, [x0, #2]
>         bfi     w8, w9, #8, #8
>         ldrb    w9, [x0, #1]
>         bfi     w8, w9, #16, #8
>         ldrb            w9, [x0]
>         bfi     w8, w9, #24, #8
>         mov      x0, x8
>         ret

To this:
>         ldr             w8, [x0]
>         rev     w0, w8
>         ret
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
2018-09-10 15:40:24 -07:00
a0d4106c07 Integrate custom op tests with CI (#10611)
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.

I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI) orionr Yangqing dzhulgakov what do you think?

Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611

Differential Revision: D9597627

Pulled By: goldsborough

fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
2018-09-10 15:40:21 -07:00
3e665cc29b Improve support for tracing sizes, add more tracer warnings (#11288)
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly, which is fixed by this patch. The same issue has been
fixed in legacy tensor constructors.

Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`) raise a warning when they are used.

Finally, entering a traceable operation disables the tracing in its body.
This is needed because

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288

Reviewed By: ezyang

Differential Revision: D9751183

Pulled By: apaszke

fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
2018-09-10 15:22:48 -07:00
70d93f4777 Check for maximum numel in NCCL broadcasting (#11466)
Summary:
NCCL1 uses `int` as its numerical type for fields like `count`, which makes broadcasting tensors larger than `2 << 31 - 1` impossible and raises an opaque `invalid arguments` error. NCCL2 greatly increases the limit on many platforms by using `size_t`. This patch statically detects this type, and raises properly if the broadcast tensor exceeds the limit.

No test because I don't think our test suite should broadcast big tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11466

Differential Revision: D9754753

Pulled By: SsnL

fbshipit-source-id: 73506450cae047e06b5b225b39efdb42d5d26685
2018-09-10 14:39:15 -07:00
35008e0a1a Add flags to fix half comparison and test (#11395)
Summary:
It was found that there are some issues when using comparison operators for half types when certain THC headers are included. I was able to reproduce this and added a test. I also fixed the issue by adding the proper definitions.

Reported in https://github.com/pytorch/pytorch/pull/10301#issuecomment-416773333
Related: https://github.com/pytorch/tutorials/pull/292

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11395

Differential Revision: D9725102

Pulled By: goldsborough

fbshipit-source-id: 630425829046bbebea3409bb792a9d62c91f41ad
2018-09-10 14:10:21 -07:00
18e5fd36c2 Normalize gradients before reduction in DistributedDataParallelC10d (#11109)
Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
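A minimal sketch of the idea, using a hypothetical helper rather than the actual DistributedDataParallelC10d code:

```python
import torch.distributed as dist

def reduce_gradients(params, world_size):
    for p in params:
        if p.grad is not None:
            # Dividing before the all-reduce keeps the summands small,
            # which is less likely to overflow in FP16 than dividing
            # the summed result afterwards.
            p.grad.div_(world_size)
            dist.all_reduce(p.grad)
```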
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11109

Differential Revision: D9594708

Pulled By: myleott

fbshipit-source-id: 93ab53cb782ee1cbe1264e529b333490a0940338
2018-09-10 13:55:09 -07:00
ea0ee77c61 Fix katex math rendering (#11472)
Summary:
I'm 80% sure that this fixes the math bug. But I can't repro locally so I don't know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11472

Differential Revision: D9755328

Pulled By: SsnL

fbshipit-source-id: 130be664d3c6ceee3c0c166c1a86fc9ec3b79d74
2018-09-10 12:40:23 -07:00
198ade74f9 Remove manual refcounting from Tensor class (#11294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11294

The Tensor(ptr, retain) constructor is error prone and circumvents the intrusive_ptr safety.

This diff removes that and pushes the responsibility to callers.
Step by step, manual refcounting can be pushed back and possibly eliminated in the end.

Reviewed By: ezyang

Differential Revision: D9663476

fbshipit-source-id: 7f010e5e47b137a9575960201c5bf5d552c5c2f5
2018-09-10 12:40:21 -07:00
b0c1397271 Fix intrusive_ptr move/copy for different NullType's (#11260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11260

This is needed to make something like this work:

    intrusive_ptr<TensorImpl, UndefinedTensorImpl> a = make_intrusive<SparseTensorImpl>(...);

Reviewed By: ezyang

Differential Revision: D9652089

fbshipit-source-id: 19c65e98460ccb27bc69e36d7e558cb9d6e67615
2018-09-10 12:40:20 -07:00
252f93df09 Improve Tensor() constructor (#11258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11258

The two intrusive_ptr constructors in Tensor can be combined into one implementation that handles both moving and copying.

Reviewed By: ezyang

Differential Revision: D9652088

fbshipit-source-id: 5efca02654ba305c99c20bbeb83551469d17a51d
2018-09-10 12:40:19 -07:00
09292f2c03 Some improvements to IValue (#11238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11238

- when moving an IValue, free the old value instead of keeping it allocated
- making classes final
- moving std::string
- making ConstantList const

Reviewed By: ezyang

Differential Revision: D9644700

fbshipit-source-id: ab7228368e4f00f664ba54e1242b0307d91c5e7e
2018-09-10 12:40:17 -07:00
ce6906b051 Narrowing Blob (#11167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11167

Narrow the Blob API as preparation for merging Blob/IValue

- get rid of templated IsType and Operator::InputIsType / OutputIsType
- Use 'using' instead of 'typedef' for DestroyCall (just for readability)

Reviewed By: ezyang

Differential Revision: D9623916

fbshipit-source-id: 952f0b0cf5a525094b02e8d2798dd57a56a9e1d8
2018-09-10 12:40:16 -07:00
040d75d455 Add option to use CUDA memory leak testing as a context manager (#11380)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11380

Reviewed By: ezyang

Differential Revision: D9705877

Pulled By: zou3519

fbshipit-source-id: 02470c25236f57fa02f4ac9d7ed63d38a6355db2
2018-09-10 12:40:15 -07:00
2158f4a9c8 add export import test to TestJitGenerated (#10982)
Summary:
Checking assertExportImport for all of the generated JIT tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10982

Differential Revision: D9636935

Pulled By: eellison

fbshipit-source-id: f3f1ce77d454848098f2ac7e0fa18bf8564890be
2018-09-10 11:37:05 -07:00
cee743f639 Move backward/set_data to Type-based dispatch.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11440

Differential Revision: D9736565

Pulled By: gchanan

fbshipit-source-id: 1e66f54f1c87084f37c0b014030f0d6d2f8dfaee
2018-09-10 08:40:29 -07:00
87a9a8f80a Use AT_CHECK and AT_ERROR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11444

Differential Revision: D9736992

Pulled By: SsnL

fbshipit-source-id: bf5320e878c6ef71468f3e2aa12ce304b92d45ca
2018-09-09 21:26:12 -07:00
560d6efd3a Only join started dataloader workers (#11432)
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the self.workers list after it has started, so
that we do not call `.join()` if the program dies before the worker starts,
in which case `__del__` would try to join it and get:
    AssertionError: can only join a started process.

Example trace when such an error happens:
```py
[unrelated]
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
    return _DataLoaderIter(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
    w.start()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
    w.join()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
    assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```

No test because hard to reliably trigger.
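A minimal sketch of the pattern (with a placeholder `worker_loop`; this is not the actual DataLoader code):

```python
import multiprocessing as mp

def worker_loop(worker_id):
    pass  # placeholder for the actual data-loading loop

if __name__ == "__main__":
    workers = []
    for i in range(4):
        w = mp.Process(target=worker_loop, args=(i,))
        w.start()          # this call can be interrupted (e.g. by KeyboardInterrupt)
        workers.append(w)  # register only after start() has returned, so cleanup
                           # never join()s a process that was never started
    for w in workers:
        w.join()
```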
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432

Reviewed By: ezyang

Differential Revision: D9735430

Pulled By: SsnL

fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
2018-09-09 12:55:51 -07:00
87b2f05a9c Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435)
Summary:
Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435

Differential Revision: D9736396

Pulled By: soumith

fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f
2018-09-09 11:40:25 -07:00
581099a7b2 pybind conversion for IntList (#11425)
Summary:
As discussed with ezyang and slayton58, this might be a nice convenience to be able to use code in extensions just as in ATen.

Also split off `tracing_state.h` from `torch/jit/tracer.h` (fixes #11204) to be able to use the utility functions.

pytorchbot  it's not a jit patch per se.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11425

Differential Revision: D9735556

Pulled By: ezyang

fbshipit-source-id: 466c92bbdb1d7d7a970eba1c26b7583fe9756139
2018-09-09 10:39:40 -07:00
ee4309a9ac override BUILD_TEST when building gloo (#11431)
Summary:
A recent build regression is that we need a system GoogleTest for builds to pass.

This was because, when building with Gloo, gloo tries to build its own tests, which look for a system gtest [here](https://github.com/facebookincubator/gloo/blob/master/cmake/Dependencies.cmake#L72-L80) (because we're not using the full cmake build and making it aware of third_party/GoogleTest, but instead building it in isolation using tools/build_pytorch_libs.sh).

Traditionally, we didn't ask Gloo to build its tests, but because we added `-DBUILD_TEST=1` by default to all builds (in refactoring variable names), we accidentally started asking Gloo to build its tests.

This PR overrides the Gloo flags and asks it not to build tests (like it used to)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11431

Differential Revision: D9736387

Pulled By: soumith

fbshipit-source-id: 59e84edae780123b793bdaea5fd9ac46156cd0af
2018-09-09 10:11:56 -07:00
1b94f5c6e6 optimize masked_fill on CPU (#11359)
Summary:
This PR parallelizes `masked_fill` on CPU; currently it runs sequentially.

The following script is used to benchmark and verify this PR. On a Xeon Skylake 8180 (2 sockets × 28 cores),
it runs in `4.20` sec without the PR and `0.11` sec with it.

```python
import torch
import random
from time import time

size = 10 * 1000 * 1000
count = 100

def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()

    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend-tstart))

    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
        else:
            if dst[i] != dst_[i]:
                print("fail1")
    print("test_masked_fill: PASS")

test_masked_fill()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11359

Differential Revision: D9735578

Pulled By: ezyang

fbshipit-source-id: d437ad7c6dace1910d0c18d6d9ede80efb44fae4
2018-09-09 00:25:26 -07:00
b7ecf035dc Updates FindCUDA.cmake to 3.12.2 upstream version (#11406)
Summary:
This PR is just a copy-paste of the upstream FindCUDA.cmake. Since cublas_device is deprecated in CUDA >= 9.2, this change is necessary for the build.

Related: https://gitlab.kitware.com/cmake/cmake/merge_requests/2298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11406

Differential Revision: D9735563

Pulled By: ezyang

fbshipit-source-id: c74d86ced7cc485cb2233f9066ce23e921832c30
2018-09-08 23:10:32 -07:00
6683fb56ca Add AVX optimizations for pdist (#11230)
Summary:
Added AVX optimizations for pdist using Vec256. This brings single-threaded performance up to speed with scipy, but the current implementation greatly hurts performance without AVX enabled. Is there a way to special-case out AVX on dispatch and call the non-Vec256 code? Or is the way I used Vec256 completely wrong?

Single threaded comparison to scipy
============================

This is the time to compute the pdist of a 2048 x 2048 float matrix with only one thread for various values of p between torch and scipy. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

p | torch | scipy
-----|-----------|------
0 | 6.27 s ± 393 ms | 7.23 s ± 498 ms
1 | 5.49 s ± 201 ms | 43.4 s ± 1.09 s
2 | 5.74 s ± 474 ms | 53.8 s ± 3.52 s
∞ | 5.59 s ± 292 ms | 47.4 s ± 2.03 s
3 | really slow | gave up

Result by AVX support
================

This is the time to compute the distance and gradient of a 2048 x 2048 float matrix with all threads, broken down by AVX support. `before` is the old code, `default` is no AVX support, etc. Interestingly, the AVX optimizations provided a great benefit over the old unoptimized code, but drastically hurt performance when compiled without AVX optimizations. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

Results for p = 0
----------------

avx | dist | grad
----|------|-----
before | 514 ms ± 87.5 ms | 191 µs ± 35 µs
default | 3.47 s ± 183 ms | 201 µs ± 24.6 µs
avx | 123 ms ± 18.2 ms | 281 µs ± 130 µs
avx2 | 103 ms ± 11.4 ms | 216 µs ± 74.4 µs

Results for p = 1
----------------

avx | dist | grad
----|------|-----
before | 426 ms ± 35 ms | 6.21 s ± 187 ms
default | 2.6 s ± 123 ms | 5.62 s ± 273 ms
avx | 104 ms ± 6.37 ms | 833 ms ± 44.3 ms
avx2 | 106 ms ± 3.59 ms | 924 ms ± 86.2 ms

Results for p = 2
-----------------

avx | dist | grad
----|------|-----
before | 425 ms ± 45.4 ms | 6.31 s ± 125 ms
default | 3.04 s ± 187 ms | 3.55 s ± 242 ms
avx | 110 ms ± 3.66 ms | 896 ms ± 21.8 ms
avx2 | 113 ms ± 4.68 ms | 934 ms ± 25.2 ms

Results for p = ∞
------------------

avx | dist | grad
----|------|-----
before | 501 ms ± 39.5 ms | 6.64 s ± 321 ms
default | 2.15 s ± 92.9 ms | 8.43 s ± 355 ms
avx | 104 ms ± 5.52 ms | 835 ms ± 36.7 ms
avx2 | 100 ms ± 3.41 ms | 864 ms ± 67 ms

Results for p = 3
-----------------

avx | dist | grad
----|------|-----
before | 22.6 s ± 413 ms | 11.1 s ± 242 ms
default | 24.9 s ± 1 s | 11.2 s ± 293 ms
avx | 2.69 s ± 148 ms | 5.63 s ± 88.4 ms
avx2 | 2.48 s ± 31.8 ms | 5.61 s ± 114 ms
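For reference, a minimal usage example of the operator being benchmarked above (per the current public API):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2048, 2048)
# Condensed pairwise-distance vector, same layout as scipy's pdist:
# entry k holds ||x[i] - x[j]||_p for i < j, row-major over pairs.
d = F.pdist(x, p=2)
assert d.shape == (2048 * 2047 // 2,)
```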
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11230

Differential Revision: D9735503

Pulled By: erikbrinkman

fbshipit-source-id: a9da619249e4ca2625b39ca1ca7f5543c3086bfb
2018-09-08 22:55:02 -07:00
538ea67437 Search for CMake config files for pybind11. (#11423)
Summary:
If pybind is build with cmake and installed, we should use config file instead of the Findpybind11 shipped with caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11423

Differential Revision: D9735557

Pulled By: ezyang

fbshipit-source-id: 28a39e579fa045060aa1a716e5fd7dbcf7b89569
2018-09-08 22:44:03 -07:00
02114e877f fix #10838 incorrect bidirectional output format (#11368)
Summary:
Fixes the issue discussed in #10838. `hidden_size` should be the last dimension regardless of whether we're in ONNX or PyTorch.
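A small example of the fixed output layout, assuming current PyTorch semantics:

```python
import torch

rnn = torch.nn.GRU(input_size=8, hidden_size=16, bidirectional=True)
x = torch.randn(5, 3, 8)            # (seq_len, batch, input_size)
out, h = rnn(x)
# hidden_size stays the last dimension; the two directions are
# concatenated there: (seq_len, batch, num_directions * hidden_size).
assert out.shape == (5, 3, 2 * 16)
```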
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11368

Differential Revision: D9734814

Pulled By: soumith

fbshipit-source-id: 7f69947a029964e092c7b88d1d79b188a417bf5f
2018-09-08 17:09:57 -07:00
ac9268f25d Conversions to and from complex numbers. (#11420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420

Surprisingly tricky!  Here are the major pieces:

- We grow an even yet more ludicrous macro
  AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
  which does what it says on the tin.  This is because I was
  too lazy to figure out how to define the necessary conversions
  in and out of ComplexHalf without triggering ambiguity problems.
  It doesn't seem to be as simple as just Half.  Leave it for
  when someone actually wants this.

- Scalar now can hold std::complex<double>.  Internally, it is
  stored as double[2] because nvcc chokes on a non-POD type
  inside a union.

- overflow() checking is generalized to work with complex.
  When converting *to* std::complex<T>, all we need to do is check
  for overflow against T.  When converting *from* complex, we
  must check (1) if To is not complex, that imag() == 0
  and (2) for overflow componentwise.

- convert() is generalized to work with complex<->real conversions.
  Complex to real drops the imaginary component; we rely on
  overflow checking to tell if this actually loses fidelity. To get
  the specializations and overloads to work out, we introduce
  a new Converter class that actually is specializable.

- Complex scalars convert into Python complex numbers

- This probably fixes complex tensor printing, but there is no way
  to test this right now.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: cpuhrsch

Differential Revision: D9697878

Pulled By: ezyang

fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
2018-09-08 16:39:43 -07:00
d3f98b5ffc Add matrix power (#11421)
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.

I did some changes when rebasing and updating so I didn't just force push to your branch. Let's see if this passes CI and internal test. If it does, let me know if you want me to force push to your branch or use this PR instead.

Note to reviewers: patch was already approved at #10068 .

cc yf225
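For context, a small example of what the added operator computes (usage per the current torch.matrix_power API):

```python
import torch

a = torch.tensor([[2., 0.],
                  [0., 3.]])
# Raises a square matrix to an integer power via repeated matrix products.
print(torch.matrix_power(a, 3))
# tensor([[ 8.,  0.],
#         [ 0., 27.]])
```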
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421

Differential Revision: D9733407

Pulled By: SsnL

fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
2018-09-08 15:25:56 -07:00
802380ac93 Improve LegacyTypeDispatch to handle initialization correctly. (#11331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11331

In the previous commit, we added a bare-bones LegacyTypeDispatch in ATen/core.
This is not sufficient for the use cases we need: we not only need to be able to
get a Type, but we also need to be able to *initialize* the Types if it's the first time
we have retrieved a CPU/CUDA/Complex type. I hemmed and hawed about how
to do this; the strategy this PR takes is to introduce a new "hooks" interface
specifically for initializing CPU/CUDA/Complex (which still lives in Context). We then
move all "user-friendly" functions to LegacyTypeDispatch.

Here were some other options which I considered, but don't work:
- Assume that Type is already initialized, because we only intend to call Type
  from Tensor methods, where we already have a Tensor. This does not work
  because Caffe2 created tensors will not have gone through the standard
  Type codepath, and will have skipped initialization.
- Move CUDAHooks and ComplexHooks to ATen/core. Besides being sucky,
  this isn't even a complete fix, because I still need to initialize CPU hooks
  (so you *still* need another hooks interface).

Reviewed By: cpuhrsch

Differential Revision: D9666612

fbshipit-source-id: ac7004b230044b67d13caa81fdfaf3c6ab915e3f
2018-09-08 10:10:17 -07:00
9687a72794 Move the type registry out of Context, into LegacyTypeDispatch. (#11274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11274

We don't want to put all of Context into ATen/core, but one
particular part cannot be avoided: the type registry, because
implementations of TensorMethods will need to get a Type,
and then do a virtual call on it.

I needed to do a little bit of (temporary) footwork to get this
in without also moving Type, because unique_ptr<Type> expects
to be able to see the destructor of Type (but it's forward declared
right now).  So instead I put the destructor as an explicit functor.  We
can get rid of this once Type actually moves in ATen/core

Reviewed By: cpuhrsch

Differential Revision: D9657449

fbshipit-source-id: 940931493bf4f1f6a8dad03f34633cacdd63dd0b
2018-09-08 10:10:11 -07:00
b9b9ae935b Make torch.randint have default dtype int64 (#11040)
Summary:
cc gchanan apaszke
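The behavior after this change, as a quick check (not part of the original PR text):

```python
import torch

t = torch.randint(0, 10, (3,))
assert t.dtype == torch.int64  # previously defaulted to the float dtype
```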
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11040

Differential Revision: D9565728

Pulled By: SsnL

fbshipit-source-id: eb5be9609f30c88f52746fa7e13ad71e2856648e
2018-09-08 07:55:06 -07:00
505ecab88d bumping up the default store timeout (#11409)
Summary:
to 300 seconds to be safe. There used to be no timeout in THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11409

Differential Revision: D9731709

Pulled By: teng-li

fbshipit-source-id: 0ce011dcca507cbf063176ad4995405c77dd0cdd
2018-09-07 23:55:23 -07:00
3d2862526b Support send/recv for the gloo process group (#11387)
Summary:
This change removes the skips for the existing send/recv tests in the backwards compatibility layer.
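A minimal sketch of the point-to-point API this enables for the gloo backend, assuming a two-rank process group has already been initialized:

```python
import torch
import torch.distributed as dist

# dist.init_process_group("gloo", rank=rank, world_size=2) is assumed
# to have run in each participating process.
t = torch.zeros(4)
if dist.get_rank() == 0:
    dist.send(t, dst=1)   # blocking send to rank 1
else:
    dist.recv(t, src=0)   # blocking receive from rank 0
```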
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11387

Reviewed By: teng-li

Differential Revision: D9729330

Pulled By: pietern

fbshipit-source-id: f8899219a94d806386d03e9ef53bff622d8658a3
2018-09-07 20:25:18 -07:00
47c1de25e8 Test exporting batch norm, dropout, RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11126

Differential Revision: D9727689

Pulled By: jamesr66a

fbshipit-source-id: f142257a2fba27d86844bf33084174f1f68a8ca5
2018-09-07 19:41:39 -07:00
b7a2c91eed remove unnecessary clone() when .grad is None (#11165)
Summary:
Currently the gradient is copied into .grad if it is None. This PR aims to remove the copy when it is not absolutely needed.

It is generally an improvement in speed and memory usage. And here is a case where it may help a lot:
Normally, people do optimizer.zero_grad() every minibatch before backward. This translates into a memset, and later a point-wise add.
When there is some large weight in the network, one optimization people can always do is set parameter.grad to None instead of calling zero_grad. This removes the memset and changes the point-wise add to a memcpy.
Here is the result of running the following script on a V100 GPU. It is 100 iterations of forward/backward/zero_grad on a single 1-billion-word-benchmark-size embedding.
`Zero grad: 2.123847723007202`
`None grad: 1.3342866897583008`

With the backend change of this PR, the unnecessary memcpy is removed, thus further speed up is achieved.
`Zero grad: 2.124978542327881`
`None grad: 0.4396955966949463`

[benchmark.txt](https://github.com/pytorch/pytorch/files/2341800/benchmark.txt)

Some details on the code change:
.detach() is used because we need to get rid of new_grad being a view without copying data. This should be safe in first-order-only mode.
The data needs to be contiguous, otherwise `grad_variable.data() += new_grad.data();` below will fail.
Only the last variable that holds a reference to the temp gradient will grab its buffer.

ngimel, mcarilli and mruberry helped finalize this PR.
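A minimal sketch of the `.grad = None` pattern discussed above:

```python
import torch

model = torch.nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    # Instead of opt.zero_grad(): dropping the gradients lets the next
    # backward() materialize them directly (a memcpy) rather than
    # memset + pointwise add into pre-zeroed buffers.
    for p in model.parameters():
        p.grad = None
    loss = model(torch.randn(32, 1024)).sum()
    loss.backward()
    opt.step()
```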
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11165

Differential Revision: D9728874

Pulled By: soumith

fbshipit-source-id: b8fb822a2dff6e812bbddd215d8e384534b2fd78
2018-09-07 19:41:37 -07:00
c49b01a8a0 Change default variants to 'function'. (#11247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11247

Previously, the default for a declaration in native_functions.yaml
was ['function', 'method'], i.e., generate both a method and
function for every binding.  We now believe this is inappropriate:
the majority of new kernels added to PyTorch should live as
free functions, NOT methods.  Thus, we change the default accordingly.

I also took the opportunity to de-method some "internal" functions
that had a leading underscore.  While, strictly speaking, this is a
BC breaking change, I believe it is highly unlikely anyone was using
these directly.

Reviewed By: yf225

Differential Revision: D9648570

fbshipit-source-id: 8b94647b824e0899d6d18aa5585aaedc9d9957d2
2018-09-07 17:56:08 -07:00
fa522d1aed Revert D9720931: [pytorch][PR] [third-party] Update googletest to release-1.8.1
Differential Revision:
D9720931

Original commit changeset: 18a60d0409e7

fbshipit-source-id: a05dcba71277eb4f8ac38886f307d6cf6e6955a9
2018-09-07 17:42:03 -07:00
c9843bd86b Update googletest to release-1.8.1 (#11388)
Summary:
This is mainly to pick up the change 20074be19a to avoid polluting the CMAKE_DEBUG_POSTFIX variable. cc orionr .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11388

Reviewed By: orionr

Differential Revision: D9720931

Pulled By: Yangqing

fbshipit-source-id: 18a60d0409e74316f74d364f4fe16bf0d0198413
2018-09-07 16:56:16 -07:00
31d36b1d31 move complex registration test out-of-line (#11397)
Summary:
Moves the complex registration code into an out-of-line C++ extension to de-noise the test_cpp_extensions.py file. Let's keep it nice and tidy so we can point our users at it for usage examples.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11397

Differential Revision: D9725335

Pulled By: goldsborough

fbshipit-source-id: 290618f2ee711b1895cdb8f05276034dfe315c6d
2018-09-07 16:56:14 -07:00
4ae16c9ad9 Recursive descent for validation + convert expands in ATen fallback (#11356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11356

Differential Revision: D9721002

Pulled By: jamesr66a

fbshipit-source-id: eeb50b56f8a72e929860c5e459a5ab50ac624814
2018-09-07 16:39:36 -07:00
4c8cc36e34 Fix igios build (#11392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11392

Fix igios build

Reviewed By: houseroad

Differential Revision: D9720833

fbshipit-source-id: 33acc3c658c22addd4bad142433824076233e901
2018-09-07 15:55:23 -07:00
4bf5fc44c8 Fix split_size test failures (#11051)
Summary:
~~This PR fixes #8525 by renaming `split_with_sizes` to `split` so that 2 `aten::split` ops are
generated (previously `aten::split(self, int, int)` and `aten::split_with_sizes(self, int[], int)` were generated)~~

~~`split_with_sizes` was made in PR #5443, but I don't see a reason for it to have
a different name than `split` rather than just overload `split`.~~

This PR fixes #8525 by adding `register_special_ops.cpp` to mirror Python dispatching from `split` to `split` and `split_with_sizes` in [tensor.py](https://github.com/pytorch/pytorch/blob/master/torch/tensor.py#L279).

It also fixes #8520 by adding an `int[]` wherever it sees `torch.Size`

In a follow-up PR this could also be used to fix some of the other `unknown builtin op` test errors.
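For reference, the two Python-level forms being mirrored:

```python
import torch

x = torch.arange(5)
a = x.split(2)        # int argument  -> aten::split, chunks of size (2, 2, 1)
b = x.split([2, 3])   # list of sizes -> aten::split_with_sizes
assert [t.numel() for t in a] == [2, 2, 1]
assert [t.numel() for t in b] == [2, 3]
```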
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11051

Differential Revision: D9582443

Pulled By: driazati

fbshipit-source-id: d27201f85937d72e45e851eaa1460dd3dd1b61a9
2018-09-07 15:39:24 -07:00
9886ebeb24 Remove hardcoded system path from CMAKE_MODULE_PATH (#11386)
Summary:
This seems to be causing different versions of OpenMPI to be picked up
by different parts of the build. Not a good practice to include absolute
paths anyway, so let's try removing it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11386

Reviewed By: teng-li

Differential Revision: D9724349

Pulled By: pietern

fbshipit-source-id: 3dfef91c81f2e97e5125284aff9e7e98f8761917
2018-09-07 15:25:38 -07:00
802d21c8f4 Remove FULL_CAFFE2 flag (#11321)
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.

With these changes you'll be able to also do something like

```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321

Reviewed By: mingzhe09088

Differential Revision: D9694950

Pulled By: orionr

fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
2018-09-07 15:09:44 -07:00
93da5a21c9 Update variable view note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11393

Differential Revision: D9725444

Pulled By: SsnL

fbshipit-source-id: b1607d986ab93e64b0b0ff9e8f10d9e3f6e2160e
2018-09-07 15:09:43 -07:00
77b6d7d255 Doc improvements (#11347)
Summary:
1. Remove cudnn* symbols from C++ docs
2. Fix code examples for `nn::Module` and `jit::compile`
3. Document Dropout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11347

Differential Revision: D9716751

Pulled By: goldsborough

fbshipit-source-id: e0566cec35848335cac3eb9196cb244bb0c8fa45
2018-09-07 14:39:36 -07:00
7de0332e10 Add initial documentation for JIT (#11357)
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infra to find which operators are supported by the JIT automatically, which is then used in the generation of the docs.

The wording and formatting of the docs is not yet polished, but having this will allow our document writers to make faster progress.

Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357

Differential Revision: D9721277

Pulled By: zdevito

fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
2018-09-07 14:27:47 -07:00
69b4b45f91 enable missing nn tests with single grad check, minor refactor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11366

Differential Revision: D9723305

Pulled By: wanchaol

fbshipit-source-id: 9e7e2e7e68cb4919610bccfbf76fa33b647f6eb7
2018-09-07 14:27:46 -07:00
576807ce1a flaky test fix trial (#11391)
Summary:
Add a barrier() to wait for all PGs to be created before destroying them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11391

Differential Revision: D9727383

Pulled By: teng-li

fbshipit-source-id: 689d62c978e642b68f4949dcf29982e34869ada4
2018-09-07 14:10:06 -07:00
e9da2dd3cc Do not use PERSISTENT cudnn mode for spatialBN (#11382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11382

We found this cudnn bug in S163230; it causes accuracy loss. We fixed this in D9601217, but due to the reimplementation of spatialBN the fix was overwritten. Let's land this fix again.

Reviewed By: kuttas

Differential Revision: D9702347

fbshipit-source-id: 11547e9edaf7b2ba7f4aa7263ffb4f0281bbf078
2018-09-07 13:41:18 -07:00
01930a3145 Move sync_params to C++ (#9805)
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.

I also split code into a `.h` and `.cpp` file for better code organization.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805

Differential Revision: D9688604

Pulled By: goldsborough

fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
2018-09-07 12:56:40 -07:00
ba6f10343b update CUDAExtension doc (#11370)
Summary:
fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11370

Differential Revision: D9701777

Pulled By: soumith

fbshipit-source-id: 9f3986cf30ae0491e79ca4933c675a99d6078982
2018-09-07 12:56:38 -07:00
733402bef4 Fix issues with certain heterogeneous types in lists during tensor creation (#11377)
Summary:
Closes #9963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11377

Differential Revision: D9701824

Pulled By: soumith

fbshipit-source-id: 89c5448fd90ece1b365dc42f775b6b0c73ce790c
2018-09-07 12:56:35 -07:00
5e400e9cae move context_base.h to ATen/core (#11336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11336

Move `context_base.h` header to `ATen/core` and the implementations are in `caffe2/core/context_base.cc`

Reviewed By: ezyang

Differential Revision: D9670493

fbshipit-source-id: ce5bf2b3b4c80e9b62819f4332ce68af82720055
2018-09-07 12:20:25 -07:00
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is and isn't public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
e80f7e1f64 Fix more warnings (#11320)
Summary:
Also fixes a missing space in an fft error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11320

Differential Revision: D9676012

Pulled By: SsnL

fbshipit-source-id: a636e5fce042198510c8e456fa51fde714da8348
2018-09-07 11:26:58 -07:00
91089a7e17 Add GPU implementation of pdist (#11102)
Summary:
Add the gpu kernel version.

The parallelism I went with performs poorly when there are a large number of vectors that are all short, as I don't allocate the thread pool to wrap in that case.

Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```

Current performance specs are a little underwhelming; I'm in the process of debugging.

size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102

Differential Revision: D9697305

Pulled By: erikbrinkman

fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
2018-09-07 09:09:46 -07:00
110191e5c7 Remove detach from TensorImpl, handle via Type. (#11337)
Summary:
This is so that TensorImpl does not have to depend on Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11337

Differential Revision: D9684421

Pulled By: gchanan

fbshipit-source-id: d2af93420ca6d493429c251cfe5a34e9289c4484
2018-09-07 08:55:59 -07:00
52b37d8b66 Move VariableHooksInterface to ATen/core (#11273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11273

This one might strike you as a bit surprising, but it's necessary
to expose this interface in ATen/core, because we need to be
able to get a true Variable type from Variable tensors, and
to do that we need to go through the hooks interface.

Reviewed By: gchanan

Differential Revision: D9656548

fbshipit-source-id: 28bb5aee6ac304e8cd5fa1e4c65452c336647161
2018-09-07 08:11:53 -07:00
396e64fff7 Move ATen/Registry.h to ATen/core/Registry.h (#11270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11270

Still need to deduplicate this with caffe2/core/registry.h,
but this will be a bit tricky because the current formulation
of the macro is namespace sensitive (i.e., the macro for classes
defined in at:: namespace won't work if you call from caffe2::
namespace).

Reviewed By: gchanan

Differential Revision: D9654871

fbshipit-source-id: 2207d1f2cc6d50bd41bf64ce0eb0b8523b05d9d9
2018-09-07 08:11:52 -07:00
b02b125d16 Rename getMaybeVariableType back to getType. (#11250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11250

```
codemod -d . --extensions cc,cpp,cu,cuh,h getMaybeVariableType getType
```

Reviewed By: gchanan

Differential Revision: D9648830

fbshipit-source-id: 6b2ac2b1c265ae47722390e6e7f106653077d851
2018-09-07 08:11:50 -07:00
68371b6d2e fast code path when partition=1 which makes LengthsPartition a simple copy (#11351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11351

When partitions == 1 (InputSize() == OutputSize()), LengthsPartition becomes just a copy.

Reviewed By: aazzolini

Differential Revision: D9693409

fbshipit-source-id: a9ea034d227af357b661477ab779a71600f58f58
2018-09-07 08:11:49 -07:00
da4ebc2971 Switch SVD on CPU from gesvd to gesdd (#11194)
Summary:
- Added a note to the doc string for `svd`.
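For reference, a minimal round-trip check using the `torch.svd` interface of that era (since superseded by `torch.linalg.svd`):

```python
import torch

a = torch.randn(5, 3)
u, s, v = torch.svd(a)  # reduced SVD; gesdd is the faster backend routine
# Note: v is returned non-transposed, so a = u @ diag(s) @ v^T.
assert torch.allclose(u @ torch.diag(s) @ v.t(), a, atol=1e-5)
```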
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f
2018-09-07 07:39:57 -07:00
f9595e756e typo/grammar fixes (#11344)
Summary:
Fixes some minor grammar issues in the code base.

PS: I was actually looking for the following one but couldn't find it via grepping in this repo:

![screen shot 2018-09-06 at 3 27 39 pm](https://user-images.githubusercontent.com/5618407/45184280-1e16a980-b1ec-11e8-9cb1-87a96738bdd1.png)

Any idea in which file this issue is raised?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11344

Differential Revision: D9696454

Pulled By: soumith

fbshipit-source-id: 8ffe494b1bf1efb0e35563381d9da2e1e8032a3c
2018-09-06 21:57:14 -07:00
a2afad2b69 Improves ATen CUDAEvent (#11293)
Summary:
After PR #9726 was submitted, PR #10581 created a different CUDAEvent class. The CUDAEvent proposed in #9726 was similar to the c10d::CUDAEvent class with additional testing and functionality. In particular, it was movable but not copyable. The CUDAEvent created by #10581 is refcounted and copyable. This PR retains the refcounting of the latter PR while fixing several bugs, adding tests, and extending the functionality to support testing and usage like in PR #8354. In particular, this PR:

- Adds set_device() to CUDAContext
- Adds three CUDAEvent tests to stream_test.cpp
- Fixes three bugs:
- Refcounting was broken. Destroying any of the RAIIs holding a particular CUDAEvent would destroy the event UNLESS it was the last RAII (the check was backwards).
- Moving an event would cause a segfault.
- Events were not destroyed on the device they were created on. See PR #9415 (pietern)
- Adds the happened() and recordOnce() functions
- Changes the record() functions to not be const
- Adds additional assertions to verify correctness

This PR does not:

- Make c10d use the ATen CUDAEvent (this is appropriate for a separate PR)

Whether events should be refcounted is an interesting question. It adds some atomic operations and makes event creation eager. Making events movable but not copyable (like the c10d events) avoids these costs and allows events to be lazily constructed. Lazy construction is preferable when working with containers (like std::array or std::vector) and because the event's device can be set automatically to the first stream it's recorded on. With eager construction the user is required to understand that events have a device and acquire the device of the stream the event will be recorded on upfront. This can be seen here:

542aadd9a7/aten/src/ATen/native/cudnn/RNN.cpp (L1130-L1132)

and that file is the only one which currently uses the ATen CUDAEvent.

Refcounting does allow single-writer multi-reader scenarios, although these scenarios can also be supported by providing indirect access to the underlying CUDAEvent. I believe all current and planned usage scenarios do not require refcounting, and if desired I can update this PR to remove refcounting and make the ATen event movable but not copyable like the c10d event. I think not refcounting is preferable because it can improve performance, ease usability, and simplify the code (as seen with two of the above bugs).

I have decided to separate this from PR #8354 since while it's required for PR #8354 the changes are, clearly, of independent interest. PR #8354 has a new dependency on this one, however. I am closing PR #9726 in favor of this PR.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11293

Differential Revision: D9665836

Pulled By: soumith

fbshipit-source-id: a1513fa4f9761e2f304d126e402f6b6950e1c1d2
2018-09-06 21:39:44 -07:00
b3b1e7624d Optional expand=True kwarg in distribution.enumerate_support (#11231)
Summary:
This adds an optional `expand=True` kwarg to the `distribution.enumerate_support()` method, to get a distribution's support without expanding the values over the distribution's `batch_shape`.
 - The default `expand=True` preserves the current behavior, whereas `expand=False` collapses the batch dimensions.

e.g.
```python
In [47]: d = dist.OneHotCategorical(torch.ones(3, 5) * 0.5)

In [48]: d.batch_shape
Out[48]: torch.Size([3])

In [49]: d.enumerate_support()
Out[49]:
tensor([[[1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.]]])

In [50]: d.enumerate_support().shape
Out[50]: torch.Size([5, 3, 5])

In [51]: d.enumerate_support(expand=False)
Out[51]:
tensor([[[1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.]]])

In [52]: d.enumerate_support(expand=False).shape
Out[52]: torch.Size([5, 1, 5])
```

**Motivation:**
 - Currently `enumerate_support` builds up tensors of size `support + batch_shape + event_shape`, but the values are *repeated* over the `batch_shape` (adding little in the way of information). This can lead to expensive matrix operations over large tensors when `batch_shape` is large (see, example above), often leading to OOM issues. We use `expand=False` in Pyro for message passing inference. e.g. when enumerating over the state space in a Hidden Markov Model. This creates sparse tensors that capture the markov dependence, and allows for the possibility of using optimized matrix operations over these sparse tensors. `expand=True`, on the other hand, will create tensors that scale exponentially in size with the length of the Markov chain.
 - We have been using this in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py) of `torch.distributions` in Pyro. The interface has been stable, and it is already being used in a few Pyro algorithms. We think that this is more broadly applicable and will be of interest to the larger distributions community.

cc. apaszke, fritzo, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11231

Differential Revision: D9696290

Pulled By: soumith

fbshipit-source-id: c556f8ff374092e8366897ebe3f3b349538d9318
2018-09-06 21:39:42 -07:00
c59c1a25b2 diagnose option: get_entry to print a whole row (#11308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11299

Reviewed By: xianjiec

Differential Revision: D9652844

fbshipit-source-id: 650d550317bfbed0c1f25ae7d74286cfc7c3ac70
2018-09-06 21:26:30 -07:00
2946b021e3 Disable flaky test, see #11360 (#11361)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11361

Reviewed By: yf225

Differential Revision: D9696524

Pulled By: ezyang

fbshipit-source-id: f6801d6f4f34090d467b16810db9cf576d5d519b
2018-09-06 20:40:00 -07:00
3149a72c63 Move TensorOptions.cpp to the correct place in ATen/core (#11244)
Summary:
This actually ended up being a lot more involved than I thought. The basic
problem is that in some of our build environments, thread local state is not
supported. The correct way to test if this is the case is using the
(undocumented) CAFFE2_FB_LIMITED_MOBILE_CAPABILITY macro.

On mobile, OptionGuard is not available, and you have to do everything
by hand. There's a static_assert that checks if you accidentally use
OptionGuard in this case and gives you a better error message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11244

Reviewed By: gchanan

Differential Revision: D9646190

fbshipit-source-id: cf4016f79b47705a96ee9b6142eb34c95abb2bd4
2018-09-06 20:11:39 -07:00
c45607f77f Static assert GetMutable is not passed with Tensor argument (#11323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11323

If you do pass it this, you'll get a pointer to
UndefinedTensor; probably not what you want!

Reviewed By: Yangqing

Differential Revision: D9676205

fbshipit-source-id: 0bd3c22c2c40ac2958f95fc7a73b908af291cf22
2018-09-06 20:11:37 -07:00
0f419abf40 Roll nomnigraph build into caffe2 (#11303)
Summary:
We need to remove nomnigraph from the list of public libraries in order to support libtorch extensions. The easiest way to do this is to include it in the Caffe2 source like all other caffe2/core/ code.

However, because the headers are in a different place, we need to include them for linked libraries (pybind, tests, etc).

On the upside, this means that nomnigraph now has default hidden visibility too.

FYI peterjc123 xkszltl goldsborough bwasti Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11303

Reviewed By: pjh5

Differential Revision: D9694932

Pulled By: orionr

fbshipit-source-id: 5db3eb20bc5ddc873ce9151236b74663fbb33ed8
2018-09-06 19:38:09 -07:00
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests as skipped that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
ec5404a449 Add cuda version of SpatialBNOp also optimize SpatialBN on CPU (#10888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888

Add cuda version of SpatialBNOp also optimize SpatialBN on CPU

Reviewed By: houseroad

Differential Revision: D9512435

fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
2018-09-06 18:26:13 -07:00
7726b36489 Full-fledged group testings and fixes for c10d frontend APIs (#11318)
Summary:
Fixed a few bugs that were not tested in the c10d frontend APIs, including
get_rank, get_world_size, and destroy_process_group of a given group.

These APIs are added to the CI tests.

Also added all the group-related tests, including full groups and partial groups (existing ones), since they hit different code paths.

Also removed experimental c10d APIs initially used in DDP; we don't use them anymore anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11318

Reviewed By: pietern

Differential Revision: D9675896

Pulled By: teng-li

fbshipit-source-id: a2eac2c57933effa2d139855f786e64919a95bfc
2018-09-06 18:26:11 -07:00
1a01c75dde support gradClipping per blob in mtml (#10776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10776

as title

Reviewed By: chocjy

Differential Revision: D9458099

fbshipit-source-id: f840d4f1542e8180f41cc0732c8468fa43805ab8
2018-09-06 18:10:52 -07:00
c39216f8c4 Automatic update of fbcode/onnx to bff0b8835870c7df7762ef43498d000d2d8ffb52 (#11346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11346

Previous import was 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c

Included changes:
- **[bff0b88](https://github.com/onnx/onnx/commit/bff0b88)**: Add DynamicSlice experimental op (#1377) <James Reed>
- **[91a7b8e](https://github.com/onnx/onnx/commit/91a7b8e)**: statCoverage(model) (#1246) <Akshay Chalana>
- **[36643c6](https://github.com/onnx/onnx/commit/36643c6)**: fix the doc for softmax (#1374) <Lu Fang>
- **[8c64acd](https://github.com/onnx/onnx/commit/8c64acd)**: Silence usused result warning in ONNXIFI wrapper cleanup. Fix #1344 (#1371) <Marat Dukhan>
- **[53b20f6](https://github.com/onnx/onnx/commit/53b20f6)**: Add the ability to deprecate an OpSchema (#1317) <Ryan Hill>
- **[8aec4e2](https://github.com/onnx/onnx/commit/8aec4e2)**: [Anderspapitto patch] fix the shape inference for broadcasting (#1368) <Lu Fang>

Reviewed By: jamesr66a

Differential Revision: D9691533

fbshipit-source-id: 6aff6ce04ade37182e2ffe9bc83eb86846bc722d
2018-09-06 17:39:57 -07:00
4d678790c5 enable advanced indexing with tensors (#10862)
Summary:
On the way to #10774

This PR adds advanced indexing with tensors.
The approach is to desugar advanced indexing into an at::index op.
This is exactly how normal pytorch does it.
[(I used this code as reference)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp)

Supporting sequences is a little tricky because JIT script doesn't have
an easy way to turn arbitrary n-dimensional python lists into a tensor
(it would be easy if we supported `torch.tensor`), so that'll come
in a future PR.

cc jamesr66a zdevito
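What this enables in script, sketched with a small example (the tensor-index expression is the part that desugars to `aten::index`):

```python
import torch

@torch.jit.script
def select_rows(x, idx):
    # Tensor-with-tensor indexing inside TorchScript; the compiler
    # desugars this into a call to aten::index.
    return x[idx]

x = torch.randn(4, 3)
idx = torch.tensor([0, 2])
assert select_rows(x, idx).shape == (2, 3)
```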
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10862

Differential Revision: D9659449

Pulled By: zou3519

fbshipit-source-id: 56d293720d44c0fd27909e18327ab3985ddfced6
2018-09-06 16:41:45 -07:00
148f7cc47a nomnigraph - nit - fix generated code to be consistent with style (#11343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11343

make the generated classes (OpClasses.h...) consistent with fb cpp code style

Reviewed By: yinghai

Differential Revision: D9689487

fbshipit-source-id: 450e742d2462115d1bf41b9ea88d20df0a842b2b
2018-09-06 16:27:17 -07:00
49231ab0a8 Reimplement storage slicing. (#11314)
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
2018-09-06 16:11:59 -07:00
1d406c04ae fix comment on Cost params_bytes (#11190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11190

As discussed with Alexander Sidorov, params_bytes refers to the number of bytes we're reading for parameters, not the size of the parameters. They only differ in sparse operators.

Reviewed By: mdschatz

Differential Revision: D9628635

fbshipit-source-id: 9e2aed0cf59388928dc69b8534cf254f0347c9c8
2018-09-06 15:12:22 -07:00
68613cf5a2 Windows DLL build with Caffe2 code (#11266)
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.

Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266

Reviewed By: orionr

Differential Revision: D9682942

Pulled By: Yangqing

fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
2018-09-06 15:12:20 -07:00
34c0043aae Force third_party Eigen from setup.py (#11334)
Summary:
We shouldn't use system Eigen in any case when building with setup.py. If people want to use system Eigen (not from third_party) they can build with CMake for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11334

Reviewed By: pjh5

Differential Revision: D9689450

Pulled By: orionr

fbshipit-source-id: baf616b9f195692942151ad201611dcfe7d927ba
2018-09-06 14:56:53 -07:00
03ca7358af Add unit test for Parallel Spatial Batch Normalization (#11098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11098

Added a test covering the CPU version across multiple devices.

Reviewed By: enosair, BIT-silence

Differential Revision: D9584520

fbshipit-source-id: 0d8c85e6d402bc7b34d5f8f16ef655ff9b61b49e
2018-09-06 14:26:56 -07:00
5712fe3297 Fix out-of-boundary conversion issue (#11338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11338

The `min_` and `max_` values of the filler are stored as `double`s, but when we are filling a tensor of a specific type, their values can exceed the type's limits, resulting in a crash. This diff checks the type limits first, and if `min_`/`max_` is out of the limits, it clips the value.

Reviewed By: highker

Differential Revision: D9684455

fbshipit-source-id: 6da98a03c57f3296abaddc7c5cfc1c836c611eb0
2018-09-06 13:39:52 -07:00
ec195129ec Adding setTimeout option in Store (#11265)
Summary:
This will allow users to set a customized timeout option for the store.

Tested with my own debug prints to make sure that the C++ side actually used the timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11265

Differential Revision: D9666164

Pulled By: teng-li

fbshipit-source-id: 4eb6441783da106a3fd59b95457e503e83e4640f
2018-09-06 12:55:50 -07:00
fef52cc1f8 Add resolver for 'torch' module (#10847)
Summary:
This lets you compile builtin functions from C++ without having a dependency on Python.

```cpp
auto module = torch::jit::compile(JIT"(
def my_script_method(x, y):
    return torch.relu(x) + y
)");
IValue result = module->run_method("my_script_method", 1, 2);
```

goldsborough zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10847

Differential Revision: D9543461

Pulled By: driazati

fbshipit-source-id: 6160dae094030ca144a0df93cb9f26aa78c8cf27
2018-09-06 12:42:21 -07:00
0f1ec07c57 nomnigraph - nit - rename unit test files (#11315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11315

Rename unit tests file to make it consistent with fb cpp style guideline "The unittest for MyFoo.cpp should be named MyFooTest.cpp."

Reviewed By: yinghai

Differential Revision: D9671519

fbshipit-source-id: 44ed6794f6e479d190916db8064eee692e3ad876
2018-09-06 12:28:18 -07:00
ed8849b640 Add include path to Doxygen preprocessing and add some documentation (#11313)
Summary:
1. Add documentation to Linear and improve documentation for RNNs
2. Fix preprocessing in C++ docs by adding correct include path
3. Make myself and ebetica codeowner of docs/cpp to improve development speed

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11313

Differential Revision: D9683615

Pulled By: goldsborough

fbshipit-source-id: 84ea32f9ea6b4060744aabbf5db368776a30f0b5
2018-09-06 12:28:17 -07:00
f98bd53b01 Small fix to the UniformIntFill tensor shape and type inference.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11028

Reviewed By: salexspb

Differential Revision: D7715107

Pulled By: costin-eseanu

fbshipit-source-id: a4f73d53c0192b9826451b4bba4ab0992abbb1a2
2018-09-06 12:11:32 -07:00
1ad61a18b2 Rename cuda tests to have 'cuda' in their names (#11332)
Summary:
Not a lot changed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11332

Differential Revision: D9683680

Pulled By: zou3519

fbshipit-source-id: 95f444e54049dd268fc10effe425ef2df79c6467
2018-09-06 11:57:52 -07:00
0ef2b318a2 fix empty net type (#11286)
Summary:
Turns out that an explicitly empty ('') net.type is not acceptable to CreateNet,

but leaving net.type unset is.

Fix that in this diff. Also this is related to T33613083
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11286

Reviewed By: Maratyszcza, wat3rBro

Differential Revision: D9659920

Pulled By: harouwu

fbshipit-source-id: d68f24b754e18e1121f029656d885c48ab101946
2018-09-06 11:10:01 -07:00
936bba77d1 cudnn 7 upgrade with spatialBN fix (#11291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11291

In S163230, we've found that the CuDNN 7 upgrade causes an accuracy drop when training convolutional networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).

Our current theory for this accuracy loss is the new "CUDNN_BATCHNORM_SPATIAL_PERSISTENT" mode in the spatialBN operator. In Caffe2, we've made this mode the default. According to the CuDNN manual (https://fburl.com/z996mr13), this mode may introduce some limitations on the input data range and cause overflow (which outputs NaN). NaN is probably not the case, because we're seeing a few percent of accuracy drop but not gradient explosion or failure. However, this "performance-optimized" code path may introduce accuracy loss (which is not caught by our unit test case because the input data range there is [-0.5, 0.5]).

Reviewed By: kuttas, stephenyan1231

Differential Revision: D9601217

fbshipit-source-id: 73c2690c19cb1f02ea4e5e2200f50128df4f377b
2018-09-06 10:11:59 -07:00
4ae95738b2 Ignore FuseGraph Call on Windows (#11015)
Summary:
Fusion is not yet implemented on Windows, so ignore the FuseGraph call instead of failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11015

Differential Revision: D9619121

Pulled By: eellison

fbshipit-source-id: ad09aeaa41b7fdeb9ca7bf5e1c166923ca405b15
2018-09-06 09:54:51 -07:00
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
this is a fix that's needed for building extensions with a
pre-packaged pytorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
dda8402447 Cleanup dependency of distributed flags (#11221)
Summary:
Now that we're building everything together, making all distributed flags conditional on USE_DISTRIBUTED being set.

cc pietern The controller you requested could not be found. cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221

Reviewed By: Yangqing

Differential Revision: D9664267

Pulled By: orionr

fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
2018-09-06 08:56:00 -07:00
68930c48cf Move minimal wrapdim functionality to core, remove THTensor include i… (#11283)
Summary:
…n TensorImpl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11283

Reviewed By: ezyang

Differential Revision: D9660015

Pulled By: gchanan

fbshipit-source-id: 263cba226d9ee981d55281c94e6fda5842a46b02
2018-09-06 08:10:33 -07:00
f6568b00f5 Change includes from ATen/Storage.h to ATen/core/Storage.h (#11217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11217

```
codemod -d . --extensions cc,cpp,cu,cuh,h 'ATen/Storage.h' 'ATen/core/Storage.h'
```

Reviewed By: gchanan

Differential Revision: D9634904

fbshipit-source-id: 35a177733f3816e32d8748513c9caa4cf13a6896
2018-09-06 08:10:30 -07:00
656e81db93 Fix scalar tensor assert in fusion compiler (#10952)
Summary:
Fixes #8560.
Unblocks #10715.

The assert (nDim <= uncompressedDims) was being triggered for a scalar
tensor because we compute nDim to be 1 for a scalar tensor but
uncompressedDim = 0.

This PR changes it so that we compute nDim to be 0 for a scalar tensor. This
works because indexing in a kernel depends on nDim. If nDim = 0, then
offset is always 0, which is what we want.

Some other (small) changes were necessary to make this work:
- One cannot define a 0-length array `IndexType arr[0]` so the code
  guards against that
- Needed to change some of the maxTensorInfoSize logic to handle the
  case when uncompressedDim == 0.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10952

Differential Revision: D9544607

Pulled By: zou3519

fbshipit-source-id: 2b873f47e2377125e1f94eb1b310a95cda51476c
2018-09-06 07:54:57 -07:00
bb7d1837bc Add dead code elimination pass (#10101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10101

Simple DCE enabled by knowledge of the actual outputs (stacked beneath this diff)

Reviewed By: yinghai

Differential Revision: D9107853

fbshipit-source-id: 0c38fe5fe408be2b7fc9e1fe6a5b7160c06ce79b
2018-09-05 23:55:17 -07:00
220c9e52b9 Distributed Data Parallel CPU module for C10D (#11168)
Summary:
Distributed Data Parallel CPU module for c10d. This is basically the same code as Distributed Data Parallel CPU module for THD, since c10d now has the exact same front-end interface as torch.distributed.

We will keep both in the first release and remove the THD one once c10d is stable enough.

Tests are fully covered, just as with THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11168

Differential Revision: D9674963

Pulled By: teng-li

fbshipit-source-id: ecf52a7189374ca7930c2be305218167fdd822a7
2018-09-05 21:59:31 -07:00
126ac4b71f Back out "[pt1][tensor] Add strides to caffe2::Tensor"
Summary: Original commit changeset: 3643871b70f1

Differential Revision: D9665958

fbshipit-source-id: 46e22adbf39af92fb23abb66212991bd53a86317
2018-09-05 20:39:07 -07:00
fb836db4b2 Fix conv gradient conversion (#11312)
Summary:
Fix Windows build failure after https://github.com/pytorch/pytorch/pull/10744 landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11312

Reviewed By: mingzhe09088

Differential Revision: D9669907

Pulled By: orionr

fbshipit-source-id: d717ec4f8fdf17acf334528d7838b88c5c50e9c3
2018-09-05 20:09:31 -07:00
dccd0f2de6 Bag of clang tidy fixes for torch/csrc/ and torch/csrc/autograd (#11050)
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).

Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`)  instead of numbers (1/0)
- More ...

ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050

Differential Revision: D9597505

Pulled By: goldsborough

fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
2018-09-05 19:55:50 -07:00
83a1ab2136 Sparse tensor printing; add NotImplemented autograd fn (#10181)
Summary:
Commits:

1. Add autograd function `NotImplemented` (subclass of `Error`) so the Python `grad_fn` prints nicer. Since `Error` is used in `DelayedError` to implement `once_differentiable`, I can't just change its name. cc colesbury

2. Add printing for sparse tensors. Fixes https://github.com/pytorch/pytorch/issues/9412 . cc weiyangfb The controller you requested could not be found. .

3. Add tests for sparse printing

Examples:
```diff
  In [2]: x = torch.sparse.FloatTensor(torch.arange(4).view(2,2), torch.randn(2, 2), [10, 10, 2])

  In [3]: x
  Out[3]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]])
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]])
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo)

  In [4]: x.requires_grad_()
  Out[4]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, requires_grad=True)

  In [5]: x + x
  Out[5]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-2.3664, -1.1855],
-         [ 0.1662,  0.5021]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 3.0162,  0.6902],
+                       [-0.0785,  0.9553]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, grad_fn=<AddBackward0>)

  In [6]: x.double()
  Out[6]:
- torch.sparse.DoubleTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], dtype=torch.float64, grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, dtype=torch.float64, layout=torch.sparse_coo,
+        grad_fn=<NotImplemented>)

  In [7]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2, 0), [0])

  In [8]: x
  Out[8]:
- torch.sparse.FloatTensor of size (0,) with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([], size=(2, 0))
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([], size=(2, 0)),
+        size=(0,), nnz=2, layout=torch.sparse_coo)

  In [9]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2), [])

  In [10]: x
  Out[10]:
- torch.sparse.FloatTensor of size () with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([-0.0064,  0.8518])
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([ 0.9800, -0.5978]),
+        size=(), nnz=2, layout=torch.sparse_coo)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10181

Differential Revision: D9139845

Pulled By: SsnL

fbshipit-source-id: 353eebd55fac4049ed9bf85f8b0ee2c1418a744e
2018-09-05 19:41:22 -07:00
fa147abda4 Add convertToCaffe2Proto to python API
Summary: Closing the gap a bit on API, allowing users to go NetDef -> nomnigraph -> NetDef in python now

Reviewed By: duc0

Differential Revision: D9670495

fbshipit-source-id: 6497518ffc05a186deb0d657e06317980d39ddd5
2018-09-05 18:40:48 -07:00
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
ad116210e5 typo fix Tranpose2D -> Transpose2D (#11281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11281

A simple typo fix

Reviewed By: BIT-silence

Differential Revision: D9658324

fbshipit-source-id: b6513c8d12d8fe75a9b18df1b443e9e66e692744
2018-09-05 17:25:58 -07:00
a9d8b021e9 Remove THFinalizer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11287

Reviewed By: ezyang

Differential Revision: D9662341

Pulled By: cpuhrsch

fbshipit-source-id: 306bea00694db1ae207167ee4bf10de01426911c
2018-09-05 16:56:27 -07:00
c0efe6f027 Forward declarations of needed curand functions (#10911)
Summary:
Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911

Reviewed By: pjh5

Differential Revision: D9636256

Pulled By: orionr

fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65
2018-09-05 16:56:26 -07:00
57728f71e7 nomnigraph - simplify core graph API and test (#11256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11256

- in deleteNode method, remove optional deleteEdge flag as it's not used
- in deleteEdge method, remove optional removeRef flag as it's not used
- in replaceNode method, remove optional newHead_ parameter as it's not used - also simplifying the implementation by just calling replaceInEdges and replaceOutEdges
- remove importNode & importEdge as they're not in use
- add getEdgeIfExists that is like getEdge() but returns nullptr instead of throwing when the edge does not exist
- reduce verbosity in the basic graph unit test and add more test cases for ReplaceEdges

Differential Revision: D9650913

fbshipit-source-id: 6c18b37bef0d2abe1b57fb4fc47bfdbcee387694
2018-09-05 16:40:49 -07:00
c43187291c Small fixes to cppdocs for sync script (#11300)
Summary:
I'm setting up an automatic sync job for cppdocs and need two fixes to the cpp docs config:

1. Right now the cppdocs use the `torch` package to figure out the version. For C++ docs all I really need from the built package are the generated Tensor.h and Functions.h files. I can actually generate those directly via `aten/src/ATen/gen.py`, so I can skip building PyTorch altogether and save 10 minutes in the sync job! For this I need to avoid using the torch package in the docs.
2. Internal proxy issues prevent using the git link for sphinx_rtd_theme. We can just use the pip package for the cppdocs (not for the normal PyTorch docs)

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11300

Differential Revision: D9667193

Pulled By: goldsborough

fbshipit-source-id: 5567e0b3d3bdce03f5856babdb4ff76bcee91846
2018-09-05 16:40:47 -07:00
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` and has an implicit conversion to int, which lacks type safety; e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`: it has no implicit conversion to int and provides better type-safety guarantees. In this diff we have done the following refactor (taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
ac9f0a6884 refactor preproc, support dense in TumHistory layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11131

Reviewed By: xianjiec

Differential Revision: D9358415

fbshipit-source-id: 38bf0e597e22d540d9e985ac8da730f80971d745
2018-09-05 16:10:13 -07:00
3e85685f8f add persistent rnns with conservative criteria (#11248)
Summary:
Persistent rnns provide much better performance on V100 with half input data for a variety of cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11248

Differential Revision: D9665687

Pulled By: ezyang

fbshipit-source-id: 2bd09a7eb1f5190aadb580977b0ba956e21a7dd5
2018-09-05 16:10:11 -07:00
68c2e014cb Handling for py2/py3 division differences (#11016)
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
  `from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.

The error for users who use '/' in Python 2 without importing from __future__ occurs when building the JIT AST.
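For reference, the plain-Python semantics the JIT standardizes on (not TorchScript itself):

```python
# `/` is true division even for ints; `//` is the floor division that
# prim::FloorDiv implements.
from __future__ import division  # required in Python 2 files compiled by the JIT

print(3 / 2)    # 1.5
print(3 // 2)   # 1
print(-7 // 2)  # -4 (floor, not truncation)
```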

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016

Differential Revision: D9613527

Pulled By: zou3519

fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
2018-09-05 14:57:38 -07:00
9a0effb92c Update send/recv tests to reflect intended use (#11275)
Summary:
The existing tests had every rank send to every other rank and only
then switch to recv mode. This only works if the send operations are
non-blocking and the passed tensors are immediately copied to some kind
of send buffer. Instead, every send must be matched with a recv on the
other side, because from the API perspective they may block.

E.g. imagine a 1GB tensor being sent to every other rank. It can only go
through if there is a recv on the other side, or it will deadlock.

This change reflects this in the send/recv unit tests.
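A minimal sketch of the matched pattern, assuming a two-rank torch.distributed process group has already been initialized:

```python
import torch
import torch.distributed as dist

rank = dist.get_rank()
t = torch.ones(4)
if rank == 0:
    dist.send(t, dst=1)  # may block until rank 1 posts the matching recv
else:
    dist.recv(t, src=0)  # matches the send from rank 0
```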
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11275

Differential Revision: D9658197

Pulled By: pietern

fbshipit-source-id: fb6a3fc03b42343a9dfeed0def30d94914e76974
2018-09-05 14:40:04 -07:00
8da081f7a5 Add cost inference to ConvGradient and WeightedSum operators (#10744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10744

As title

Reviewed By: jspark1105

Differential Revision: D9436387

fbshipit-source-id: 578b7a6d98843d57e3f8f4c564727e9cadbedd78
2018-09-05 13:56:05 -07:00
4fe3356ee0 Move collapse dims into a single place (#11272)
Summary:
Deduplicates implementations and reduces sources of failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11272

Differential Revision: D9659167

Pulled By: cpuhrsch

fbshipit-source-id: 759bfba4fd90795038afe684d9829f5f41f98109
2018-09-05 12:57:00 -07:00
5e2067ce30 Fix some more warnings (#11257)
Summary:
Found these when compiling the new master with gcc 7.3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11257

Differential Revision: D9656612

Pulled By: SsnL

fbshipit-source-id: 7acb19e13204c010238dab7bc6973cc97b96f9a4
2018-09-05 11:10:27 -07:00
f866574afc Fix the batchnorm onnx exporting when affine=False
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11249

Reviewed By: Ac2zoom

Differential Revision: D9652526

Pulled By: houseroad

fbshipit-source-id: 12a9038beddd227a2f9e2178edf4e8d623488c3e
2018-09-05 11:10:25 -07:00
55212507a2 Improve error message to include return types too (#11245)
Summary:
Fixes #11057.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11245

Differential Revision: D9652698

Pulled By: apaszke

fbshipit-source-id: 4c5006e32e599c35367aa5acfae45de3ab8ac176
2018-09-05 10:56:51 -07:00
e6d6aed12e Check doxygen output in travis (#11124)
Summary:
This PR adds a .travis.yml check for our C++ documentation. The goal is to avoid any documentation/comments in our C++ code that would break the doxygen output and possibly ruin the C++ documentation site (currently https://pytorch.org/cppdocs).

For this, we:
1. Run doxygen and record any warnings,
2. Filter out some known bogus warnings,
3. Count the remaining warnings,
4. Fail the check if (3) is non-zero.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11124

Differential Revision: D9651011

Pulled By: goldsborough

fbshipit-source-id: 30f776d23bb6d6c482c54db32828b4b99547e87b
2018-09-05 10:25:56 -07:00
267e1ec112 Accept more numpy scalars as doubles (#9659)
Summary:
Allows multiplication of e.g. numpy.float32 with tensors.

This came up with #9468

If you want this, I'll add tests after the other patch is done (adding them now would conflict, so I prefer to wait).
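A one-line illustration of the accepted behavior:

```python
import numpy as np
import torch

# numpy scalar types now behave like Python numbers in tensor arithmetic
print(torch.ones(3) * np.float32(2.0))  # tensor([2., 2., 2.])
```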
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9659

Differential Revision: D8948078

Pulled By: weiyangfb

fbshipit-source-id: c7dcc57b63e2f100df837f70e1299395692f1a1b
2018-09-05 10:25:55 -07:00
8bd80a6b74 Fixed log message (#10874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10874

Fixes the log message "WARNING:data_workers:Warning, data loading lagging behind: name=0", where the size of a queue was reported instead of the source name.

Reviewed By: panshen1, Novitial

Differential Revision: D9506606

fbshipit-source-id: 03717cfa9b991afb335ef877378afa3b52fd8f22
2018-09-05 09:55:52 -07:00
434e943b08 Fix to distribution.__repr__ with lazy attributes (#11263)
Summary:
`__repr__` currently fails for distributions with lazy attributes in PyTorch master, throwing a `KeyError`. This fixes the issue.

**Additionally:**
 - Added `logits` to `arg_constraints` for distributions that accept either `probs` or `logits`. This is both to have `__repr__` display the `logits` param when available (a small repr sketch follows below), and to be able to do validation checks (e.g. NaN checks) when the logit parametrization is used. fritzo, alicanb - I think there were reasons why we had not done so in the first place, but I am unable to recall now. It passes all the tests, but let me know if there is something that I am missing at the moment.
 - There are certain distributions, e.g. `OneHotCategorical`, which won't show any parameters because they use a `categorical` instance under the hood and neither `logits` nor `probs` from `arg_constraints` is present in the instance's `__dict__`. This isn't addressed in this PR.
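A sketch of the repr behavior the logits change enables (output format is illustrative):

```python
import torch
from torch.distributions import Bernoulli

d = Bernoulli(logits=torch.tensor([0.5]))
print(d)  # e.g. Bernoulli(logits: tensor([0.5000]))
```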

cc. vishwakftw, fritzo, nadavbh12, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11263

Differential Revision: D9654959

Pulled By: apaszke

fbshipit-source-id: 16f5b20243fe8e2c13e9c528050d4df0b8ea6e45
2018-09-05 09:55:51 -07:00
9fc22cb772 Add import export step to end to end tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10717

Differential Revision: D9562888

Pulled By: li-roy

fbshipit-source-id: 8f5d62fd0a44aca0a41dc10438e7bb91cc2a972a
2018-09-05 09:39:47 -07:00
1808e368e4 Add complex hooks for out of tree complex implementation. (#11216)
Summary:
This PR adds a hooks interface for registering types for complex
scalar types, and a sample implementation of the hook in
test_cpp_extensions.

The hook registration is patterned off of the existing CUDA hooks.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC The controller you requested could not be found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11216

Differential Revision: D9654840

Pulled By: ezyang

fbshipit-source-id: 7b97646280d584f8ed6e14ee10a4abcd04cf2987
2018-09-05 09:25:50 -07:00
aeb6094538 Unify opt flag for cmake codegen (#11227)
Summary:
Also enables debug for non-MSVC for kernel codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11227

Differential Revision: D9656506

Pulled By: cpuhrsch

fbshipit-source-id: 667195cb55de1a1a9042b6b1c4436e9c6c743333
2018-09-05 08:55:49 -07:00
d612855b91 nomnigraph - fix memory error in NN subgraph matchOp (#11127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11127

It's invalid to capture `predicate` by reference, as it's a local variable; capture it by value instead.

Differential Revision: D9600115

fbshipit-source-id: 92e0130d0a74908380b75ade5c3492df49e25941
2018-09-05 07:57:40 -07:00
6d6655e6be Port PackedSequences functions to C++ (#11224)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11224

Differential Revision: D9652703

Pulled By: apaszke

fbshipit-source-id: 558e39457e590cad07516e5bb2ecb12789564950
2018-09-05 06:35:15 -07:00
b7038f7c37 Treat numerical differences as warnings instead of errors when tracing (#11246)
Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.
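A small sketch of the integral-tensor case now supported:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2, 4])
# With integer inputs and default tolerances this is effectively an
# elementwise equality mask.
print(torch.isclose(a, b))  # [True, True, False] (mask dtype varies by release)
```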

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246

Differential Revision: D9652701

Pulled By: apaszke

fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
2018-09-05 06:35:13 -07:00
b7cd4b692c add a Float16UniformFill (#11123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11123

This adds an operator that fills a tensor with uniform(min, max) samples.
The implementation is to use the fp32 generator and convert to fp16.

If performance becomes an issue we could resort to intrinsics.
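A numpy sketch of the described approach (hypothetical helper, not the actual operator):

```python
import numpy as np

def float16_uniform_fill(shape, min_, max_, rng=np.random):
    # Sample with a full-precision generator, then cast down to fp16.
    return rng.uniform(min_, max_, size=shape).astype(np.float16)

print(float16_uniform_fill((2, 2), -1.0, 1.0).dtype)  # float16
```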

Reviewed By: jspark1105, chocjy

Differential Revision: D9598142

fbshipit-source-id: 5aeab99acf7c3596fa6c33611d9d2c484f7c1145
2018-09-04 23:28:22 -07:00
d4060d2d0e Implement torch.tensordot (#10025)
Summary:
Fixes: #8988
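For reference, a small usage sketch (shapes chosen for illustration):

```python
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(4, 5, 6)
# dims=2 contracts the last two dims of `a` with the first two dims of `b`,
# generalizing matrix multiplication.
c = torch.tensordot(a, b, dims=2)
print(c.shape)  # torch.Size([3, 6])
```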
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10025

Reviewed By: ezyang

Differential Revision: D9540967

Pulled By: yf225

fbshipit-source-id: 6ba2a7777162983977db884b693e6f4543b31aeb
2018-09-04 21:10:07 -07:00
d1b920b44f keep net type info when generating model complete net (#11032)
Summary:
Keep the net type info when generating the model's complete net. This preserves the performance-optimization option.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11032

Reviewed By: wat3rBro

Differential Revision: D9564125

Pulled By: harouwu

fbshipit-source-id: c6546af9b1d4ff5eddf6124e24a5da1b8baf47df
2018-09-04 21:10:06 -07:00
56bdd87b40 Get rid of some uses of type() (#11215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11215

I found these by deleting the implicit conversion of Type to
TensorOptions and then fixing sites.  This isn't a complete
refactor, because I ran out of steam after fixing this many
and decided to keep the implicit conversion.  Still, why
waste a perfectly good refactor?

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9634750

fbshipit-source-id: 4d8fb778e13e6e24b888b1314a02709b2cb00b62
2018-09-04 20:26:22 -07:00
9ca63c5e63 Reorganize methods in Type, add CPUTypeDefault/CUDATypeDefault (#11205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11205

Our short term plan for supporting out of tree complex development requires an
external library to add a custom subclass of Type without access to the
code generation facilities in ATen.  This commit reorganizes Type so
as to minimize the amount of boilerplate you have to write when making
a subclass of Type.

In particular, it:
- Creates a new CPUTypeDefault/CUDATypeDefault class, which you are
  intended to inherit from, which provides default implementations
  of CPU/CUDA that is layout/dtype agnostic.
- Adds new getCPUAllocator() and getCUDAAllocator() functions, as
  a more public API to get your hands on Allocator
- Adds allocator() and getDeviceFromPtr(), abstracting the device
  specific parts of storage() methods; these methods are now
  implemented in base TypeDefault.
- Delete the static typeString() method, which is now dead.
- Move is_cuda/is_sparse/is_distributed to TypeDefault.

Reviewed By: SsnL

Differential Revision: D9631619

fbshipit-source-id: 40b600d99691230e36e03eb56434c351cbc2aa3a
2018-09-04 20:26:20 -07:00
f0d3fda064 Improve docs for torch::nn::Module (#11115)
Summary:
Added some documentation. Will rebuild docs to make sure it looks good. Can already accept approvals.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11115

Differential Revision: D9597880

Pulled By: goldsborough

fbshipit-source-id: 56b701da631702ba56e281a0de0f7ebe490f5c5a
2018-09-04 18:10:38 -07:00
7f74875304 Pull Context out of TensorMethods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11241

Reviewed By: ezyang

Differential Revision: D9645514

Pulled By: gchanan

fbshipit-source-id: 43e65d1d2fa3183264ed7e4752c1512df5f69175
2018-09-04 18:10:37 -07:00
05cb40dc00 Move some includes from Tensor/Type to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11234

Reviewed By: ezyang

Differential Revision: D9642669

Pulled By: gchanan

fbshipit-source-id: 2c131bb46b54a0803c37b444ad48d861080056f1
2018-09-04 18:10:34 -07:00
c8672f0b42 Support environments with no libprotobuf (#11161)
Summary:
Just pulling this out of https://github.com/pytorch/pytorch/pull/10611

Make sure we can support environments where libprotobuf is not installed when we link protobuf locally.

cc goldsborough Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11161

Differential Revision: D9650282

Pulled By: orionr

fbshipit-source-id: 447b5e54cd2639973b4b10f58590d1c693a988d4
2018-09-04 17:27:54 -07:00
020501b7b0 Getting rid of USE_C10D for build (#11237)
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237

Differential Revision: D9647825

Pulled By: teng-li

fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
2018-09-04 17:27:53 -07:00
313e89d8db Fix dimension collapsing (#11226)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11226

Differential Revision: D9646638

Pulled By: cpuhrsch

fbshipit-source-id: 104f367f75a4478bb7580324ea3661de71b2c8b0
2018-09-04 17:27:52 -07:00
6219c4a28f Make Scalar::toTensor a free function, move Scalar to ATen/core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11125

Reviewed By: ezyang

Differential Revision: D9599798

Pulled By: gchanan

fbshipit-source-id: 2fec682c109013a82788dfba13f4d30b2945d3f4
2018-09-04 16:25:57 -07:00
033499cf56 Remove mention of USE_DISTRIBUTED_MW (#11240)
Summary:
This was lingering after #10731.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11240

Differential Revision: D9645437

Pulled By: pietern

fbshipit-source-id: d02c33354b094be3bb0872cf54a45721e20c4e7d
2018-09-04 16:10:20 -07:00
3f30c296d3 Export CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_* (#11233)
Summary:
This PR resolved the following compilation errors on devgpu:
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_Tan()'
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_MaxPool3D()'
....

The same error had been happening with the Caffe2 debug-mode build before build_caffe2 was removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11233

Reviewed By: orionr

Differential Revision: D9645527

Pulled By: mingzhe09088

fbshipit-source-id: 68a45aa7fd815cac41b7fd64cfd9838b3226345a
2018-09-04 14:56:43 -07:00
7e0a052a5d Adding synthetic data generation to the filler.h file (#11060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060

Adding synthetic data generation to the filler.h file (the exact distribution to be replaced later on).

Reviewed By: highker

Differential Revision: D9417594

fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
2018-09-04 13:40:53 -07:00
1eed7d5f0b Report an error when trying to record a mutable operator when (#11129)
Summary:
there are multiple views of the tensor live.

Also adds recording for copy_ because this is the critical in-place op where these views will cause LHS indexing to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129

Differential Revision: D9600195

Pulled By: zdevito

fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
2018-09-04 13:40:51 -07:00
0e8088d6f6 Fix typo in data_parallel_model
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11086

Differential Revision: D9581297

fbshipit-source-id: b164177bdbb309f56ff3231c1ffc0973f6c5299b
2018-09-04 13:15:31 -07:00
ec6f0ed560 Additional Python Bindings
Summary:
Major change:
- Addition of pattern matching bindings

Minor change:
- OperatorDef instantiation
- Generic Graph API

Reviewed By: duc0

Differential Revision: D9546205

fbshipit-source-id: ab5274014be23a3e9e3fcf18ae1815c4f387b83c
2018-09-04 12:10:10 -07:00
750cd48980 update expect file for short circuiting (#11229)
Summary:
Fix failing test by updating expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11229

Differential Revision: D9638587

Pulled By: eellison

fbshipit-source-id: e870ef3a4fbc7e07f299cc9413703d9f77e89895
2018-09-04 11:56:09 -07:00
684b55d762 In default, use third party eigen. Added new flag USE_SYSTEM_EIGEN_INSTALL to control. (#11020)
Summary:
TSIA. apaszke pointed out that it might be better to use the third_party folder by default, since system Eigen may often be out of date and may not have the version we need to compile successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020

Differential Revision: D9562548

Pulled By: Yangqing

fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5
2018-09-04 10:56:22 -07:00
539579aa9a Logical short circuit (#11116)
Summary:
Adding short-circuit evaluation to AND and OR. The second expression of an AND or OR gets lifted into an if branch, which is conditionally evaluated.

BatchOps was using the expression `dims = dims1 or dims2`, where dims is often an empty tensor. This now throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of PyTorch in Python.

One thing that came up: if the second expression of an and/or in Python gets returned, it does not get coerced to a boolean.

`tensor == (False or tensor)`
`tensor == (True and tensor)`

We do not currently support this.
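For context, the plain-Python short-circuit semantics being matched:

```python
def f(x, y):
    return x or y  # y is evaluated only if bool(x) is False

print(f(5, 10))  # 5  -- second operand never evaluated
print(f(0, 10))  # 10 -- falls through to the second operand
```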

edit: wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116

Differential Revision: D9618168

Pulled By: eellison

fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
2018-09-04 09:25:13 -07:00
b2217109ec Move TensorOptions to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11147

Reviewed By: gchanan

Differential Revision: D9614321

fbshipit-source-id: 618cb342eb7c52181425f6bb9c17b9ecdb87a394
2018-09-04 08:55:54 -07:00
0ff1bb0d8a Remove Type constructor from TensorOptions, add Type::options (#11189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11189

Replaces it with an operator TensorOptions() method on
Type, reestablishing the implicit conversion.  I originally
wanted to get rid of the implicit conversion entirely, but
there were a *lot* of use-sites, so I added it back to avoid
a huge codemod.  In this patch, I only had to fix sites that
used the optional device_index API.

Reviewed By: cpuhrsch

Differential Revision: D9628281

fbshipit-source-id: 5fe2a68eefb77a3c9bb446f03a94ad723ef90210
2018-09-04 08:10:04 -07:00
0d5e4a2c66 Allow passing through arguments to unittest (#11209)
Summary:
Example:
```sh
python run_test.py -i sparse -- TestSparse.test_factory_size_check -f
```

With this, the `--verbose` option is redundant (one can call `python run_test.py -- -v` instead of `python run_test.py -v`). But since this is (probably) a frequently used flag, I didn't remove the existing easier-to-use option.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11209

Differential Revision: D9632215

Pulled By: SsnL

fbshipit-source-id: ff522802da11ef0a0714578be46e4a44f6343d44
2018-09-03 20:09:08 -07:00
050aa42e09 Fix some more compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11208

Differential Revision: D9632216

Pulled By: SsnL

fbshipit-source-id: b181f3ce114474e171146cd2ac5de150b0e23f75
2018-09-03 19:39:33 -07:00
cd4c32691d Add complex32, complex64 and complex128 dtypes (#11173)
Summary:
We don't generate corresponding Type implementations for them,
so this doesn't do anything at the moment.

We don't plan on supporting complex32 in the near future, but
it is added to reserve the name and number in case we do at
some point in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11173

Reviewed By: SsnL

Differential Revision: D9627477

Pulled By: ezyang

fbshipit-source-id: f49a44ab1c92d8a33130c249ac7b234f210a65e6
2018-09-03 19:19:36 -07:00
c5b021cc88 State dict loading arguments were in the wrong order (#11200)
Summary:
In the state-dict loading code, the error message referring to the shapes of the loaded parameters and of the parameters in the initialised model had its format arguments in the wrong order. Swapped them round to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11200

Differential Revision: D9631160

Pulled By: SsnL

fbshipit-source-id: 03d9446303bd417fef67027b10d7a27de06486be
2018-09-03 15:42:30 -07:00
7e2136c2b5 remove allclose from test_doc skipped list
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11187

Differential Revision: D9628349

Pulled By: SsnL

fbshipit-source-id: 0ff94666542ca049a6d82091bd9fc79ec1699ac6
2018-09-03 09:39:56 -07:00
24eb5ad0c5 Fix unit tests on CI (#11191)
Summary:
Disables two of the unit tests in test_cuda that were introduced after test_cuda was enabled and that fail on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11191

Differential Revision: D9628702

Pulled By: ezyang

fbshipit-source-id: 4c298c728f42bb43d39b57967aa3e44385980265
2018-09-02 21:54:47 -07:00
0a8c8c1dbe Rename real to scalar_t. (#11163)
Summary:
This is necessary to allow us to use the complex header
which defines real (and is very sad if real is macro'ed).

We should also fix accreal, ureal, Real and REAL, but
only 'real' is the real blocker.

```
codemod -d aten/src/TH --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THC --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THCUNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11163

Reviewed By: SsnL

Differential Revision: D9619906

Pulled By: ezyang

fbshipit-source-id: 922cb3a763c0bffecbd81200c1cefc6b8ea70942
2018-09-02 15:26:01 -07:00
43fd6b234d Make Type a (mostly) pure virtual class; TypeDefault for impls (#11013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11013

Previously, the parent class Type also contained a large number
of implementations, for things like broadcasting and native
functions that didn't need dispatch.  We'd like to be able
to reference this interface from Tensor even when none of these
implementations are available.

To do this, we convert Type into a truly pure virtual interface,
and move all of the implementations to TypeDefault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11181

Differential Revision: D9561478

Pulled By: ezyang

fbshipit-source-id: 13c49d80bc547551adf524b1cf1d691bfe311133
2018-09-02 15:25:59 -07:00
e1a17d5a42 Should not use CAFFE2_API when definition is already in header. (#11114)
Summary:
Remove or use CAFFE2_EXPORT.
Fix #11108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11114

Differential Revision: D9628293

Pulled By: ezyang

fbshipit-source-id: dc3bb7dc5bc299e3b6cfd1cdd640f618c206fb5a
2018-09-02 14:39:38 -07:00
cf10efb8d4 Fixes unclear exception message for F.conv2d (#11053)
Summary:
Fixes #11033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11053

Differential Revision: D9573606

Pulled By: soumith

fbshipit-source-id: 9729cbd6c8afcef0fd487bdd425b0d1f55189009
2018-09-02 13:39:34 -07:00
593d74061f Document torch.allclose (#11185)
Summary:
- Modify torch.autograd.gradcheck to use torch.allclose instead
- Expose doc strings

Closes #10355
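A short usage sketch of the documented check, which tests elementwise |input - other| <= atol + rtol * |other|:

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0 + 1e-6, 2.0])
print(torch.allclose(a, b))                  # True (default rtol=1e-05, atol=1e-08)
print(torch.allclose(a, b, rtol=0, atol=0))  # False: exact equality required
```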
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11185

Differential Revision: D9628016

Pulled By: soumith

fbshipit-source-id: 22a30622b9fe52e41b5b3540406137b59d8c5a75
2018-09-02 09:26:07 -07:00
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
abe8b3391d LowRankMultivariateNormal cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11179

Differential Revision: D9627502

Pulled By: soumith

fbshipit-source-id: c7a4aa8be24bd8c688a7c655ff25ca901ed19704
2018-09-02 07:54:56 -07:00
4d28b65fb8 fix serialization of nn.Parameter with dill (#10296)
Summary:
Should resolve #9981.
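A round-trip sketch of the fixed behavior, assuming dill is installed:

```python
import dill
import torch

p = torch.nn.Parameter(torch.randn(2))
p2 = dill.loads(dill.dumps(p))
print(type(p2).__name__, p2.requires_grad)  # Parameter True
```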
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10296

Differential Revision: D9196353

Pulled By: soumith

fbshipit-source-id: 109b6da42b7240cdbc7a0586745c735bce5e1279
2018-09-01 23:55:40 -07:00
1350f76b62 Fix max and min with inf on CUDA (#11091)
Summary:
Fixes #10237 #11084

cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11091

Differential Revision: D9582859

Pulled By: SsnL

fbshipit-source-id: 3991c0a2af65ba82fa815b82f9e6b2107912fd10
2018-09-01 23:09:23 -07:00
7eba9849c1 Pool constants during script compilation. (#10231)
Summary:
This places all constants in the entry block of the graph, and de-duplicates them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10231

Differential Revision: D9601501

Pulled By: resistor

fbshipit-source-id: daa10ed8c99e9894830d6f3e5d65c8d3ab5ea899
2018-09-01 22:40:50 -07:00
7af6f9515f Move TensorAccessor to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11014

Reviewed By: cpuhrsch

Differential Revision: D9561802

fbshipit-source-id: d3dbe6d7e76e2419ead81fb448711f101daee19f
2018-09-01 21:41:26 -07:00
011f615945 Fix compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11177

Reviewed By: soumith

Differential Revision: D9626443

Pulled By: SsnL

fbshipit-source-id: e75d893e1e91e49d3e7b021892434489d8df7987
2018-09-01 21:41:25 -07:00
1506547771 Disable -Werror on macOS test build (#11090)
Summary:
cc goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11090

Reviewed By: soumith

Differential Revision: D9582525

Pulled By: apaszke

fbshipit-source-id: 5d2c6e930e7b09f0ed5a35fbf4fe36b8845a2580
2018-09-01 21:09:49 -07:00
f60a2b682e allow spaces in filename for jit-compiled cpp_extensions (#11146)
Summary:
Now, folders with spaces in their names will not error out for `torch.utils.cpp_extension.load(name="xxx", sources=["xxx.cpp"], verbose=True)` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11146

Differential Revision: D9618838

Pulled By: soumith

fbshipit-source-id: 63fb49bfddc0998dccd8a33a6935543b1a6c2def
2018-09-01 20:39:51 -07:00
43e73f85ad Dont optimize slicing dispatch when we are tracing (#11156)
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.

This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
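A hedged sketch of the failure mode, assuming the torch.jit.trace(fn, example_inputs) API:

```python
import torch

def f(x):
    return x[0:5, 0]

# Traced with an input where the slice covers all of dim 0, so a skipped
# dispatch would leave no slice op in the trace.
traced = torch.jit.trace(f, torch.randn(5, 3))
out = traced(torch.randn(8, 3))  # with the fix, the slice is still recorded and applied
print(out.shape)                 # torch.Size([5])
```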
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156

Differential Revision: D9622252

Pulled By: jamesr66a

fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
2018-09-01 17:13:03 -07:00
b3d559cdd1 Optimize WeightedSumOp for two inputs (#11049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11049

Optimize WeightedSumOp for two inputs

Reviewed By: houseroad

Differential Revision: D9566692

fbshipit-source-id: 9aab1f02251d386b6f7d0699ae11eeb2ea2b5b4f
2018-09-01 11:54:55 -07:00
b834d9107e Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11164

Revert D9566744

Reviewed By: enosair

Differential Revision: D9620272

fbshipit-source-id: 6a78c46929f66bd11969840cb6b107f734be0c02
2018-08-31 22:25:57 -07:00
1b7172a2b9 fix the slice onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11117

Reviewed By: MisterTea

Differential Revision: D9597870

Pulled By: houseroad

fbshipit-source-id: 3a2a307ee327397939bedb9150f780682e18a89a
2018-08-31 17:40:03 -07:00
03c06ec93d Traceable detach (#11038)
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038

Differential Revision: D9588038

Pulled By: jamesr66a

fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
2018-08-31 16:40:42 -07:00
861e1c430c Move StorageImpl and Storage to core (#11154)
Summary:
Will need to be accessible by caffe2

This also removes a bunch of unnecessary includes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11154

Reviewed By: ezyang

Differential Revision: D9618681

Pulled By: cpuhrsch

fbshipit-source-id: 838a87b75d9c3959e145fd5fca13b63bc5de7bd3
2018-08-31 15:55:26 -07:00
4abddad1a0 use py::str to remove deprecation warnings (#11107)
Summary:
```
In file included from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/cast.h:13:0,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/attr.h:13,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pybind11.h:43,
                 from caffe2/torch/csrc/utils/pybind.h:6,
                 from caffe2/torch/csrc/jit/pybind.h:5,
                 from caffe2/torch/csrc/jit/script/init.h:3,
                 from caffe2/torch/csrc/jit/script/init.cpp:1:
third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pytypes.h:118:19: note: declared here
In file included from caffe2/torch/csrc/jit/pybind.h:12:0,
                 from caffe2/torch/csrc/jit/python_ir.cpp:4:
caffe2/torch/csrc/jit/pybind_utils.h: In function 'torch::jit::IValue torch::jit::argumentToIValue(const torch::jit::FunctionSchema&, size_t, pybind11::handle)':
caffe2/torch/csrc/jit/pybind_utils.h:138:226: warning: 'pybind11::str pybind11::detail::object_api<Derived>::str() const [with Derived = pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr>]' is deprecated: Use py::str(obj) instead [-Wdeprecated-declarations]
```

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11107

Differential Revision: D9598040

Pulled By: goldsborough

fbshipit-source-id: 4a055353ac08d54a2bbca49573ff099310de3666
2018-08-31 15:25:04 -07:00
c48bf3a77e Automatic update of fbcode/onnx to 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c (#11153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11153

Previous import was bae6333e149a59a3faa9c4d9c44974373dcf5256

Included changes:
- **[1b09eb1](https://github.com/onnx/onnx/commit/1b09eb1)**: Fix the shape inference for concat (#1361) <Lu Fang>
- **[7b9b3ee](https://github.com/onnx/onnx/commit/7b9b3ee)**: ONNX v1.3.0 release (#1359) <bddppq>

Reviewed By: Ac2zoom

Differential Revision: D9615844

fbshipit-source-id: f1d4e2d6ef72a269d6ab3c1c347b272b5bdc4f2a
2018-08-31 14:55:15 -07:00
5987b44dda Remove aten doc/ folder (#11158)
Summary:
ATen's doc/ folder is manually maintained and can thus cause confusion with the generated file. We now have proper online documentation for ATen, which is superior to ATen doc/. Let's delete ATen/doc.

ezyang apaszke soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11158

Differential Revision: D9618782

Pulled By: goldsborough

fbshipit-source-id: 0ef14f84947601a0589aa4a41e5c8619783426fe
2018-08-31 14:55:13 -07:00
3081c8ea1d Lower trivial differentiable subgraphs (#11110)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11110

Differential Revision: D9616408

Pulled By: apaszke

fbshipit-source-id: f1ae77d698bf0ada32f2c1c3f587e46a4f57a867
2018-08-31 14:55:10 -07:00
c87d082d26 Use ->data<real>() instead of THTensor_(data) and c10::raw::intrusive_ptr::decref instead of _free (#11039)
Summary:
Codemod used for this

```
grep -rnw "THTensor_(free)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(free)(\([^)]*\))/c10::raw::intrusive_ptr::decref(\1)/g" {}
```

```
grep -rnw "THTensor_(data)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(data)(\([^)]*\))/\1->data<real>()/g" {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11039

Reviewed By: ezyang

Differential Revision: D9617265

Pulled By: cpuhrsch

fbshipit-source-id: d9e7581867a335703f82f4556cead2b32b97bd83
2018-08-31 14:27:09 -07:00
adeebed549 Delete TensorImpl::toString() (#11035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11035

Instead, inline its definition into Tensor.  We need
to do this so we can avoid needing to getType() from
TensorImpl.

Reviewed By: cpuhrsch

Differential Revision: D9564516

fbshipit-source-id: 19fdaa2b93419e21572b9916714aee4165cb3390
2018-08-31 14:27:08 -07:00
5286925d4a Add getMaybeVariableType(const TensorImpl*) (#11031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11031

The eventual plan is to get rid of TensorImpl::type()
entirely; but first we need a function to call.

Reviewed By: cpuhrsch

Differential Revision: D9564206

fbshipit-source-id: b59a9ccfaed44199f185eff392835cec89ccda8e
2018-08-31 14:27:06 -07:00
2c5ae8c4bf Get rid of type() method on TensorOptions; use at::getType instead (#11023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023

I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context.  To do this, the
type() method has to go, since it consults the context to get a Type.

Reviewed By: cpuhrsch

Differential Revision: D9562467

fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
2018-08-31 14:27:05 -07:00
fd110411b7 Don't convert TensorOptions to type before printing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11145

Reviewed By: cpuhrsch

Differential Revision: D9613897

fbshipit-source-id: eaa28b24992e8202cecb5ab97fa541fcf49a205f
2018-08-31 14:27:03 -07:00
48c2f3cf0f Move TensorOptions Tensor methods to TensorMethods.h (#11144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11144

We can move them now that TensorMethods no longer references Tensor.

Reviewed By: cpuhrsch

Differential Revision: D9613800

fbshipit-source-id: 99ad1dd7d77eb319000769230b7016294cf1980f
2018-08-31 14:27:02 -07:00
780d2792c5 Warn about non-traceable behavior when tracing (#11088)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11088

Differential Revision: D9585527

Pulled By: apaszke

fbshipit-source-id: 29a03cb152d83b626f748fff4501ac9e139994c2
2018-08-31 14:27:00 -07:00
c31ebccd01 Clean up TupleType and SchemaParser (#11007)
Summary:
Some fixes to address your comments zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11007

Differential Revision: D9597750

Pulled By: goldsborough

fbshipit-source-id: f35f4801707dff2367e9dfc7d4e968357bc2b832
2018-08-31 14:26:59 -07:00
f4b2961af9 Simplify assignment operators (#11027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11027

Using swap() as a primitive, copy and move assignment become much easier.

Reviewed By: ezyang

Differential Revision: D9563753

fbshipit-source-id: e74faf39b596f097de758bfe038639565807040a
2018-08-31 13:43:41 -07:00
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
a2a584f347 Proper recompilation tracking for more files in tools/autograd (#11143)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11143

Differential Revision: D9613758

Pulled By: ezyang

fbshipit-source-id: 08ed143739438435e0e8219dff3a738ab424c3e1
2018-08-31 13:10:21 -07:00
3791bd12c8 PT1 Release Milestone No.2 MPI Group Support with all tests passed (#11128)
Summary:
Added MPI group support.
This makes all previous MPI group test cases pass.

Also, relax the required MPI thread-level support by serializing different PGs' MPI ops. This is required.

The build is fixed too
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128

Differential Revision: D9602188

Pulled By: teng-li

fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
2018-08-31 12:39:56 -07:00
d95e68c8cc Delete Tensor constructor from TensorOptions. (#11101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11101

I'd like to invert the dependency between Tensor and TensorOptions
(such that Tensor includes TensorOptions); to do this, I'd prefer
there to not be a Tensor constructor.  Eventually, all references
to Tensor will disappear from TensorOptions.h

Reviewed By: cpuhrsch

Differential Revision: D9585627

fbshipit-source-id: dd4a28b2c06b1e55f629762915f03c2b6c34d840
2018-08-31 09:55:01 -07:00
a585158c9e Some usage examples for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11081

Reviewed By: goldsborough

Differential Revision: D9579371

fbshipit-source-id: 329a07fc2e58f57384c8a840bcdebc2c6d4f7bb1
2018-08-31 09:40:30 -07:00
e2bdd35cf0 fixes to device.cc (#11122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11122

These changes add fixes to device.cc that are needed to create intra-device copies for OpenCL.

Reviewed By: bwasti

Differential Revision: D9553292

fbshipit-source-id: e59f17916b5df30a504adee0718f9cecfe28f35a
2018-08-31 09:25:26 -07:00
f30fd7fb5c Get rid of the runtime type in TensorOptions (#11021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11021

We can now store a boolean saying if we want a Variable or not,
and context can use VariableHooks to get a VariableType if we
request one.

Reviewed By: cpuhrsch

Differential Revision: D9562312

fbshipit-source-id: 84653cd789622764132252406a5ea1a83eee3360
2018-08-31 09:10:52 -07:00
1db5a7d8f0 Move variable getType lookup support to Context
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11017

Reviewed By: cpuhrsch

Differential Revision: D9562197

fbshipit-source-id: dd00c79592d6c59f2e21c9d62fea3a2c093b609b
2018-08-31 09:10:51 -07:00
9fac0a5093 Rename at::getType to at::getNonVariableType (#11096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11096

To discourage willy-nilly use, and make it clearer that it
is not a Variable

Reviewed By: cpuhrsch

Differential Revision: D9583699

fbshipit-source-id: 4fbde0c01ae3deb2c7ef8c125a9028f089b203ae
2018-08-31 09:10:49 -07:00
0961c923c0 Unbreak the build
Summary: The controller you requested could not be found.

fbshipit-source-id: 861021dbe88f84d1a8bd80e04dd684527384629f
2018-08-31 08:13:12 -07:00
3073051a18 Revert D9554375: Support lr adaption for SparseAdam and RowWiseSparseAdam
Differential Revision:
D9554375

Original commit changeset: b88768f470ef

fbshipit-source-id: 2c103c616c8680684892c7d9085fd7bb8289d2f1
2018-08-31 07:54:31 -07:00
82aeebb3d9 Fix a bug in addmm fusion in the JIT (#11100)
Summary:
Fixes #10839.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11100

Differential Revision: D9585533

Pulled By: apaszke

fbshipit-source-id: 19e2710c8fc113f577faf14c080d8c89afbe23c4
2018-08-31 07:24:34 -07:00
0555768e0f Support lr adaption for SparseAdam and RowWiseSparseAdam (#10993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10993

as title

Reviewed By: chocjy

Differential Revision: D9554375

fbshipit-source-id: b88768f470ef7d023dd481c6a97b91594892f422
2018-08-31 00:55:39 -07:00
f1bfe6750f Back out "[caffe2] Update blackbox predictor with new constructor" (#11105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11105

Reverts: D9516972

See this discussion for context: https://fburl.com/w45hb1oc

Reviewed By: highker

Differential Revision: D9587931

fbshipit-source-id: 715247929d819dfa88e1d051021e51c5bf0c4835
2018-08-31 00:55:36 -07:00
9fae8fcdff framework for committed serialized tests (#10594)
Summary:
Generate serialized test inputs/outputs/backward graphs for tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random, plus a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?)

Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change, since there are usually multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594

Reviewed By: ezyang

Differential Revision: D9370359

Pulled By: ajyu

fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
2018-08-30 22:41:46 -07:00
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
a320e5cbd3 Move static_context outside of class (#11097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11097

att

Reviewed By: ezyang

Differential Revision: D9549702

fbshipit-source-id: 058b942311b00be20a0b557ba97eb3451ea55e33
2018-08-30 22:10:58 -07:00
750ede7215 Rename getType to getVariableTypeFromBaseType / getVariableType (#11095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095

We used getType to mean a lot of things.

- getVariableTypeFromBaseType: given a base Type (non-Variable type)
  compute the Variable Type which corresponds to it.

- getVariableType: like at::getType, but return the Variable type
  rather than the plain type.

This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9583630

fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
2018-08-30 20:11:25 -07:00
c836a04dc8 Delete a bunch of uses of getType in favor of TensorOptions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11087

Reviewed By: cpuhrsch

Differential Revision: D9581560

fbshipit-source-id: ebe3c4c0956da8a7215ada287bf6526dbcb2b07d
2018-08-30 20:11:24 -07:00
34a0604d51 Eliminate use of getType from DLConvertor (#11080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11080

- Add a new TensorOptions(Device, ScalarType) constructor,
  which serves roughly the same role as getType used to.
  We shouldn't get too wild with these constructors, but
  since this particular one was widely used by getType,
  it seems worth adding.
- Change DLPack DeviceType conversion to at::DeviceType,
  rather than at::Backend.  While I'm at, add a few more
  conversions that at::DeviceType understands.
- Add a new overload of from_blob which understands strides.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578734

fbshipit-source-id: 28288ec053aae8765e23925ab91023398d632d6b
2018-08-30 20:11:23 -07:00
c283acce72 Rename getTypeRaw to getNonVariableTypeRaw (#11078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11078

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeRaw getNonVariableTypeRaw
```

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578399

fbshipit-source-id: 00a86ae8fb00d14116762ce39d15858da9a1671e
2018-08-30 20:11:21 -07:00
66c4d7e060 Rename getTypeOpt to getNonVariableTypeOpt (#11077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11077

getType now supports retrieving variable types, so make it clearer
when a getType function does NOT give you a variable type.

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeOpt getNonVariableTypeOpt
```

Reviewed By: gchanan

Differential Revision: D9578398

fbshipit-source-id: 3ee502ac5c714849917f11ddc71de8eacfdaa9d3
2018-08-30 20:11:20 -07:00
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered non-differentiable by the graph executor. I verified that they are still optimized correctly, and my next patch (which changes how specialization/differentiation works) will restore those.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00
c8c21fa2b4 Allow same flags when glog is used or not (#11034)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11034

Reviewed By: mingzhe09088

Differential Revision: D9582801

Pulled By: orionr

fbshipit-source-id: b41ca1bebf6cf62fff2a2b8caf4c94af3e43db00
2018-08-30 19:24:51 -07:00
26409a4300 Caffe2 flags needs to be used after the GlobalInit function is called
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11120

Reviewed By: llyfacebook

Differential Revision: D9598430

Pulled By: sf-wind

fbshipit-source-id: 468f0ed7880339c9c4467d1cef29f5bc9fc80a2a
2018-08-30 19:10:39 -07:00
a6cb41486d update documentation for observers
Summary:
update to the latest observer usage syntax
add an example of HistogramObservers

Reviewed By: jspark1105

Differential Revision: D6878439

fbshipit-source-id: c9521f2daecfc7f0c17de6a944dce58e568e3dbe
2018-08-30 18:11:48 -07:00
15314c7b8e GCC-7 doesn't like the original syntax. (#10665)
Summary:
Replace with "this->template f<T>()".

Fix #7881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10665

Differential Revision: D9597187

Pulled By: ezyang

fbshipit-source-id: 8af4e7efd98edadabb97e2523a58bd21bc116d1a
2018-08-30 16:41:16 -07:00
684bd1b7bd size_ -> numel_ (#11112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11112

att

Reviewed By: ezyang

Differential Revision: D9474018

fbshipit-source-id: d9267e52e2d50dac7524a456a44f2e28b6c0b693
2018-08-30 16:41:13 -07:00
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00
302e9cb815 Update onnx submodule to onnx/onnx@bae6333 (#10961)
Summary:
ONNX v1.3.0 release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10961

Reviewed By: houseroad

Differential Revision: D9543998

Pulled By: bddppq

fbshipit-source-id: b7f0a0553d832d609d3b7613a608f7bf4a2582ef
2018-08-30 15:25:57 -07:00
56c737a9b7 Inject GetEmptyStringAlreadyInited once for static proto (#11045)
Summary:
I've been seeing a lot of warnings about multiple declarations of this. Hopefully this fixes it.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11045

Reviewed By: mingzhe09088

Differential Revision: D9582756

Pulled By: orionr

fbshipit-source-id: 6171485609a2f2f357d6e1c44e26b4ecfcdb4ce6
2018-08-30 14:59:54 -07:00
a136d29fd1 Use intrusive_ptr in Storage (#10907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10907

replace shared_ptr with intrusive_ptr in Storage

Reviewed By: ezyang

Differential Revision: D9414388

fbshipit-source-id: d413549ffde24959166d2dff2042b99f0c5018af
2018-08-30 14:59:52 -07:00
f0142faab0 Expose arbitrary cpp autograd functions to Python (#11082)
Summary:
This is needed because the JIT declares some custom autograd functions.

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11082

Differential Revision: D9580456

Pulled By: apaszke

fbshipit-source-id: 6bf00c1188a20b2ee6ecf60e5a0099f8263ad55a
2018-08-30 14:25:59 -07:00
93bd291e55 Change torch.jit.trace to no longer be a decorator (#11069)
Summary:
This was done because it is surprising for a decorator to run a function
rather than wrap it, and it did not simplify the syntax for tracing modules.
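
For illustration, a minimal sketch of the new call form; the argument order shown (function first, example inputs second) matches later releases and is an assumption for this exact commit:

```
import torch

def foo(x):
    return x * 2

# Called as a plain function now, not used as a decorator.
traced = torch.jit.trace(foo, torch.rand(3))
print(traced(torch.rand(3)))
```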
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069

Reviewed By: jamesr66a

Differential Revision: D9583192

Pulled By: zdevito

fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
2018-08-30 13:56:05 -07:00
ebe9d204fa Add test cases to intrusive_ptr (#11026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11026

ezyang fixed a bug with moving or copying an intrusive_ptr into itself.
This diff adds test cases for it.

Reviewed By: ezyang

Differential Revision: D9563464

fbshipit-source-id: 3a3b3f681124730d2500b276c0135c3bba7875ae
2018-08-30 13:25:33 -07:00
e85f3fccb3 Fix relying on UB in test_data_parallel_nested_output (#11092)
Summary:
We shouldn't rely on plain `dict` ordering. Example failure: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-xenial-cuda8-cudnn6-py3-test1/8417/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11092

Reviewed By: ezyang

Differential Revision: D9583274

Pulled By: SsnL

fbshipit-source-id: ba80b96648c98c24c2ec5fa6fd9aa566c095cce7
2018-08-30 13:10:25 -07:00
9d4360c060 Creates stream pool (#9938)
Summary:
This PR creates a stream pool per issue #9646. When a new stream is requested, the device it's requested on lazily creates two pools, one low priority and one high priority, of 32 streams each. Streams are returned from these pools round-robin. That is, stream 0 is returned, then stream 1... then stream 31, then stream 0... This PR also takes the opportunity to clean up the stream API, reducing its complexity and verbosity.

Change notes:

- There are now 3 sets of streams per device, the default stream, the low priority streams, and the high priority streams. These streams live in lazily initialized pools and are destroyed on shutdown.
- All stream refcounting has been removed (the pools pattern replaces it).
- Setting a stream now sets it on its device. Streams are associated with a device and the previous
requirement to specify that device was unnecessary.
- There is no exposure for setting the flags on a stream. This may also seem like a regression but the flag was always set to cudaStreamNonBlocking.
- Streams are now low or high priority whereas previously the priority could be set with an integer. In practice, however, the range for priorities is -1 to 0 on the latest hardware. -1 is high priority, 0 is low priority (aka default priority). Low vs. high actually clarifies this behavior if people were trying finer separations. (E.g., if someone tried streams with priorities 0, 1, and 2, they would actually all have priority 0, historically, and the intended behavior would not be respected.)
- Unused THCStream and THCState stream-related functions were removed.
- A new test of pooling behavior was added in stream_test.
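
For intuition, a minimal pure-Python sketch of the round-robin pooling described above; streams are modeled as plain tuples rather than real CUDA streams:

```
import itertools

class StreamPool:
    def __init__(self, device, priority, size=32):
        # One pool of `size` streams for a given (device, priority) pair.
        self._streams = [(device, priority, i) for i in range(size)]
        self._cycle = itertools.cycle(self._streams)

    def get(self):
        # Returns stream 0, then 1, ..., then 31, then wraps back to 0.
        return next(self._cycle)

low = StreamPool(device=0, priority="low")
assert low.get() == (0, "low", 0)
assert low.get() == (0, "low", 1)
```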

fyi: colesbury, apaszke, goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9938

Reviewed By: SsnL

Differential Revision: D9569036

Pulled By: ezyang

fbshipit-source-id: 12ed673fe373170d0cf4d65cb570de016c53ee7d
2018-08-30 12:40:23 -07:00
23b0c90e71 caffe2: fix gcc8 warnings
Summary:
The warnings are erroneous as far as I can see,
so tweak things to avoid them. The (unsigned int) cast is
to avoid passing -1 to a size_t type.  This was triggered
in gcc8's lto build only, giving:

  caffe2/aten/src/TH/generic/THTensor.cpp: In function ‘THFloatTensor_squeeze1d’:
  lto1: error: ‘__builtin_memset’ specified size 18446744073709551608
  exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
  In function ‘newImpl’,
    inlined from ‘operator new’ at common/memory/OperatorOverride.cpp:86:23,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/ext/new_allocator.h:111:0,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/alloc_traits.h:436:0,
    inlined from ‘_M_allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:172:0,
    inlined from ‘_M_default_append’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/vector.tcc:571:0,
    inlined from ‘resize’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:671:0,
    inlined from ‘THTensor_resizeDim’ at caffe2/aten/src/TH/THTensor.hpp:123:0,
    inlined from ‘THFloatTensor_squeeze1d.part.198’ at caffe2/aten/src/TH/generic/THTensor.cpp:429:0,
    inlined from ‘THFloatTensor_squeeze1d’:
  common/memory/OperatorOverride.cpp:86:23: error:
  argument 1 value ‘18446744073709551608’ exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
   void* ptr = malloc(size);

Reviewed By: soumith

Differential Revision: D9568621

fbshipit-source-id: 4569a4be897d669caa3f283f4b84ec829e8d77ad
2018-08-30 11:55:29 -07:00
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
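
For reference, a small usage example of the op this kernel implements; the `torch.pdist` binding shown is how the op is exposed in later releases:

```
import torch

x = torch.randn(4, 3)    # 4 points in 3-D
d = torch.pdist(x)       # condensed pairwise Euclidean distances
print(d.shape)           # torch.Size([6]), i.e. 4 * 3 / 2 entries
```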
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
029082e87c Add entry for torch/lib/pythonX.Y in .gitignore (#11083)
Summary:
I've had `torch/lib/python3.6` show up as part of the build for some time now. It's not ignored which means I need to be extra careful about checking in files, or I end up with a thousand of them in my index.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11083

Differential Revision: D9580453

Pulled By: apaszke

fbshipit-source-id: 369e4fe87962696532d111b24f2a4a99b9572bf2
2018-08-30 11:40:25 -07:00
40227671e9 Add strides to caffe2::Tensor (#10826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10826

Add strides, and make sure the strides are consistent with sizes, and is_contiguous, for all the Caffe2 functions.

is_contiguous means strides_[dim-1] == 1 and strides_[i] == strides_[i+1] * max(size_[i+1], 1);
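
A small sketch (not the caffe2 source) of that contiguity rule:

```
def contiguous_strides(sizes):
    # strides[dim-1] == 1; strides[i] == strides[i+1] * max(sizes[i+1], 1)
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * max(sizes[i + 1], 1)
    return strides

assert contiguous_strides([2, 3, 4]) == [12, 4, 1]
assert contiguous_strides([2, 0, 4]) == [4, 4, 1]  # max() guards zero-size dims
```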

Reviewed By: ezyang

Differential Revision: D9354480

fbshipit-source-id: 3643871b70f1111b7ffdd9fdd9fe9bec82635963
2018-08-30 11:25:58 -07:00
535633bddc Export MPI functions (#11037)
Summary:
Potential fix for https://github.com/caffe2/caffe2/issues/2551#issuecomment-417124872

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11037

Reviewed By: mingzhe09088

Differential Revision: D9580937

Pulled By: orionr

fbshipit-source-id: 5e1fbf718728271a5b5af526d8e67cc5b48f0575
2018-08-30 10:42:02 -07:00
e7195431e0 Add benchmarking functionality to the benchmark app (#10976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10976

The app can run in XCode with the benchmark metrics collected.
It can also run when building with buck

Reviewed By: llyfacebook

Differential Revision: D9546755

fbshipit-source-id: 60ad0112946f8cf57138417f6838a58ed6d2c90f
2018-08-30 09:54:55 -07:00
a8af7fe46a Support import of nn.RNNCellBase in __all__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10992

Differential Revision: D9572005

Pulled By: soumith

fbshipit-source-id: 26b546830b6a25a4f7ba6f825cd888d678233a97
2018-08-30 08:25:21 -07:00
dbc0004f99 Remove use_count() == 1 in Tensor::Extend (#11046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11046

As suggested by jerryzh168, a temporary fix for a new constraint that was added in D9350686 is to remove this assert. Long term, jerryzh168 is going to work out a better way of handling this.

Reviewed By: jerryzh168

Differential Revision: D9566323

fbshipit-source-id: e4630c7cbe0cc68a084974ea7048654811fae01f
2018-08-29 23:55:28 -07:00
23af7deea7 Add has_lapack flag (#11024)
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message, which is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` on the ATen context, and exposes the flags to Python.

Also fixes a refcounting bug with `PyModule_AddObject`. The method steals a reference, but we didn't `Py_INCREF` in some places before calling it with `Py_True` or `Py_False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024

Differential Revision: D9564898

Pulled By: SsnL

fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
2018-08-29 22:41:16 -07:00
ad1670cf54 Kill the dummy TaskOutput when task.get_step() (#11048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent and has no side effects for users.

Reviewed By: mraway

Differential Revision: D9566744

fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af
2018-08-29 20:11:29 -07:00
16b8e0a787 at::StorageImpl: Rename size_ to numel_ and elementSize() to itemsize()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11011

Reviewed By: ezyang

Differential Revision: D9561898

Pulled By: cpuhrsch

fbshipit-source-id: 0cf5cdc3e7acd397f7e2d66097856aaad0581147
2018-08-29 20:11:27 -07:00
394bdcd49a Fix the build of aten tests when FULL_CAFFE2=1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11019

Reviewed By: orionr

Differential Revision: D9562691

Pulled By: houseroad

fbshipit-source-id: 95a8dee580e5f4dc9af3a2e1f68ec6c62a0e4e04
2018-08-29 18:09:54 -07:00
e550eab3e2 Remove MetaNetDef test case in Predictor (#11052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11052

Delete the test case for constructing a Predictor from a MetaNetDef, since that constructor
has actually been deprecated. The broken PR is for constructing a predictor from a DB instance.

Reviewed By: highker

Differential Revision: D9566935

fbshipit-source-id: 5511883953a2d3f6eb0a4f1c5518a1bc4b3ffbdc
2018-08-29 17:55:21 -07:00
91ecbf8b1d Remove TensorBase (#11036)
Summary:
Not subclassed except by Tensor. Also required to align further with
caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11036

Reviewed By: ezyang

Differential Revision: D9565640

Pulled By: cpuhrsch

fbshipit-source-id: ff7203a2c95d3f3956282b4f2d8dda6c2b93f4a6
2018-08-29 17:27:19 -07:00
ae635b16f7 Record tensor factory functions in trace (#10935)
Summary:
Things like torch.zeros now appear in traces rather than constants.

To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935

Differential Revision: D9527427

Pulled By: zdevito

fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
2018-08-29 17:10:24 -07:00
c4e1adf29d Remove THHalf type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11010

Reviewed By: ezyang

Differential Revision: D9561325

Pulled By: li-roy

fbshipit-source-id: 053cf2925ec1fc458db31e92bd31ffd23389f3e8
2018-08-29 16:44:45 -07:00
2cc98d8df7 Adds dim argument to torch.unique (#10423)
Summary:
Initial version of `unique` supporting a `dim` argument.

As discussed in [this issue](https://github.com/pytorch/pytorch/issues/9997) I added the `dim` argument to `torch.unique` with the same behavior as [numpy](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.unique.html).

Since the implementation is based on `std/thrust::unique`, the `tensor` always needs to be sorted. The `sorted` argument in `torch.unique` therefore has no effect, just as in the CUDA version of the plain `torch.unique`.

To check the performance and equal behavior between `torch.unique` and `np.unique`, I've used [this gist](https://gist.github.com/ptrblck/ac0dc862f4e1766f0e1036c252cdb105).

Currently we achieve the following timings for an input of `x = torch.randint(2, (1000, 1000))`:
(The values are calculated by taking the average of the times for both dimensions)

| Device | PyTorch (return_inverse=False) | Numpy (return_inverse=False) | PyTorch (return_inverse=True) | Numpy (return_inverse=True) |
| --- | --- | --- | --- | --- |
| CPU | ~0.007331s | ~0.022452s | ~0.011139s | ~0.044800s |
| GPU | ~0.006154s | - | ~0.105373s | - |

Many thanks to colesbury for the awesome mentoring and the valuable advice on the general implementation and performance issues!
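
An illustrative use of the new argument (requires a build containing this PR):

```
import torch

x = torch.tensor([[1, 2], [1, 2], [3, 4]])
rows, inverse = torch.unique(x, return_inverse=True, dim=0)
print(rows)      # tensor([[1, 2], [3, 4]])
print(inverse)   # tensor([0, 0, 1]): row i of x equals rows[inverse[i]]
```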
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10423

Differential Revision: D9517289

Pulled By: soumith

fbshipit-source-id: a4754f805223589c2847c98b8e4e39d8c3ddb7b5
2018-08-29 16:26:09 -07:00
98d85b1790 Debugging help + test
Summary: When conversion fails, dump more information to help fix up the netdef

Reviewed By: hyuen, yinghai

Differential Revision: D9558667

fbshipit-source-id: 8917cc61c9be6285697e4f8395a9dbc7135f618e
2018-08-29 16:26:07 -07:00
ef7fc2a3e1 Remove at::StorageImpl::finalizer_ (#11022)
Summary:
Unused member variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11022

Reviewed By: ezyang

Differential Revision: D9562520

Pulled By: cpuhrsch

fbshipit-source-id: af190b3ba06d33d65fa0fabffb34a0df769f38d0
2018-08-29 16:09:47 -07:00
6b87198245 Devirtualize StorageImpl destructor (#11018)
Summary:
Further align at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11018

Reviewed By: ezyang

Differential Revision: D9562256

Pulled By: cpuhrsch

fbshipit-source-id: d929317f6226a1e2550b78034b723afbae343aaa
2018-08-29 15:39:54 -07:00
d9b74f6540 Make it possible to disable JIT using env variables (#10867)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10867
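
A hedged usage sketch; the variable name PYTORCH_JIT is an assumption based on the name used by later releases, and it must be set before torch is imported:

```
import os
os.environ["PYTORCH_JIT"] = "0"   # equivalently: PYTORCH_JIT=0 python script.py
import torch

@torch.jit.script
def f(x):
    return x + 1   # with the JIT disabled, this runs as plain Python

print(f(torch.ones(2)))
```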

Differential Revision: D9556882

Pulled By: apaszke

fbshipit-source-id: 04c0ca875d15d37dd9ac05ac7b515cd899ddb7e4
2018-08-29 15:11:05 -07:00
c755616e00 Enable Detectron model inference for CPU and MKL-DNN paths (#10157)
Summary:
1. Support ops needed for inference of Faster-RCNN/Mask-RCNN needed in Detectron, mostly direct fallbacks.
2. Use CPU device to hold 0-dim tensors and integer tensors in both fallback op and blob feeder, needed by Detectron models.
3. Ignore 0-dim tensor in MKL-DNN concat operator.
4. Generate dynamic library of Detectron module for CPU device.

This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157

Differential Revision: D9276837

Pulled By: yinghai

fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
2018-08-29 15:11:01 -07:00
89834dfe64 Add GPU version of HardSigmoid Op to Caffe2 (#10955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955

Add GPU version of HardSigmoid Op to Caffe2. Updated test file to
include GPU tests.

Reviewed By: enosair

Differential Revision: D9499353

fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
2018-08-29 14:55:29 -07:00
22e3b2c9c3 Revert D9413150: [New Checkpoint] Kill the dummy TaskOutput when task.get_step()
Differential Revision:
D9413150

Original commit changeset: 51aaf3201e26

fbshipit-source-id: ac7c4c0960db03f344fe3eb2ad7f0e034db2371a
2018-08-29 14:39:49 -07:00
6a8bc3804a Add flush to logging messages higher than INFO. (#10983)
Summary:
This probably fixes the logging test error that orionr is encountering - haven't tested locally but wanted to send out a PR to kick off CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10983

Reviewed By: ezyang

Differential Revision: D9552607

Pulled By: Yangqing

fbshipit-source-id: 9ac019031ffd9c03972144df04a836e5dcdafe02
2018-08-29 14:39:48 -07:00
0b1de74732 Documentation improvement in caffe2/core/tensor.h (#11006)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11006

Reviewed By: smessmer

Differential Revision: D9558383

Pulled By: ezyang

fbshipit-source-id: 7d36fb69a6e8a7d064da2c8796dc263a9fd4e094
2018-08-29 14:25:38 -07:00
e9eed8edb4 Add doc for Tensor.digamma_? (#11008)
Summary:
follow up for #10967

zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11008

Differential Revision: D9559889

Pulled By: SsnL

fbshipit-source-id: a05d8fbad92a54bcdb93de6e62a7f94180da1d99
2018-08-29 14:11:16 -07:00
f687ff5a59 Delete unnecessary includes from TensorImpl.h (#11005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11005

Reviewed By: smessmer

Differential Revision: D9558300

Pulled By: ezyang

fbshipit-source-id: ebebb3c6d3a1a2f7cc3da9fe9d3c56310ead46e1
2018-08-29 14:11:14 -07:00
b644d5e74a Delete context and get_context from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11001

Reviewed By: cpuhrsch

Differential Revision: D9557315

fbshipit-source-id: b9862b8dda49194298bb1a4fbc214d466f3c8350
2018-08-29 13:55:45 -07:00
cd9416317d Minor copy-edit on setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10933

Reviewed By: cpuhrsch

Differential Revision: D9526650

fbshipit-source-id: 8ad1c989bee7009b3f95a2641189f55cf6c1979f
2018-08-29 13:41:04 -07:00
c99a143eea Update blackbox predictor with new constructor (#10920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10920

Update the black box predictor and the related code to use the
constructor with PredictorConfig.

Reviewed By: highker

Differential Revision: D9516972

fbshipit-source-id: fbd7ece934d527e17dc6bcc740b4e67e778afa1d
2018-08-29 13:31:45 -07:00
56539f5fe1 PT1 Distributed Release MileStone No.1 - Completed Distributed Package and CI tests (#10871)
Summary:
The PR includes:
(1) torch.distributed.c10d, which now includes the complete backward compatible frontend API for `torch.distributed`
(2) `env://` init method functionality
(3) Minor change to `test_distributed.py`, which is now a test for `torch.distributed.c10d`.
(4) The old `test_distributed.py` is now moved to `test_distributed_thd`
(5) Miscellaneous bug fixes.
(6) DDP CPU test is removed since c10d doesn't have this support yet, but this is a very easy test after moving DDP CPU's dependency to torch.distributed.c10d.
(7) CI config to test MPI, NCCL, and Gloo backend of c10d

**Now all the distributed test including c10d DDP can pass with the c10d frontend API**

TODO: (in a separate PR)
MPI subgroup support, once this is added, CI group test will be enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10871

Differential Revision: D9554514

Pulled By: teng-li

fbshipit-source-id: fb686ad42258526c8b4372148e82969fac4f42dd
2018-08-29 12:55:57 -07:00
fa7c81c640 nomnigraph - nit - code style update (#10987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10987

some code style update to make it consistent with fb cpp style

Reviewed By: yinghai

Differential Revision: D9550130

fbshipit-source-id: 6aef9878676c08e7d384383c95e7ba8c5c9a1bce
2018-08-29 12:55:55 -07:00
ec519e8a4a Reduce number of elements within test_abs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10997

Differential Revision: D9556861

Pulled By: cpuhrsch

fbshipit-source-id: 986ef275e94fcffcc04a5c1103b8b7bfb4ae3ba5
2018-08-29 12:55:54 -07:00
dbce1c840f exposing net_transformer_fun before add grad (#11003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11003

Need an interface to rewrite the graph after the net is built and after adding gradient ops.

Reviewed By: aazzolini, harouwu

Differential Revision: D9557827

fbshipit-source-id: 2e082f0321c0776e488a29e18047d950948e7c37
2018-08-29 12:55:52 -07:00
bed9d41abd Generate Type::registerCPU as we do register_cuda_types. (#10947)
Summary:
The goal here is to separate out the base Type into core; as it was done previously we need all derived Types to be defined when we compile the base Type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10947

Reviewed By: gchanan

Differential Revision: D9540025

Pulled By: ezyang

fbshipit-source-id: 49f0b5acb3c378348ef3a55780abb73e4ae27edd
2018-08-29 12:39:47 -07:00
4e446b85fb Make profiler.build_table() O(n) rather than O(n^2) (#10969)
Summary:
Fixes #10851

Speeds up profiling results dramatically.

For the following script:
```
import torch
import time

ITER = 2000

x = torch.randn(1, 1, requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()

start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```

I get 7s before / 0.13s after these changes.

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969

Differential Revision: D9556129

Pulled By: zou3519

fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
2018-08-29 12:25:51 -07:00
396dec0e37 s/spaerse/sparse (#10968)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10968

Differential Revision: D9546746

Pulled By: zou3519

fbshipit-source-id: a6a4bb8bb04eccf89c3d90a90259070beb484500
2018-08-29 12:13:04 -07:00
525548fb64 Move SparseTensorRef to core, change some includes to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10964

Differential Revision: D9545021

Pulled By: gchanan

fbshipit-source-id: 8ba7e5e3a7bdf24e5aeb4bbc91957c1a6f14d7f0
2018-08-29 11:55:29 -07:00
e0dbb91060 Windows raw string fix (#10998)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

mingzhe09088's fix of the docstrings for Windows builds. Unfortunately some versions of Windows seem to try and parse the `#` inside the string as a pre-processor declaration. We might need to change this to something else later, but want to get this landed first.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10998

Reviewed By: mingzhe09088

Differential Revision: D9557480

Pulled By: orionr

fbshipit-source-id: c6a6237c27b7cf35c81133fd9faefead675a9f59
2018-08-29 11:40:08 -07:00
206d52d0e3 Disable smart_tensor_printer_test without glog (#10999)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

This test fails once we start building with `-DUSE_GLOG=OFF` since the non-glog logging case doesn't support flushing or streaming to the right location. For now, we just disable this test in that case.

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10999

Reviewed By: mingzhe09088

Differential Revision: D9557488

Pulled By: orionr

fbshipit-source-id: 8b306f210411dfc8ccc404bdccf77ddcd36a4830
2018-08-29 11:10:23 -07:00
562fc7631f Add test cases for ONNX unsqueeze (#10924)
Summary:
PyTorch exporting tests and end-to-end cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10924

Reviewed By: Ac2zoom

Differential Revision: D9548210

Pulled By: houseroad

fbshipit-source-id: 2381d1ad92a4e07f97060eb65c9fd09f60ad3de6
2018-08-29 11:10:21 -07:00
1b0d5e60ab Get rid of some unnecessary includes of Context. (#10951)
Summary:
This is part of splitting Context from what needs to go in ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10951

Differential Revision: D9540369

Pulled By: gchanan

fbshipit-source-id: 73b0e8c4493785fbab368a989f46137c51f6ea0b
2018-08-29 11:10:20 -07:00
a9469c9c8a Fill eigenvector with zeros if not required (#10645)
Summary:
Fix #10345, which only happens in the CUDA case.

* Instead of returning some random buffer, we fill it with zeros.

* update torch.symeig doc.
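
A quick check of the fixed behavior (relevant on CUDA builds; the torch.symeig API shown is the era-appropriate one):

```
import torch

a = torch.randn(3, 3)
a = a + a.t()                                  # symeig expects a symmetric matrix
if torch.cuda.is_available():
    e, v = torch.symeig(a.cuda(), eigenvectors=False)
    print(v)                                   # zero-filled now, not a random buffer
```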
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645

Reviewed By: soumith

Differential Revision: D9395762

Pulled By: ailzhang

fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
2018-08-29 10:55:22 -07:00
b41988c71e Cleanup BUILD_DOCS cmake section (#11000)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11000

Differential Revision: D9557474

Pulled By: orionr

fbshipit-source-id: 7d84914b67ff37bdb7738f9b7846dfeb5b975c00
2018-08-29 10:09:52 -07:00
7169906249 torch.digamma (#10967)
Summary:
Fixes #10307

cc SsnL
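
A quick usage example of the new binding; digamma is the logarithmic derivative of the gamma function, so digamma(1) equals minus the Euler-Mascheroni constant:

```
import torch

x = torch.tensor([1.0, 2.0])
print(torch.digamma(x))   # tensor([-0.5772,  0.4228])
```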
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10967

Differential Revision: D9546748

Pulled By: zou3519

fbshipit-source-id: 764e27b1cc8dd487270b3ffa653b806c86f717dd
2018-08-29 09:43:19 -07:00
a5d7abedae Enable fusing aten::expand on GT, LT, EQ (#10845)
Summary:
GT, LT, and EQ all support numpy broadcasting, so just enable the fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10845

Reviewed By: bddppq

Differential Revision: D9494089

Pulled By: houseroad

fbshipit-source-id: 7c65ca06c54dbd476ac7d07b47a413faaed3dd5e
2018-08-28 23:56:50 -07:00
db0abe1890 Fix bugs in handling of negative slice + gather indices (#10973)
Summary:
This fixes multiple bugs in the handling of negative indices in both slicing and gather operations. These were uncovered by Elias Ellison's diff D9493614, which made it so that we actually emit negative indices when we see them in PyTorch code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10973

Reviewed By: jhcross

Differential Revision: D9546183

Pulled By: jamesr66a

fbshipit-source-id: 6cb0e84e8ad399e47e24a96c44025f644c17b375
2018-08-28 23:40:40 -07:00
6ca28984c7 Kill the dummy TaskOutput when task.get_step() (#10739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent and has no side effects for users.

Reviewed By: mraway

Differential Revision: D9413150

fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac
2018-08-28 20:41:46 -07:00
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user can trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle for a human-in-the-loop error reporting system, allowing the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design is underway for catching these scenarios.

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
fe15aedacc Store schema in serialized modules and check arguments in function call (#10872)
Summary:
This PR adds argument checking for script method invocation from C++. For this I had to:
1. The schema of a method is currently not serialized in script modules, so we now store the function schema in the `doc_string` field of the ONNX proto. Upon loading of a serialized script module, we parse the schema into the structured C++ form and assign it to the loaded method,
2. Inside `Method::operator()`, we now verify the number and types of arguments.

CC The controller you requested could not be found.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10872

Differential Revision: D9521219

Pulled By: goldsborough

fbshipit-source-id: 5cb3d710af6f500e7579dad176652c9b11a0487d
2018-08-28 20:11:39 -07:00
ba71547e93 Add clip op to IR
Summary: self-explanatory

Reviewed By: highker

Differential Revision: D9551065

fbshipit-source-id: 14b3807af5337654c360a23816cffd7dd346bad5
2018-08-28 19:25:02 -07:00
90eb0b6031 Cleanup accidental logging
Summary: cleanup

Reviewed By: duc0

Differential Revision: D9549449

fbshipit-source-id: 9154b36a39936566fc2711a6e7bd33049681d1c8
2018-08-28 18:55:29 -07:00
72a84127b1 Add Workspace methods ws.feed_blob(name, arr) ws.remove_blob(name) (#10929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10929

Workspace class methods were missing on the Python side.

These enable writing the New Checkpoint Framework with more control of the workspace and a cleaner implementation.

Added

- ws.feed_blob(name, arr)

- ws.remove_blob(name)
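
A hedged sketch of the new methods; obtaining `ws` via `workspace.C.Workspace()` is an assumption about the existing bindings, while `feed_blob`/`remove_blob` are the methods this diff adds:

```
import numpy as np
from caffe2.python import workspace

ws = workspace.C.Workspace()                           # assumed construction path
ws.feed_blob("x", np.ones((2, 3), dtype=np.float32))   # new method
ws.remove_blob("x")                                    # new method
```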

Reviewed By: mraway

Differential Revision: D9486867

fbshipit-source-id: ea02d2e3a39d716a5a3da0482f57d4ac4c893763
2018-08-28 17:54:34 -07:00
8e5b8490bf Add relevant code for adding caffe2 pybind extensions registry to rocm (#10975)
Summary:
cfa5dbadfc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10975

Differential Revision: D9546838

Pulled By: bddppq

fbshipit-source-id: 3bd6dc0a4eee582bb92fc33ed27fc40eb3ab1200
2018-08-28 15:40:37 -07:00
4cb968fb77 Default hidden visibility (#10752)
Summary:
Flipping to hidden visibility one more time. Let's see what fails.

cc mingzhe09088 pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10752

Reviewed By: ezyang

Differential Revision: D9526343

Pulled By: orionr

fbshipit-source-id: c0e9c29270e95e1b2e21c598095f720c199e1e52
2018-08-28 15:25:43 -07:00
92ff070b83 Add CPU version of hard sigmoid operator to caffe2 (#10837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10837

Add CPU version of hard sigmoid operator to caffe2. The definition of
this operator can be found here:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#HardSigmoid.

Reviewed By: BIT-silence

Differential Revision: D9489536

fbshipit-source-id: 67b3171ed96d5ebcc8d500d93e7827a4a9705a81
2018-08-28 14:55:49 -07:00
efd2aeac9e Set -Wno-stringop-overflow only with GCC >=7 (#10954)
Summary:
`stringop-overflow` is added in GCC 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10954

Differential Revision: D9546084

Pulled By: SsnL

fbshipit-source-id: e6e68f993f1dbaa879ca66dc43bbcff9c49890ff
2018-08-28 14:25:29 -07:00
b3601a0425 nomnigraph - add documentation for new ReplaceSubgraph api to README.md (#10802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10802

add documentation for new ReplaceSubgraph api to README.md

Reviewed By: yinghai

Differential Revision: D9473282

fbshipit-source-id: 144c895564af83cc8727a0370e894c2f0b7eadf5
2018-08-28 12:55:25 -07:00
cfa5dbadfc Add nomnigraph bindings
Summary: Adds basic nomnigraph python bindings for quickly playing with the graphs.

Reviewed By: duc0

Differential Revision: D9441936

fbshipit-source-id: fd70f8ea279b28c766e40f124008800acd94bddd
2018-08-28 12:40:16 -07:00
a88463cd9a Working async version of AllGather, test fix and compiler warnings, and CI (#10932)
Summary:
The previous NCCL all-gather didn't work as expected. This is a fully working async version, tested on both the C++ and Python frontends.

Multi-node:
```
tengli@learnfair042:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=0 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 0
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful

tengli@learnfair117:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=1 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 1
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```

CI test:
```
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allreduce_ops (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_ops (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10932

Differential Revision: D9542067

Pulled By: teng-li

fbshipit-source-id: 25513eddcc3119fd736875d69dfb631b10f4ac86
2018-08-28 12:40:14 -07:00
579bc43a14 Future-proofing embedding.py against heuristic changes (#10959)
Summary:
- rebase of https://github.com/pytorch/pytorch/pull/9851
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10959

Differential Revision: D9542292

Pulled By: weiyangfb

fbshipit-source-id: ce51864d203c8ed89da3817f1da020a0ee932960
2018-08-28 12:40:12 -07:00
3b891d9d49 Support direct access of nn.RNNCellBase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10944

Differential Revision: D9541085

Pulled By: soumith

fbshipit-source-id: 59077f3b226d04c68a93cd6864894e8f6c594aba
2018-08-28 12:25:12 -07:00
5c58cda8ca Add subname to console output for assertExpected (#10559)
Summary:
Running `--accept` on a test doesn't tell you explicitly which sub-test is being updated; this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10559

Differential Revision: D9353977

Pulled By: driazati

fbshipit-source-id: a9d4014386ff0fe388a092f3dcf50f157e460f04
2018-08-28 12:13:03 -07:00
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
5ed62ea6fa Add Upsample example for torch onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10550

Reviewed By: orionr

Differential Revision: D9541932

Pulled By: houseroad

fbshipit-source-id: 4d179d189c176482ae919e5cc74607b9d315ed26
2018-08-28 11:39:55 -07:00
22c9bc3117 Resolve builtins using a dict rather than by name (#10927)
Summary:
Changes the approach for resolving builtin ops so that the following works

```
add = torch.add
script
def foo(x):
  return add(x, x)
```

This handles cases when people alias torch and torch.nn.functional to
shorter names.

This works by building a table of id -> builtin name for the known builtin
ops in torch, torch.nn.functional, and for any user-defined
op created by accessing torch.ops.foo.bar

This allows us to clean up many SugaredValue types in the compiler.

Notes:
* we now consider any attributes on python modules to be constants
(e.g. math.pi, and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary
Python objects. It is now restricted to modules only.
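
A tiny pure-Python sketch of the id -> builtin-name lookup described above (illustrative only, not the compiler's actual data structure):

```
import torch

builtin_table = {id(torch.add): "aten::add"}

add = torch.add                                 # user aliases the builtin
assert builtin_table[id(add)] == "aten::add"    # the alias resolves to the same op
```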
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927

Differential Revision: D9527522

Pulled By: zdevito

fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
2018-08-28 11:25:11 -07:00
c9d337f436 Split IsEmptyOp (#10918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10918

As titled.

Differential Revision: D9515040

fbshipit-source-id: 53c05c160ba5dda92104aadc2e40801519a2cd28
2018-08-28 10:52:28 -07:00
7de830b879 proper sharing in ShareExternalPointer (#10804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10804

Make ShareData and ShareExternalPointer create new storage when the old one is shared by multiple tensors.
When we need to modify a field of the storage, we create a new storage instead.

Reviewed By: ezyang

Differential Revision: D9350686

fbshipit-source-id: 68d2b6b886b0367b0fc4fabfd55b9a480e7388ca
2018-08-28 10:52:26 -07:00
7f9fd1cc26 allow RandomSampler to sample with replacement (#9911)
Summary:
fixes #7908
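
A minimal usage sketch of the new option:

```
import torch
from torch.utils.data import RandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(5))
# Draw 10 indices with replacement from a 5-element dataset.
sampler = RandomSampler(dataset, replacement=True, num_samples=10)
print(list(sampler))  # indices may repeat, e.g. [3, 0, 3, 4, ...]
```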
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9911

Reviewed By: yf225

Differential Revision: D9023223

Pulled By: weiyangfb

fbshipit-source-id: 68b199bef3940b7205d0fdad75e7c46e6fe65ba7
2018-08-28 10:52:25 -07:00
504d705d0f Support for CUDNN_HOME/CUDNN_PATH in C++ extensions (#10922)
Summary:
Currently we assume the cudnn includes and libraries can be found under the `CUDA_HOME` root, but this is not always true. So we now support a `CUDNN_HOME`/`CUDNN_PATH` environment variable that can have its own `/include` and `/lib64` folders.

This means cudnn extensions now also get support on the FAIR cluster.

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10922

Differential Revision: D9526856

Pulled By: goldsborough

fbshipit-source-id: 5c64a5ff7cd428eb736381c24736006b21f8b6db
2018-08-28 09:40:29 -07:00
1421a9d704 added num_directions explanation to docstrings (#10786)
Summary:
Resolving [https://github.com/pytorch/pytorch/issues/10741](https://github.com/pytorch/pytorch/issues/10741). The current docs use `num_directions` quite a bit without any explanation of it. `num_directions` is 2 if the RNN is bidirectional and 1 otherwise. This change simply adds that to the docs.
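
A short illustration of the quantity being documented:

```
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, bidirectional=True)
output, (h_n, c_n) = rnn(torch.randn(5, 3, 8))   # (seq_len, batch, input_size)
num_directions = 2 if rnn.bidirectional else 1
print(h_n.shape)     # (num_layers * num_directions, batch, hidden) -> (2, 3, 16)
print(output.shape)  # (seq_len, batch, num_directions * hidden)    -> (5, 3, 32)
```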
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10786

Differential Revision: D9480235

Pulled By: zou3519

fbshipit-source-id: f61d1b0d2b943f84d5b7ff83df6fe0965a508a5e
2018-08-28 09:26:06 -07:00
bee779bc83 StorageImpl scalar_type_ to data_type_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10915

Reviewed By: ezyang

Differential Revision: D9526416

Pulled By: cpuhrsch

fbshipit-source-id: 68e43121d72b1b951c73df5bf7b598854fb0e291
2018-08-28 09:26:04 -07:00
82bb9fbedd Remove Scalar.local(). (#10917)
Summary:
It's a no-op now that Scalars don't store tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10917

Differential Revision: D9520267

Pulled By: gchanan

fbshipit-source-id: 5388ff9a4fbb8fc9b9e1ce92208246bf6f08eb92
2018-08-28 07:41:36 -07:00
7c7a2ccb58 Update onnx.rst for v0.4 (#10810)
Summary:
Since we don't need `torch.autograd.Variable` anymore, I removed `torch.autograd.Variable` from `onnx.rst`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10810

Differential Revision: D9500960

Pulled By: zou3519

fbshipit-source-id: 1bc820734c96a8c7cb5d804e6d51a95018db8e7f
2018-08-28 07:26:01 -07:00
de099564e3 Minor copy-edit on README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10931

Reviewed By: cpuhrsch

Differential Revision: D9526248

fbshipit-source-id: 2401a0c1cd8c5e680c6d2b885298fa067d08f2c3
2018-08-27 21:09:36 -07:00
de9cc98e66 Stop copying tensor memory when importing IR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10487

Differential Revision: D9370084

Pulled By: li-roy

fbshipit-source-id: ecff1d5d7d006fd60e4f6238ee86c56ad168bfc8
2018-08-27 19:25:42 -07:00
2c342e50e1 Fix a bug in constant prop (#10923)
Summary:
More support for tuples has uncovered a bug in constant prop, which
assumed it could create constant nodes for tuples even though we
cannot easily create a single prim::Constant to represent a tuple.
This fix checks when we cannot represent an IValue as a prim::Constant
and stops propagating the node in that case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10923

Reviewed By: orionr

Differential Revision: D9523417

Pulled By: zdevito

fbshipit-source-id: 745058c4388d9a5e0fc1553eaa2731e31bc03205
2018-08-27 18:10:17 -07:00
157fb46ffc Add -rdynamic only to linker flags to avoid compiler warnings (#10789)
Summary:
`clang: warning: argument unused during compilation: '-rdynamic'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10789

Reviewed By: houseroad

Differential Revision: D9467385

Pulled By: bddppq

fbshipit-source-id: 610550a8f34cfa66b9dfa183752eb129dae21eaa
2018-08-27 17:56:21 -07:00
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor
      maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
f2bb9f0bb5 speed up kl div loss (#10336)
Summary:
Moved kl div loss to aten.

benchmarks for 5000 iterations on input size (1000,100)

New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]

cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```

Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]

cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10336

Differential Revision: D9213636

Pulled By: li-roy

fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
2018-08-27 16:10:59 -07:00
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the miopen engine (for hip devices) for the recurrent operator and also enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
8e33451e2e Make torch.cuda.* take device objects; Update distributed docs (#10833)
Summary:
Commits:

1. Make `torch.cuda.*` take device objects (see the sketch after this list)
2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group`
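
A quick sketch of what (1) enables:

```
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    torch.cuda.set_device(dev)              # previously required an int index
    print(torch.cuda.get_device_name(dev))
```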
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833

Differential Revision: D9514241

Pulled By: SsnL

fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e
2018-08-27 15:24:42 -07:00
58b145f515 Fix negative indices in tracer (#10560)
Summary:
Previously, when tracing slicing & select, negative indices would get normalized, fixing the index to the size of the traced tensor. This makes the behavior the same as script, so aten::select with negative indices is emitted.
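
A sketch of the new tracing behavior (shown with the modern `torch.jit.trace(fn, example)` calling convention):

```
import torch

def f(x):
    return x[-1]   # emitted as aten::select with index -1, not index size-1

traced = torch.jit.trace(f, torch.randn(3, 4))
# The negative index generalizes to inputs with a different first dimension:
print(traced(torch.randn(5, 4)).shape)  # torch.Size([4])
```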
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560

Differential Revision: D9493614

Pulled By: eellison

fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
2018-08-27 15:19:41 -07:00
9aa92bc261 Change the default value of DeviceOption.numa_node_id from -1 to 0 (#10877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10877

Change the default value of DeviceOption.numa_node_id to 0 and use has_numa_node_id() to check existence.

Reviewed By: ilia-cher

Differential Revision: D9473891

fbshipit-source-id: 91ac6a152f445644691023110c93d20a3ce80d43
2018-08-27 14:55:46 -07:00
7842b6d0f7 Fix at::optional compile problems on Windows CUDA.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10909

Differential Revision: D9516837

Pulled By: gchanan

fbshipit-source-id: fad7e3284e74c599b873ebaae2dcdf5013505855
2018-08-27 14:40:41 -07:00
6ce799edd6 Tuples/Lists can now be inputs/outputs to script and other simple fixes. (#10812)
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script (see the sketch after this list).

* Prevent linear algebra functions from being run in shape prop because
they frequently error out for nonsense data.

* Favor schema-driven python input conversion where possible.
The remaining cases where we directly create Stacks without schema are
only for debugging.

* Make the error messages when calling script/trace functions more pythonic

* Simplify FlattenTuples -- now that tuples are supported we can choose to only flatten tuples when needed. This may have to be revisited pending onnx test results, but is necessary for making tuple io work.
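
A hypothetical sketch of tuple I/O in script (the exact annotation syntax varied in this era; shown with a MyPy-style type comment):

```
import torch

@torch.jit.script
def swap(pair):
    # type: (Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tensor]
    a, b = pair
    return b, a

print(swap((torch.ones(2), torch.zeros(2))))
```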
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812

Differential Revision: D9477982

Pulled By: zdevito

fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
2018-08-27 14:40:40 -07:00
f64f6eed3a move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10859

Reviewed By: newstzpz

Differential Revision: D9498312

fbshipit-source-id: 08b8a596f774c9102286019f286ca0b74d1f5304
2018-08-27 12:56:46 -07:00
35beecfe17 fix xfails involving literals (#10905)
Summary:
I missed these in #10900

cc apaszke jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10905

Differential Revision: D9516748

Pulled By: zou3519

fbshipit-source-id: a5c3e3b65a33c339d5c4e9fc160462c3d35705f3
2018-08-27 12:41:06 -07:00
f940af6293 Bag of Distributions doc fixes (#10894)
Summary:
- Added `__repr__` for Constraints and Transforms.
- Arguments passed to the constructor are now rendered with :attr:

Closes https://github.com/pytorch/pytorch/issues/10884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10894

Differential Revision: D9514161

Pulled By: apaszke

fbshipit-source-id: 4abf60335d876449f2b6477eb9655afed9d5b80b
2018-08-27 09:55:27 -07:00
67f6f930a8 Remove FIXME_zerol() from test_jit.py (#10900)
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900

Differential Revision: D9514206

Pulled By: zou3519

fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
2018-08-27 08:55:08 -07:00
841d779598 Increase BC for PackedSequence ctor (#9864)
Summary:
PackedSequence is never supposed to be created by users, but unfortunately some community repos already do this (e.g., [here](7c191048ce/torchmoji/model_def.py (L218-L229))). A change we made broke the calling pattern `PackedSequence(data=x, batch_sizes=y)`. This patch adds back support for that.
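
The restored calling pattern, for reference (user code should normally go through `pack_padded_sequence` instead):

```
import torch
from torch.nn.utils.rnn import PackedSequence

data = torch.randn(3, 4)             # pre-flattened step data
batch_sizes = torch.tensor([2, 1])   # batch size at each time step
packed = PackedSequence(data=data, batch_sizes=batch_sizes)
```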
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9864

Differential Revision: D9011739

Pulled By: SsnL

fbshipit-source-id: 0e2012655d7f4863ec54803550df30874ec35d75
2018-08-27 08:25:23 -07:00
c3271b53e4 Remove ability of Scalars to hold Tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10889

Differential Revision: D9512589

Pulled By: gchanan

fbshipit-source-id: 8b2b26c9f3a4da31a46f684793ab237e9ef9a323
2018-08-27 07:26:14 -07:00
3aaad3ecb1 Begin a bestiary of MSVC/NVCC bugs. (#10883)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10883

Differential Revision: D9513997

Pulled By: ezyang

fbshipit-source-id: 37db956e57d86471323d284869bb844f5a4753ac
2018-08-27 07:09:47 -07:00
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make the transition easier, `complete_type->cast<TensorType>()` works, which lets our passes handle both kinds of specialization when they don't need the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- The fuser heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and do run-time checks to make sure the inputs match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implemented using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape-checking algorithm is included in a comment in `graph_fuser.cpp`.
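
As a rough illustration, the broadcast-shape algebra the run-time check builds on looks like this in Python (a hypothetical helper, not the fuser's actual C++ implementation):

```
def broadcast_shape(a, b):
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)   # left-pad the shorter shape with 1s
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError("shapes are not broadcastable")
        out.append(max(x, y))
    return tuple(out)

assert broadcast_shape((5, 1, 4), (3, 1)) == (5, 3, 4)
```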

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
9679fc5fcd Handling failing test on ROCm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10854

Reviewed By: ezyang

Differential Revision: D9498721

Pulled By: Jorghi12

fbshipit-source-id: 4018383fea5a2a6baff7183b0c0197a4b7a09f20
2018-08-26 07:55:33 -07:00
ddc37d7487 Update mobile predictor caller's interface
Summary: Update all the caller for the new interface

Reviewed By: highker

Differential Revision: D9323167

fbshipit-source-id: a39335ceb402db0719f5f2314085ba9a81380308
2018-08-24 23:40:05 -07:00
d632ccd2c1 Cache isContiguous and numel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10696

Differential Revision: D9437963

Pulled By: cpuhrsch

fbshipit-source-id: 7217682f5e4b69c73d943411d738e4892bb465f5
2018-08-24 22:40:39 -07:00
17dac3e17f Create class constant for string literal 'blob_names'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10827

Reviewed By: boryiingsu

Differential Revision: D9484567

fbshipit-source-id: 275eddc9406b5f427d72c0ab9b0da481b5e59ece
2018-08-24 22:11:43 -07:00
8253cfaa72 Conv BN fusion for 3D conv (#10239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10239

Make Conv + BN fusion also work for 3D convolutions

Reviewed By: duc0

Differential Revision: D9176314

fbshipit-source-id: 6604aa569c5c3afdb4480a5810890bc617e449c4
2018-08-24 21:24:36 -07:00
542aadd9a7 Stop using symbolic override for tracing RNNs (#10638)
Summary:
This disables the symbolic override hacks and makes tracing emit the recently added ATen ops for RNNs (`aten::lstm`, `aten::gru`, ...). I managed to reuse pretty much all of the translation code for their symbolics.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10638

Differential Revision: D9385830

Pulled By: apaszke

fbshipit-source-id: ff06ef7b1ae7c3b7774825e0991bc3887e1ff59b
2018-08-24 20:25:58 -07:00
f2f6e6c0e8 Add registry to pybind_state (#10759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10759

Adding a basic registry pattern to pybind_state so that we can have separate 'cc' files register module updates. This is substantially cleaner than using multiple pybind modules (which have been known to cause bugs).
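
A generic Python rendition of the registry pattern being added (the real registry is C++; the names here are illustrative):

```
_registry = {}

def register(name):
    def decorator(fn):
        _registry[name] = fn
        return fn
    return decorator

@register("my_module_update")
def my_module_update(workspace):
    ...  # each 'cc' file registers its own updater like this
```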

Reviewed By: bddppq

Differential Revision: D9441878

fbshipit-source-id: af9e9e98385e92b58ca50e935678328c62684d8e
2018-08-24 17:25:02 -07:00
c172ffb632 Remove the nanopb submodule
Summary:
After making changes internally, really remove the nanopb submodule.

Finalizes https://github.com/pytorch/pytorch/pull/10772

Reviewed By: yns88

Differential Revision: D9504582

fbshipit-source-id: 4517607e5c8054a255c3984b8265f48fede2935b
2018-08-24 16:24:57 -07:00
148ea2a653 Create at::linear (#10799)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/10755 with fix for ONNX

ezyang jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10799

Differential Revision: D9482168

Pulled By: goldsborough

fbshipit-source-id: 85d4bdfcf0d451f2e7a1c83c5f5415cdd6caacdc
2018-08-24 16:02:08 -07:00
1fbabff76a Refactor THCNumerics and add common math functions for at::Half (#10301)
Summary:
**Summary**: This PR is a followup of mruberry's https://github.com/pytorch/pytorch/pull/9318/. It tries to achieve the following:
- Specializing std common math functions for `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to demonstrate the best practice for developers, hence making way for its deprecation.
- Remove legacy/redundant code path.
- Remove unused CUDA HALF macros (see separate PR https://github.com/pytorch/pytorch/pull/10147)

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Following are some rationale behind why some functions were kept while others were removed:
- All arithmetic can now be done in ATen using binary cuda kernel  or CUDA tensor pointwise apply (check https://github.com/pytorch/pytorch/pull/8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are c/c++ standard compliant, have been specialized for user defined types, for instance, the std namespace has been opened up for `at::Half`, that defines math function definitions for `at::Half`. Check `Half-inl.h`
- Some standard compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types. Moreover, `abs`, `isinf`, `isnan` are specialized to save one API call vs when used with std. Although this is subject to change, depending on if we really care about saving one API call.
- Numeric limits such as `max/min` are removed since they call standard defines. Moreover, the numeric limits for
`at::Half` are present in `Half-inl.h`. I understood that HIP has some issue with `std::numeric_limits`, and this is the related github issue I found: https://github.com/ROCm-Developer-Tools/HIP/issues/374. AlexVlx mentions that the issue can be avoided by launching `std::numeric_limits` in `__device__`. Since we are launching lambdas with device contexts, I don't see why `std::numeric_limits` won't compile on HIP if launched with a device context within a kernel, unless I am not aware of the real reason why max/min was there in THCNumerics in the first place. (I haven't ever tried a build with HIP.)

Here are some reference PRs that was handy in refactoring TH into ATen:
- https://github.com/pytorch/pytorch/pull/6786
- https://github.com/pytorch/pytorch/pull/5475
- https://github.com/pytorch/pytorch/pull/9401
- https://github.com/pytorch/pytorch/pull/8689
- https://github.com/pytorch/pytorch/pull/8919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10301

Differential Revision: D9204758

Pulled By: soumith

fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
2018-08-24 16:02:06 -07:00
87a7840fa6 Remove Tensor constructor of Scalar. (#10852)
Summary:
This is along the way of removing Tensor as a member of the tagged union in Scalar.  This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase).  Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.

I'm undecided what the final API should be here.  We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient but given this API used to be non-synchronizing, it may not be the best.

For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852

Reviewed By: ezyang

Differential Revision: D9496766

Pulled By: gchanan

fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
2018-08-24 16:02:05 -07:00
0d5584d8d7 Revert D9492561: [pytorch][PR] Moving the operator argument to the front for kernelPointwiseApply.
Differential Revision:
D9492561

Original commit changeset: d0f0e2ab7180

fbshipit-source-id: fc822e63b11866195ff7883f360338a41e25d9e2
2018-08-24 16:02:04 -07:00
0ef5cfd28c fix ivalue printing for lists (#10777)
Summary:
Fixing the printing of IValue lists, which didn't work previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10777

Differential Revision: D9474264

Pulled By: eellison

fbshipit-source-id: 0c7d6e7ecaa3f7908b131ac9f1036f19ac4f8b4f
2018-08-24 16:02:03 -07:00
983e0f2413 Remove Node::invalidateSchema (#10822)
Summary:
The schema_ field is a private and internal cache for nodes, and no
methods meant to manipulate it should be publicly visible. This call
wasn't even necessary at its call site, since removeInput will reset the
schema by itself.

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10822

Reviewed By: zdevito

Differential Revision: D9498683

Pulled By: apaszke

fbshipit-source-id: 42e1743e3737cb7d81f88e556204487d328c0e47
2018-08-24 16:02:01 -07:00
74e6a666b3 If none of the schema match, add ImplicitTensorToNum conversions where needed. (#10180)
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
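
An illustrative sketch of the two-pass idea (names are hypothetical, not the actual compiler code):

```
import torch

def try_match(schema_types, args, allow_tensor_to_num):
    if len(schema_types) != len(args):
        return None
    out = []
    for want, a in zip(schema_types, args):
        if isinstance(a, want):
            out.append(a)
        elif (allow_tensor_to_num and want in (int, float)
              and torch.is_tensor(a) and a.numel() == 1):
            out.append(want(a.item()))   # the ImplicitTensorToNum step
        else:
            return None
    return out

def match(schemas, args):
    for allow in (False, True):          # pass 1: strict; pass 2: with conversions
        for types in schemas:
            m = try_match(types, args, allow)
            if m is not None:
                return m
    raise RuntimeError("no schema matched")

print(match([(int, int)], (torch.tensor(3), 4)))  # [3, 4] via the second pass
```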
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180

Differential Revision: D9438153

Pulled By: eellison

fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
2018-08-24 16:02:00 -07:00
474684cf03 Re-sync with internal repository (#10868) 2018-08-24 15:48:03 -07:00
8044dc4eb8 Support new Reshape semantics (#10848)
Summary:
Since ONNX opset version >5, Reshape changed semantics to take a shape tensor as input instead of relying on `shape` attribute to decide what shape to reshape to. ONNXIFI op has been postponing this change as some of the backends such as TensorRT were not ready. Now that the backends have adopted this semantics, we can remove the legacy mode and output opset version 7 ONNX models.

This change also flushes out some of the bugs and new requirement.
- Converting shape info into int64 tensor
- Fix a bug when we output the shape tensor in the mapped workspace instead of the original workspace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848

Reviewed By: houseroad

Differential Revision: D9495121

Pulled By: yinghai

fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
2018-08-24 11:46:41 -07:00
8130b1a950 Ignore stack frames coming from python3 object file (#10627)
Summary:
goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10627

Reviewed By: ezyang

Differential Revision: D9384411

Pulled By: apaszke

fbshipit-source-id: ce4f6edb9ffbd0c7e320b9347da10399de472150
2018-08-24 11:26:21 -07:00
6e2f6dc6e6 Move Allocator and Device to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10798

Reviewed By: ezyang

Differential Revision: D9466602

fbshipit-source-id: f5bda17045076d8c81be9fa5a0749c97bf274b5f
2018-08-24 11:26:19 -07:00
f1df85d799 bug-fix in normal_( ) (#10846)
Summary:
- fixes #10642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10846

Differential Revision: D9495014

Pulled By: weiyangfb

fbshipit-source-id: 35a9fc349f9f0c21a24141f29c62853ab6a68dae
2018-08-24 11:26:18 -07:00
313139d14e Moving the operator argument to the front for kernelPointwiseApply. (#10829)
Summary:
Currently on PyTorch AMD, memory accesses on the TensorInfo struct contained in the operators passed into the kernelPointwiseApply kernel lead to hangs on the HCC runtime. Permuting the argument order so that the operator comes first alleviates this issue, and the kernel hangs disappear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10829

Reviewed By: ezyang

Differential Revision: D9492561

Pulled By: Jorghi12

fbshipit-source-id: d0f0e2ab7180e55846db909f2744b8c8b110205e
2018-08-24 11:10:43 -07:00
e3d12d7afb Automatic update of fbcode/onnx to 6146a85d371481222c10ede4430ad5476e60de87 (#10831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10831

Previous import was 7848f1e0414ba3b2e263609d93d46fd60790b2e9

Included changes:
- **[6146a85](https://github.com/onnx/onnx/commit/6146a85)**: Check pybind version (#1315) <Changming Sun>
- **[2cbf740](https://github.com/onnx/onnx/commit/2cbf740)**: Domain exists in GraphProto but not in Node (#1310) <Ryan Hill>
- **[9b874e9](https://github.com/onnx/onnx/commit/9b874e9)**: [Title] Add optimization pass eliminating nop Pad (#1307) <Tingfan Wu>

Reviewed By: yinghai

Differential Revision: D9485475

fbshipit-source-id: 3adb4e6e182278fd2abe5068a9d4569763e0ff0c
2018-08-24 10:54:40 -07:00
3c9775fff8 Remove nanopb since we've switched to protobuf (#10772)
Summary:
We no longer use nanopb in PyTorch (or Caffe2), so we are removing it. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.

cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772

Reviewed By: pjh5

Differential Revision: D9465894

Pulled By: orionr

fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
2018-08-24 10:54:38 -07:00
8c13971f57 Remove protobuf require and use requirements.txt (#10771)
Summary:
In prep for making FULL_CAFFE2 default, users shouldn't be required to have protobuf installed.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771

Reviewed By: pjh5

Differential Revision: D9474458

Pulled By: orionr

fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
2018-08-24 10:39:40 -07:00
474bd60bad Provide a tensor overload to mul_out_sparse_scalar. (#10828)
Summary:
This is a small part of the effort to remove Tensor as a tagged member in Scalar because it is inconsistent with how we normally do overloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10828

Differential Revision: D9485049

Pulled By: gchanan

fbshipit-source-id: 103f5cc03bb7775cd2d3a0a5c0c5924838055f03
2018-08-24 09:39:26 -07:00
e146518e46 Fix AT_CUDA_CHECK and AT_CUDNN_CHECK macros (#10834)
Summary:
Previously, the macros evaluated the expression multiple times on error.

For example:

```
AT_CUDA_CHECK(cudaStreamWaitEvent(ptr->stream, event, 0));
```

would previously expand to

```
if (cudaStreamWaitEvent(ptr->stream, event, 0) != cudaSuccess) {
    AT_ERROR("CUDA error: ", cudaGetErrorString(cudaStreamWaitEvent(ptr->stream, event, 0)));
}
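
// A fix along these lines evaluates the expression once and reuses the
// stored result (a sketch of the idea, not the exact committed macro):
//
//   cudaError_t __err = cudaStreamWaitEvent(ptr->stream, event, 0);
//   if (__err != cudaSuccess) {
//     AT_ERROR("CUDA error: ", cudaGetErrorString(__err));
//   }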
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10834

Differential Revision: D9493257

Pulled By: colesbury

fbshipit-source-id: d2473020fd83a25aa421171d19c8dfe559155a9b
2018-08-24 09:09:18 -07:00
ca567862b2 Support multidimensional indexing (#10787)
Summary:
Part of #10774.

This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
  list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices

The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).
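
A concrete instance of the desugaring, using `narrow()` as a Python-visible stand-in for `at::slice`:

```
import torch

x = torch.arange(24).reshape(4, 6)
# x[1, 2:5] becomes a select over dim 0 followed by a slice over the next dim.
desugared = x.select(0, 1).narrow(0, 2, 3)
assert torch.equal(desugared, x[1, 2:5])
```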

I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).

After I'm done implementing the missing indexing features in future prs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.

cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787

Differential Revision: D9481402

Pulled By: zou3519

fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
2018-08-24 08:10:32 -07:00
6993e4a9f7 Caffe2 Functional enforcing inplace output (#10797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10797

A few operators enforce in-place output (e.g., running mean/var for SpatialBN). Functional right now doesn't follow the inplace_enforced_ rules in OpSchema, and therefore RunNetOnce() will fail on OpSchema->Verify(). Edit the output_names in Functional to follow the rules and pass the check.

Reviewed By: jerryzh168

Differential Revision: D9470582

fbshipit-source-id: 168efeccecc32184bd1d02f3fefe8e61faa4e0f4
2018-08-23 22:42:47 -07:00
8da4167129 Fix performance regression (#10835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10835

The last diff on the constructor caused a performance regression in cold runs.
This one tries to fix it.

Reviewed By: highker

Differential Revision: D9489617

fbshipit-source-id: a77c2e2c903a73e2ad9806b4f9c209cdb751442f
2018-08-23 19:55:23 -07:00
df2d48b42c Added PrefixStore, pybind, test for group backward compatibility (#10762)
Summary:
Added PrefixStore support.

This makes groups backward compatible.

A test is included too.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./FileStoreTest
Using temporary file: /tmp/testoglRl4
Using temporary file: /tmp/testepZIpB
Test succeeded
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./TCPStoreTest
Test succeeded
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10762

Differential Revision: D9484032

Pulled By: teng-li

fbshipit-source-id: 85754af91fe3f5605087c4a2f79ae930a9fd1387
2018-08-23 18:10:37 -07:00
61b34d42e7 nomnigraph - isSubgraphMatch returns the matched Subgraph & map from MatchNodes to graph nodes (#10605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10605

Make isSubgraphMatch return the matched subgraph and a map from MatchNodes to graph nodes in the result, which makes it easier to write graph fusion logic. Also includes some more helper methods for the NN subgraph matcher.

Reviewed By: bwasti

Differential Revision: D9374931

fbshipit-source-id: 3a273295eec81a43027ec3a9e835d27f00853df9
2018-08-23 16:40:19 -07:00
ee022a476a Added this-consts to all methods on SymbolicVariable (#10805)
Summary:
Self-explanatory. See https://github.com/pytorch/pytorch/issues/9109 or T32954812 for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10805

Reviewed By: ezyang

Differential Revision: D9477686

Pulled By: hakobyant

fbshipit-source-id: 73dd84e5295e4c749bd6416ce2f6eb7590f05cbc
2018-08-23 16:25:27 -07:00
9403e0cac0 Use ATen implementation of RNNs (#10761)
Summary:
apaszke recently ported RNNs from Python into ATen, which means we can replace our implementation in the C++ API (written by ebetica) with the ATen implementation, which cleans up a lot of code (+99, -323). Thanks apaszke!

I also added the `bidirectional` and `batch_first` options to the C++ API RNN options, just because why not.

apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10761

Differential Revision: D9443885

Pulled By: goldsborough

fbshipit-source-id: b6ef7566b9ced2b2f0b2e1f46c295b6f250c65a8
2018-08-23 16:12:14 -07:00
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
3d43a82440 Add support for vararg style functions. (#10250)
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.
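
A sketch of what this enables inside script:

```
import torch

@torch.jit.script
def make():
    return torch.zeros(1, 2, 3, dtype=torch.int)  # varargs gathered into [1, 2, 3]

print(make().shape)  # torch.Size([1, 2, 3])
```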

I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.

closes #10632
closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250

Differential Revision: D9478235

Pulled By: zdevito

fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
2018-08-23 15:10:36 -07:00
9dbcc9cebd Move _raw_* intrusive pointer manipulations to raw_intrusive_ptr_target (#10779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10779

The idea is to let classes opt-in to providing these methods
by default.

Reviewed By: jerryzh168

Differential Revision: D9466076

fbshipit-source-id: b6beee084cc71d53ce446cdc171d798eeb48dc12
2018-08-23 14:32:24 -07:00
dec3ed7b49 Increase the limit for Proto size (#10745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10745

ParseProtoFromLargeString hits the limit when using recurring v2. To unblock the warmup project, we can increase the limit temporarily. More details in this post -- https://fb.facebook.com/groups/264913123977784/permalink/463566404112454/

Differential Revision: D9436368

fbshipit-source-id: 54488f27ef941cab679843cb0c502095dd056c1b
2018-08-23 13:55:50 -07:00
432b3adffc Print blob sizes on fatal signal (#10766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766

Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.

Reviewed By: mraway

Differential Revision: D9147768

fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
2018-08-23 13:39:55 -07:00
82ddeb7f2b Using shared implementation in Tensor (#10619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10619
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9047

Reviewed By: jerryzh168

Differential Revision: D8417101

fbshipit-source-id: 98e0a3275864283c2f06d28f4c9b859b5827ed4d
2018-08-23 13:39:53 -07:00
23a366be33 Use ATen native functions for THCTensor_cadd/cmul/cdiv/csub (#10707)
Summary:
This seems to save a few percent in binary size in libcaffe2_gpu.so, but
the effect may not be real. In fact, deleting some functions can cause
the binary size to increase (perhaps due to alignment issues).

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10707

Differential Revision: D9409009

Pulled By: colesbury

fbshipit-source-id: 282931e562e84e316a33ac6da4788c04c2984f08
2018-08-23 13:31:03 -07:00
0f5c8edfd3 Removes unused THCState code paths (#9735)
Summary:
To prepare THCState for refactoring into ATen, this PR removes unused THCState code paths. In particular, it:

- Removes the UVA Allocator
- Removes the THDefaultDeviceAllocator
- Respects the 1 BLAS and 1 sparse handle per device reality
- Removes kernel p2p access
- Removes setting p2p access
- Removes the GCHandler code path
- Removes many unused THCState_... functions
- Removes THCThreadLocal.h/.cpp

It does not change the preexisting external behavior of any remaining function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9735

Differential Revision: D9438558

Pulled By: SsnL

fbshipit-source-id: dde9acbec237a18bb6b75683e0526f7ff1c9a6ea
2018-08-23 13:10:05 -07:00
ab9e7ae23e Add CUDA implementation of LARS --caffe2 (#10509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10509

This diff enables CUDA implementation of LARS operator in caffe2.

Reviewed By: enosair

Differential Revision: D9318356

fbshipit-source-id: 365b9f01e3afd4d9d3ba49155e72e728119f40c5
2018-08-23 12:55:57 -07:00
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.
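
A concrete instance of the invariants (using the modern public names; `sparse_dim`/`dense_dim` correspond to `_sparseDims`/`_denseDims`):

```
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])                # (_sparseDims=2, nnz=3)
v = torch.ones(3, 4)                         # (nnz=3, dense shape 4) -> _denseDims=1
s = torch.sparse_coo_tensor(i, v, (2, 3, 4))
assert s.sparse_dim() + s.dense_dim() == s.dim()  # 2 + 1 == 3
```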

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
0eb2c83006 Fix link in THNN/README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10821

Differential Revision: D9481118

Pulled By: soumith

fbshipit-source-id: 0a416202eb4db025ec7d395e70344cbbf626fec0
2018-08-23 09:25:16 -07:00
fcfb1c1979 Make more distributions jittable
Summary:
This uses zou3519's new `torch.broadcast_tensors()` #10075 to make `Categorical.log_prob()` and the `*Normal.__init__()` methods jittable. Previously `.log_prob()` was failing due to calls to `torch._C._infer_size()` with errors like
```
    def log_prob(self, value):
        if self._validate_args:
            self._validate_sample(value)
>       value_shape = torch._C._infer_size(value.size(), self.batch_shape) if self.batch_shape else value.size()
E       RuntimeError: expected int at position 0, but got: Tensor
```
After this change I'm able to jit many more of Pyro's tests.
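
For reference, the broadcasting primitive used in place of `torch._C._infer_size()`:

```
import torch

value = torch.randn(3, 1)
batch = torch.zeros(1, 4)
v, b = torch.broadcast_tensors(value, batch)
assert v.shape == b.shape == (3, 4)
```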

Reviewed By: ezyang

Differential Revision: D9477487

Pulled By: apaszke

fbshipit-source-id: 5f39b29c6b8fa606ad30b02fefe2dfb618e883d6
2018-08-23 08:09:49 -07:00
529fc68df2 Update docs with clean (#10819)
Summary:
Add tip about cleaning if installing ninja after a build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10819

Reviewed By: soumith

Differential Revision: D9480095

Pulled By: erikbrinkman

fbshipit-source-id: 96ae1387038afe6964a1bd1e2186468f6a5ea12f
2018-08-23 07:25:19 -07:00
deda05e59f Revert D9395814: move HeatmapMaxKeypointOp unittest to oss
Differential Revision:
D9395814

Original commit changeset: 25073eb6b143

fbshipit-source-id: 56f2b7b57e3c6361e2d78e5ba7850ea3b89e98fb
2018-08-23 06:54:29 -07:00
b885dea300 parallelize the dense part in event models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10768

Reviewed By: Wakeupbuddy

Differential Revision: D9445750

fbshipit-source-id: b8c2ddfe3ccb9278506de15a5e43bada016408f7
2018-08-22 22:40:07 -07:00
5c0eece2fd Force types on values returned from if blocks to be equivalent (#10281)
Summary:
When emitting if branches, check that the types of each returned value are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281

Differential Revision: D9466566

Pulled By: eellison

fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
2018-08-22 19:55:38 -07:00
9a43fc5eaa move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10674

Reviewed By: newstzpz

Differential Revision: D9395814

fbshipit-source-id: 25073eb6b143fc1e7cbf5f887545d2b7df15c9a9
2018-08-22 19:11:10 -07:00
4aa5075cae update the constructor to accept the PredictorConfg only to set up the predictor (#9483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9483

The interface is updated to accept the config to construct the predictor.

Reviewed By: highker

Differential Revision: D8872999

fbshipit-source-id: 3ca54d644970823fc33c0ade9a005e12f52e2b24
2018-08-22 19:11:09 -07:00
f0ec3bfa56 Changes for Python3 compatibility (#10524)
Summary:
Review by tomdz volkhin anshulverma
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10524

Reviewed By: ezyang

Differential Revision: D9328001

Pulled By: huitseeker

fbshipit-source-id: 144721c4fd9a1ea6cf6673793416f20cb448aa93
2018-08-22 18:55:01 -07:00
44b47fd7f3 Working pybind version of MPI process group and abort() pybind (#10606)
Summary:
This makes the pybind version of the MPI process group work. The issue is that the tensor list goes out of scope before the MPI worker thread can use it, so we pass the vector by value instead.

Also added a recv_anysource pybind to make it work. The front-end API will wrap one level up with an int for this function, so taking a tensor is the easiest way for now.

Also added abort pybind and fixed the flaky test.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ mpirun -np 8 ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10606

Differential Revision: D9474393

Pulled By: teng-li

fbshipit-source-id: cca236c333656431e87d0d3573eeae9232c598b0
2018-08-22 18:26:04 -07:00
6c75fc0aa3 Integrating stochastic quantization to easgd to reduce communication + supporting quantization on both sides (split from D8849770) (#10644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10644

Depends on D8493264

Reviewed By: chocjy, boryiingsu

Differential Revision: D9347706

fbshipit-source-id: 6fdcc5b61098bf47ec9391b1f009b0e6a0615842
2018-08-22 17:10:03 -07:00
f72e813c2f Allow tracing functions that take tuples of tensors as inputs (#10637)
Summary:
And return tuples.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10637

Reviewed By: eellison

Differential Revision: D9385892

Pulled By: apaszke

fbshipit-source-id: 542f4444d909fb246d7f1d88d6fb98345de2d431
2018-08-22 15:37:10 -07:00
043a2e36e5 Removing setup_caffe2.py (#10734)
Summary:
FULL_CAFFE2=1 python setup.py (install | build_deps develop) should be all anyone needs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10734

Reviewed By: orionr

Differential Revision: D9439354

Pulled By: pjh5

fbshipit-source-id: 0169afcda4f8f38c57498ba2151f7654ecce6070
2018-08-22 15:37:07 -07:00
6c84f7fea0 Relax RHS type assert for augassign (#10730)
Summary:
Augassign (e.g., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be, because we support scalar/scalar ops and also
list-list ops (e.g., `[1, 2] + [2, 3]`).
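
A sketch of what the relaxed assert permits:

```
import torch

@torch.jit.script
def f(x):
    n = 0
    n += 1   # desugars to n = n + 1; an int RHS no longer trips the assert
    return x + n
```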
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730

Differential Revision: D9465110

Pulled By: zou3519

fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
2018-08-22 15:10:33 -07:00
d40a598777 Back out "[pytorch][PR] Create at::linear" (#10785)
Summary:
Multiple failing external and internal CI signals were ignored when this commit
was landed. goldsborough please fix the test failures and resubmit this change as a
new PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10785

Reviewed By: ezyang

Differential Revision: D9466791

Pulled By: jamesr66a

fbshipit-source-id: b260e93bac95d05fd627c64e620b6aefb5045949
2018-08-22 14:39:59 -07:00
6fcac354c5 Erase ListConstruct nodes for ONNX export (#10713)
Summary:
ONNX doesn't support this. Instead, we flatten the inputs to the ListConstruct op and inline it into the subsequent usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713

Differential Revision: D9458508

Pulled By: jamesr66a

fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
2018-08-22 14:39:58 -07:00
de11a5fb28 Resubmit #8322 with scipy version check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10775

Differential Revision: D9458207

Pulled By: SsnL

fbshipit-source-id: f2b0dbf2d236134afded9b15d8bf55ff98f50e7b
2018-08-22 13:39:49 -07:00
ee3e48d34b Move Backend, Layout, ATenGeneral, Deprecated, Generator to ATen/core. (#10740)
Summary:
I included "legacy" includes in the old spots for Backend, Generator, Layout; it seemed unlikely that the other ones had direct user includes.

This is another step on the path to move Type/Tensor to ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10740

Reviewed By: ezyang

Differential Revision: D9435888

Pulled By: gchanan

fbshipit-source-id: 89f4f0f445d4498a059d3a79069ba641b22bbcac
2018-08-22 13:39:46 -07:00
5ca2713a8b Fix performance of WeightedRandomSampler (#10636)
Summary:
Since https://github.com/pytorch/pytorch/pull/8958 was merged, the BatchSampler samples 0d tensors from WeightedRandomSampler instead of integers. This significantly reduces performance. This PR fixes it the same way as https://github.com/pytorch/pytorch/pull/10361 fixed DistributedSampler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10636

Differential Revision: D9423869

Pulled By: zou3519

fbshipit-source-id: f94da2d4cccf70e63beea6cfc3d1230b5610ae44
2018-08-22 13:15:48 -07:00
0e30fa6f3c Faster random number generation in fused_rowwise_random_quantization_ops (#10634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10634

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=2, random_=True, data_shape_=array([1024, 1224]), gc=, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.9944190979003908 ms
Quantizing time: 2.080512046813965 ms (1.0431669296609765X)
De-quantizing time: 0.7375001907348633 ms (0.36978195380863577X)
```

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=1, random_=True, data_shape_=array([1024, 1224]), gc=device_type: 1, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.6691923141479492 ms
Quantizing time: 7.500243186950684 ms (4.493336761366071X)
De-quantizing time: 1.1209726333618164 ms (0.6715658967876477X)
```

Reviewed By: jspark1105

Differential Revision: D8849770

fbshipit-source-id: 2bb2bac7e633f647f38e419ce980b8958f3bcae2
2018-08-22 13:15:46 -07:00
754ec9e386 Reduce rocm link time with ThinLTO
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10758

Differential Revision: D9467554

Pulled By: bddppq

fbshipit-source-id: 6853ccd96ac3209e062c110913ea37d6840c8134
2018-08-22 13:15:45 -07:00
9767951ca8 Remove regex matching from undefined_tensor_test, fixes #10013 (#10702)
Summary:
Don't regex against strings that may have come from the backtrace.
Better to just not regex at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10702

Reviewed By: ezyang

Differential Revision: D9406154

Pulled By: jsrmath

fbshipit-source-id: 9b17abee2a6e737a32c05f1e3963aef4b6638a47
2018-08-22 12:39:57 -07:00
b0ad8105d2 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in PyTorch 1.0 will have the hierarchy
Tensor -> TensorImpl -> Storage -> StorageImpl.
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff.

Reviewed By: ezyang

Differential Revision: D9384781

fbshipit-source-id: 40ded2437715a3a2cc888ef28cbca9a25b1d5350
2018-08-22 11:55:02 -07:00
5fb9b31ed5 Add matrix_rank (#10338)
Summary:
- Similar functionality as NumPy
- Added doc string
- Added tests
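
A quick usage check (API of this era; later releases moved this to `torch.linalg.matrix_rank`):

```
import torch

a = torch.eye(4)
a[3, 3] = 0
print(torch.matrix_rank(a))  # tensor(3)
```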

Differential Revision: D9240850

Pulled By: SsnL

fbshipit-source-id: 1d04cfadb076e99e03bdf699bc41b8fac06831bf
2018-08-22 09:58:38 -07:00
fbd7189949 add explicit flag to build static libtorch (#10754)
Summary:
I've tested locally that this works to build static and non-static binaries with and without CUDA.

In terms of ongoing testing, I am working on incorporating this into the release package generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10754

Differential Revision: D9457423

Pulled By: anderspapitto

fbshipit-source-id: aa1dcb17c67c0f0c493a9cf93aca4a6e06b21666
2018-08-22 09:26:07 -07:00
227635142f Delete THD master_worker (#10731)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731

Differential Revision: D9423675

Pulled By: ezyang

fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc
2018-08-22 08:54:36 -07:00
2fe5fa78fa Use FinishDeviceComputation instead of adding events in Operator::SyncDevice
Summary: The code in Operator::SyncDevice had some duplicate logic and using FinishDeviceComputation sufficed in this case.

Reviewed By: yinghai

Differential Revision: D9348288

fbshipit-source-id: d8d874bab491e6d448fcd5fa561a8b99d502753b
2018-08-22 01:09:53 -07:00
22446a3619 Productionize CRF layer in PyText (#10362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10362

This diff implements a manual export from PyText's CRF module to the caffe2 CRF layer.
Note that most of the changes in caffe2/python/crf.py are just formatting changes; the only relevant change is the new class CRFUtils.

Reviewed By: hikushalhere

Differential Revision: D9234126

fbshipit-source-id: 1a67d709034660e8b3d5ac840560b56de63e3f69
2018-08-22 00:25:26 -07:00
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and making
this API change was the most straightforward way to do so.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to call pImpl().  Even better: just drop pImpl;
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
abb209ef25 Fixes *fft docs (#10760)
Summary:
cc cranmer

fixes #10751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10760

Differential Revision: D9444473

Pulled By: SsnL

fbshipit-source-id: a4036773a93981801c1283d69f86e30cb0fe3d6d
2018-08-21 21:09:04 -07:00
e5e2514f4e fix debug_info arg in createOperator and improve reroute_tensor (#10736)
Summary:
- Fixed C2 core.CreateOperator debug info assignment
- Improved core.Net.reroute_tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10736

Differential Revision: D9426659

Pulled By: harouwu

fbshipit-source-id: 90caf848c88854e17e568d5f6910dc6c81fd000a
2018-08-21 19:40:16 -07:00
1068ba667c Create at::linear (#10755)
Summary:
The optimized code for `linear()` which uses `addmm` when a bias is given was duplicated three times in the ATen and the C++ API. Let's just have `at::linear` and use that everywhere.
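
For reference, a Python-level sketch of the logic `at::linear` centralizes; the 2-D condition on the fused path is an assumption about when addmm applies:

```python
import torch

def linear(input, weight, bias=None):
    if input.dim() == 2 and bias is not None:
        # fused path: a single addmm instead of matmul followed by add
        return torch.addmm(bias, input, weight.t())
    output = input.matmul(weight.t())
    if bias is not None:
        output = output + bias
    return output
```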

apaszke ezyang (who mentioned this in #10481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10755

Differential Revision: D9443881

Pulled By: goldsborough

fbshipit-source-id: a64862d1649b5961043d58401625ec267d97d9f3
2018-08-21 19:40:15 -07:00
a2ca634e04 Add enforce back to converter.cc
Summary: hotfix for B*8

Differential Revision: D9444060

fbshipit-source-id: 368f8463e684c39ec0ac18bcb11a7b6132d9f874
2018-08-21 19:09:22 -07:00
ddf187c198 Dont assume serialized integral types were widened to int32 in raw_data (#10718)
Summary:
zdevito et al. came to the conclusion that the ONNX spec does not mandate the widening conversion of integral types when serializing tensor data into raw_data, as opposed to serializing the data into int32_data. PyTorch recently made this change in the export code, which caused import in caffe2 to break because the semantics no longer matched. This fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10718

Differential Revision: D9423712

Pulled By: jamesr66a

fbshipit-source-id: 479fbae67b028bf4f9c1ca1812c2c7b0c6cccd12
2018-08-21 18:41:31 -07:00
6325e5aa48 fix typo in error message (#9827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9827

changed unitilized to uninitialized

Reviewed By: jerryzh168

Differential Revision: D8995509

fbshipit-source-id: 94518d5542a7bff49fcb9a4505c0c7a959746f78
2018-08-21 18:41:29 -07:00
44f996f82c Py3 fixes for layer_model_helper.py (#10525)
Summary:
Fixes `__getattr__` to adhere to its Python API contract, and wraps the `range()` call in a list since `range()` no longer returns one in Python 3.
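
A minimal sketch of the two Py3 patterns described above (the class and message here are illustrative, not the actual helper's code):

```python
class Helper(object):
    def __getattr__(self, name):
        # Contract: raise AttributeError for unknown names so that
        # hasattr()/getattr(obj, name, default) keep working.
        raise AttributeError("attribute {} not found".format(name))

indices = list(range(10))  # range() is lazy in Python 3; wrap it when a real list is needed
```
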
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10525

Reviewed By: ezyang

Differential Revision: D9441360

Pulled By: tomdz

fbshipit-source-id: d489c0e7cefecc4699ca866fd55ddbfa629688d4
2018-08-21 18:41:28 -07:00
71ddd837d7 Support custom ops in ScriptModule and tidy up test files (#10610)
Summary:
This PR adds support for using custom ops in ScriptModules, the last step for our custom op strategy. You can now write

```
import torch

torch.ops.load_library('libcustom_ops.so')

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()

    @torch.jit.script_method
    def forward(self, input):
        return torch.ops.custom.op(input) + 1

model = Model()
model.forward(torch.ones(5)) # Works
model.save("model.pt") # Works
model = torch.jit.load("model.pt") # Works
```

You can then load the `model.pt` in C++ and execute its `forward` method!

Missing for this was the fact that the script compiler didn't know to convert `ops.custom.op` into a `BuiltinFunction` which then emits a function call. For this I came up with the following strategy inside `torch/csrc/jit/script/init.cpp`:

1. When we access `torch.ops`, we return a `CustomOpValue` (subclass of `PythonValue`), whose purpose is only to return a `CustomOpNamespaceValue` (subclass of `PythonValue`) whenever something under it is accessed.
2. `CustomOpNamespaceValue` will then for each field accessed on it return a `BuiltinFunction`.

This doesn't reduce performance for any calls that are not to `torch.ops` (as opposed to, for example, inspecting every function call's name at the call site).

I also had to fix `BuiltinFunction` to not assume the namespace is always `aten::`.

A lot of other changes are just tidying up the Python and C++ test harness before I integrate it in CI.

zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10610

Differential Revision: D9387832

Pulled By: goldsborough

fbshipit-source-id: c00f431db56c7502a66fe1f813fe78067f428ecb
2018-08-21 18:41:27 -07:00
e94ae99d24 Delete copy constructor/assignment of class Observable explicitly. (#10593)
Summary:
This should resolves "error C2280: 'std::unique_ptr<caffe2::ObserverBase<caffe2::OperatorBase>,std::default_delete<_Ty>> &std::unique_ptr<_Ty,std::default_delete<_Ty>>::operator =(const std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference a deleted function" from Visual Studio.
It should also make the error message more human-readable in case something is really messed up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10593

Reviewed By: orionr

Differential Revision: D9436397

Pulled By: mingzhe09088

fbshipit-source-id: 31711667297b4160196134a34365da734db1c61d
2018-08-21 16:56:04 -07:00
04b773ab87 Support Loading to GPU (#10710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10710

Can't resume from checkpoint for workflows that use GPU.

The problem is just we didn't leverage the already-provided GPU deserialization of Caffe2.

`keep_device` arg of LoadOp (see the sketch after the list below). See https://fburl.com/y27ltaxw

How is a serialized BlobProto (containing a TensorProto) loaded into GPU memory?
- Load BlobProto from DB. https://fburl.com/pe1qaeyf
- Deserialize the BlobProto into a Blob instance. https://fburl.com/5dirjuuh and https://fburl.com/stoho0x1
- Call Blob->Deserialize. https://fburl.com/bnureu32
- Deserializer Registration. https://fburl.com/wbu95ry7 https://fburl.com/ycetud8u
- Create TensorCUDA Deserializer. https://fburl.com/2lirfuqj
- Create Tensor on GPU and get TensorProto of BlobProto. https://fburl.com/7dre82zg
- Copy TensorProto in CPU to Tensor on GPU. https://fburl.com/fr0qk2oe
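
A hedged sketch of resuming onto GPU via the `keep_device` argument (the db path and type are placeholders):

```python
from caffe2.python import core, workspace

load_op = core.CreateOperator(
    "Load", [], [],
    db="/path/to/checkpoint", db_type="lmdb",
    load_all=1,
    keep_device=1,  # deserialize each blob onto the device recorded in the checkpoint
)
workspace.RunOperatorOnce(load_op)
```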

Cloned the GPU workflows for testing in D9125520.

Reviewed By: mraway

Differential Revision: D9372950

fbshipit-source-id: 2bf70747bd71e8da16239197f7d2761d63f09ff8
2018-08-21 13:57:36 -07:00
edb34434ab More changes for hidden visibility (#10692)
Summary:
Let's run CI tests to see what fails given the changes that just landed in https://github.com/pytorch/pytorch/pull/10624

cc mingzhe09088 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10692

Reviewed By: mingzhe09088

Differential Revision: D9423617

Pulled By: orionr

fbshipit-source-id: 3bda1f118d13f8dd8e823727c93167cae747d8cf
2018-08-21 13:39:57 -07:00
8a1739b05d Add arguments __repr__ in Distribution base class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10373
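
For illustration, roughly the effect on `repr()` (the exact output format is an assumption):

```python
import torch.distributions as dist

print(dist.Normal(0.0, 1.0))
# e.g. Normal(loc: 0.0, scale: 1.0) -- the arguments now appear in the repr
```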

Differential Revision: D9240316

Pulled By: ezyang

fbshipit-source-id: f35c500f61f86e6be405e8bd4040db5146224984
2018-08-21 12:10:23 -07:00
9c321a8779 Add util function from core type to dtype (#10716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10716

title

Reviewed By: idning

Differential Revision: D9417357

fbshipit-source-id: 0f71805b1d64a46791d6ee4d8620763f878ffdb6
2018-08-21 10:55:19 -07:00
b23d59ce1a Make ONNX_ATEN_FALLBACK the internal default option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10629

Reviewed By: bddppq

Differential Revision: D9381106

fbshipit-source-id: 03d42c95d17a70a68fe0f38dad68f1793996dfce
2018-08-21 10:10:50 -07:00
b0b5139149 Set the BUILD_ENVIRONMENT variable before installing sccache. (#10640)
Summary:
Set the build environment before installing sccache in order to make sure the docker images have the links set up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10640

Reviewed By: yf225

Differential Revision: D9399593

Pulled By: Jorghi12

fbshipit-source-id: a062fed8b7e83460fe9d50a7a27c0f20bcd766c4
2018-08-21 09:40:41 -07:00
30ad13faca Avoid shadowing i, j vars in GeneralProposals test (#10721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10721

- Fix compilation warning "declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]"

Reviewed By: newstzpz

Differential Revision: D9419688

fbshipit-source-id: 76efc3688782ce4ead3c89e7069211736febfac2
2018-08-21 09:11:38 -07:00
f9d1b001e1 Move THNN Reduction to ATen/core. (#10703)
Summary:
This is part of moving the (base) Type to ATen/core; some Type methods have a default argument of type THNN Reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10703

Differential Revision: D9406060

Pulled By: gchanan

fbshipit-source-id: 789bb3387c58bd083cd526a602649105274e1ef6
2018-08-21 08:54:35 -07:00
f0d8a36e70 Completely remove build_aten and use_aten (#10469)
Summary:
Breaking out of #8338 to completely remove build_aten and use_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10469

Reviewed By: orionr

Differential Revision: D9413639

Pulled By: mingzhe09088

fbshipit-source-id: b7203aa4f5f2bb95c504c8dc187a3167f2570183
2018-08-20 20:26:42 -07:00
9e75ec11fb Make empty list literals construct empty Tensor[] (#10705)
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
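
A minimal script-mode sketch of the new behavior, assuming a Tensor argument:

```python
import torch

@torch.jit.script
def keep_first(x):
    ys = []        # the empty literal now defaults to Tensor[]; no helper needed
    ys.append(x)
    return ys[0]
```
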
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705

Differential Revision: D9411622

Pulled By: michaelsuo

fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
2018-08-20 18:28:28 -07:00
5c0d9a2493 Soumith's last few patches to v0.4.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10646

Reviewed By: ml7

Differential Revision: D9400556

Pulled By: pjh5

fbshipit-source-id: 1c9d54d5306f93d103fa1b172fa189fb68e32490
2018-08-20 18:28:27 -07:00
e449a27646 Fix issues link in Caffe2 readme (#10711)
Summary:
Change to pytorch issues link

orionr pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10711

Reviewed By: orionr

Differential Revision: D9412870

Pulled By: duc0

fbshipit-source-id: 341e8504ade8eba614cead832e5b5fdca4b1c270
2018-08-20 16:55:11 -07:00
826550a32e Update the onnx Gemm op to FC/FCTransposed logic in caffe2 onnx backend (#10108)
Summary:
The broadcast is used by default when the opset version is greater than 6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10108

Reviewed By: bddppq

Differential Revision: D9176934

Pulled By: houseroad

fbshipit-source-id: b737bd87b0ddc241c657d35856d1273c9950eeba
2018-08-20 16:09:22 -07:00
15d7f49205 Adding ATEN_NO_TEST option to root level cmake for propagation to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10708

Reviewed By: ml7

Differential Revision: D9410916

Pulled By: pjh5

fbshipit-source-id: b216a9ff7be23ff8754f2fe0b8197b5d006aa08d
2018-08-20 15:40:27 -07:00
585e6b581f Allow method-style casts on tensors (#10641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/10631
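
A minimal sketch of what the title suggests now compiles in script mode (my reading of the change; the exact surface is an assumption):

```python
import torch

@torch.jit.script
def to_float(x):
    return x.float()   # method-style cast, rather than a free-function form
```
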
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10641

Differential Revision: D9407598

Pulled By: jamesr66a

fbshipit-source-id: a0331f4e9e55d92718cde7a1112fe8c705206b1f
2018-08-20 14:10:21 -07:00
39a3dcc999 Fix #10698 build failure (#10704)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10704

Differential Revision: D9406072

Pulled By: ezyang

fbshipit-source-id: 0d472ef84cddc3bf7600b06d04e5e02e94d59fa3
2018-08-20 14:10:19 -07:00
b4684db698 Add support for Log()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10694

Reviewed By: houseroad

Differential Revision: D9405612

Pulled By: MisterTea

fbshipit-source-id: 6d83d3c2db933a3822076c7faf578ac0e92e60c6
2018-08-20 13:25:21 -07:00
7832e9d564 Add a bisect percentile operator (#10563)
Summary:
Add a bisect percentile operator with lower and upper bounds for interpolation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10563

Reviewed By: chocjy

Differential Revision: D7802182

Pulled By: olittle

fbshipit-source-id: 89ebfa8b3463adc2c89235fa3dfffa187a9d5417
2018-08-20 13:14:05 -07:00
3d0757430b Fix EnsureCPUOutputOp (#10651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10651

EnsureCPUOutputOp will copy the input from another Context to CPU, but currently there is no guarantee that the Copy will be executed.

Differential Revision: D9390046

fbshipit-source-id: af3ff19cf46560264cb77d2ab8821f0cc5be74f6
2018-08-20 12:12:48 -07:00
2e563c417c Nomnigraph - rename some APIs that invole Subtree to Subgraph (#10551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10551

Renaming "subtree" -> "subgraph" to improve the clarity of the subgraph matcher APIs, since they now support DAGs.

This is pure renaming; no functionality changes.

Reviewed By: bwasti

Differential Revision: D9348311

fbshipit-source-id: 4b9267845950f3029dfe385ce3257d3abb8bdad4
2018-08-20 10:55:21 -07:00
aa9f328fa3 Nomnigraph - DAG matching (#10549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10549

Support DAG matching in nomnigraph. This is done by maintaining a map from nodes in the MatchGraph to nodes in the input graph, and additionally enforcing that the same node in the MatchGraph must match the same node in the input graph (with the exception of multiplicity, i.e., when count != 1 on the MatchGraph node).

In a follow-up diff, I'll rename the APIs that refer to "subtree" to "subgraph" to improve clarity.

Reviewed By: bwasti

Differential Revision: D9347322

fbshipit-source-id: 171491b98c76852240a253279c2654e96dd12632
2018-08-20 10:55:19 -07:00
0cce4620fe Fix backend/device-type comparison with MKLDNN.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10689

Differential Revision: D9400450

Pulled By: gchanan

fbshipit-source-id: f75b042b886d5d525edb2c423173a9646c613a1b
2018-08-20 10:41:08 -07:00
db7b7f1359 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10686

Differential Revision: D9399874

Pulled By: SsnL

fbshipit-source-id: 28130992d2416721552f72cfa835ff0358caeefa
2018-08-20 10:40:55 -07:00
d4832f1e7b More fixes for hidden visibility (#10624)
Summary:
Some more `ATEN_API` additions for hidden visibility.

Running CI tests to see what fails to link.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624

Reviewed By: mingzhe09088

Differential Revision: D9392728

Pulled By: orionr

fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
2018-08-20 10:11:59 -07:00
9ad9191323 Fix cuDNN dropout state cache (#10662)
Summary:
Minor fix for the cuDNN cache. Previously, if an RNN function was called on GPU 0 and then called on GPU 1 in eval mode, we would skip the event re-initialization, causing an incorrect resource handle error when trying to record the event.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10662

Reviewed By: soumith

Differential Revision: D9393629

Pulled By: apaszke

fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
2018-08-20 05:09:41 -07:00
c37fac4d50 Fixing stop condition on composite reader (#9888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9888

Limiter cannot be shared or copied; just pass it to the first reader.

Reviewed By: xianjiec

Differential Revision: D9008871

fbshipit-source-id: e20cd785b26b1844e156efc3833ca77cfc3ffe82
2018-08-20 03:02:20 -07:00
83066e9b30 Add trigonometry functions for ONNX export (#7540)
Summary:
Trigonometry functions are newly added to ONNX in a recent PR https://github.com/onnx/onnx/pull/869

This PR makes pytorch support exporting graphs with trigonometry functions.

This PR might need to wait until it is ready to change
```python
_onnx_opset_version = 6
```
to
```python
_onnx_opset_version = 7
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7540

Differential Revision: D9395041

Pulled By: bddppq

fbshipit-source-id: bdf3e9d212b911c8c4eacf5a0753bb092e4748d2
2018-08-19 23:01:28 -07:00
3f603eeee8 some improvements on distributed docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10666

Differential Revision: D9395242

Pulled By: SsnL

fbshipit-source-id: 952326b9c5a1a974a1c33a0e12738e1e21ad9956
2018-08-19 17:40:28 -07:00
108b657159 Import DistributedSampler in utils/data/__init__ (#10671)
Summary:
There is no reason that users should need an extra import to use DistributedSampler.
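
The change in usage, in short:

```python
# Before: from torch.utils.data.distributed import DistributedSampler
from torch.utils.data import DistributedSampler  # now re-exported here
```
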
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671

Differential Revision: D9395189

Pulled By: SsnL

fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2
2018-08-19 16:55:13 -07:00
6bdbad93b9 Refactor Device to not depend on Backend. (#10478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478

- Removed Backend constructor from Device, and fixed all
  use-sites to use DeviceType::CPU instead of kCPU, or
  use a new function backendToDeviceType to perform
  the conversion.
- New method device_type() on Type; it gives you the
  underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
  by introducing a new special type which is implicitly
  convertible to both DeviceType and Backend.  As long as
  you don't define a function that's overloaded on both
  DeviceType and Backend (but not on BackendOrDeviceType),
  the implicit conversions will ensure that uses
  of at::Device(at::kCPU) keep working. We fixed use-sites in
  the library, but did NOT fix sites in the test code, so that
  we can exercise this BC code.

Reviewed By: Yangqing

Differential Revision: D9301861

fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
2018-08-18 17:39:14 -07:00
f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, use `std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunking directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to see if a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case
  (splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
d87b4e941b fix python interpreter cannot be found without PYTHON_EXECUTABLE (#10659)
Summary:
Take 2 of #10543.
The problem was that between the commit and the merge, one more entry point, `tools/build_libtorch.py`, was added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10659

Differential Revision: D9393540

Pulled By: soumith

fbshipit-source-id: 8ebfed600fc735fd1cb0489b161ec80e3db062e0
2018-08-18 15:40:08 -07:00
152762a567 Fix warnings diagnosed in recent clang (#10647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10647

Fix "missing std::move from the return value" warning diagnosed by recent clang compiler.

Reviewed By: soumith, DavidCallahan

Differential Revision: D9384692

fbshipit-source-id: 8ad951e47d605e6f98a9650f2dec2909ad0f3eb8
2018-08-17 21:32:58 -07:00
e29b5a1ea8 graph fuser inserts explicit expands where necessary (#10325)
Summary:
Fixes #10096

If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.

Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
  forwards pass by allowing more operations to be fused into the LSTMCell's single
  FusionGroup.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325

Differential Revision: D9379308

Pulled By: zou3519

fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
2018-08-17 16:03:46 -07:00
7c55d11ba5 Make sure we don't relocate the weight name buffer (#10630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10630

`onnxTensorDescriptorV1.name` points to the string buffer. We use a vector of strings to serve as the storage. This means we cannot reallocate the vector because that may invalidate the `onnxTensorDescriptorV1.name` pointers. Solution is to reserve a large enough vector so that it won't reallocate.

Reviewed By: bddppq, houseroad

Differential Revision: D9381838

fbshipit-source-id: f49c5719aafcc0829c79f95a2a39a175bcad7bfe
2018-08-17 16:03:31 -07:00
65b9308128 Basic infrastructure for C++ documentation (#10569)
Summary:
Adds the folder structure, Doxyfile, sphinx setup and Makefile to build C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10569

Differential Revision: D9386744

Pulled By: goldsborough

fbshipit-source-id: 0a7c581dcf0a5f7b01ba19d317b493cf95935134
2018-08-17 15:39:50 -07:00
b62b378022 Adding torch support for CMAKE_ARGS env
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10635

Reviewed By: ml7

Differential Revision: D9383845

Pulled By: pjh5

fbshipit-source-id: fb21bda12e88053eec738974e6e419388c5038d9
2018-08-17 14:54:43 -07:00
c5c1c051ca Fix dropout fused kernel applied in eval mode (#10621)
Summary:
fixes https://github.com/pytorch/pytorch/issues/10584
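
The invariant the fix restores, as a small check (the buggy path was the fused CUDA kernel; this snippet just states the expected semantics):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8)
# dropout in eval mode must be the identity
assert F.dropout(x, p=0.5, training=False).equal(x)
```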

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10621

Differential Revision: D9379397

Pulled By: SsnL

fbshipit-source-id: 5ff2939ba794af082ce597ef289a09ee757636dc
2018-08-17 14:54:42 -07:00
86c9856d9c Fuse tensor-scalar ops when scalar is constant (#10511)
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.
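
A minimal sketch of the pattern this enables (illustrative function, not from the PR):

```python
import torch

@torch.jit.script
def sigmoid_backward_like(grad, out):
    # the constant scalar 1.0 can now be inlined into the generated fusion kernel
    return grad * out * (1.0 - out)
```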

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
2018-08-17 14:10:23 -07:00
f3ac619764 Add fusion support for batchnorm and convolution without bias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10595

Reviewed By: bwasti

Differential Revision: D9110099

fbshipit-source-id: e1ed66c7d82b2f9987b7eb9c7f98877a6dbeb902
2018-08-17 12:11:44 -07:00
d35f365ad5 Remove all cuDNN specific inputs to RNN functions (#10581)
Summary:
This is still not the final PR, but it removes all blockers for actually using the RNN functions directly in the JIT. The next patch should be final, and will actually remove the symbolic_override code and change it to proper symbolics for those ATen functions. It turns out the symbolic code can also be cleaned up a bit, and I'll do that too.

zdevito ezyang
colesbury (for minor DispatchStub.h changes)

There was no way to handle those in the JIT for now, and they turned
out to be completely unnecessary. It should make the Python and C++
module code much simpler too, since all the logic is now centralized
in the native functions.

The downside is that RNN modules no longer own their dropout buffers,
which are shared per-device instead (with appropriate locking and
synchronization). This might appear as a perf regression at first, but
in reality it's highly unlikely that anyone will want to run cuDNN RNNs
on the same GPU in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10581

Reviewed By: colesbury

Differential Revision: D9365541

Pulled By: apaszke

fbshipit-source-id: 3ef8677ee5481bae60c74a9117a2508665b476b5
2018-08-17 11:09:51 -07:00
52058204d6 Add nn functional tests in JIT (#10409)
Summary:
The PR is the first step toward integrating the torch.nn library with the JIT. It adds tests for the nn functional interfaces in trace/script mode, and tries to find the differences between the torch.nn.functional ops and the ATen ops, to see what work is needed to support a full set of nn functionals in script mode.

Some statistics in summary:

- 84 useful functions in torch.nn.functional in total (the number does not include helper funcs and deprecated funcs in torch.nn.functional).

- 7 functions/ops do not support higher-order gradients, so they are excluded from the whole test.

- 36 functions differ from the ATen ops for various reasons. Among those 36, a bunch (roughly 10-15) are just naming differences or simple transformations using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409

Differential Revision: D9350694

Pulled By: wanchaol

fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
2018-08-17 11:09:49 -07:00
b4e72ea811 Revert D9377394: [pytorch][PR] [Caffe2] Add AT_CORE_EXPORT and AT_CORE_IMPORT.
Differential Revision:
D9377394

Original commit changeset: 993062a461ff

fbshipit-source-id: af8ab92e9b88466602508981d9b3ea24ce393dfc
2018-08-17 10:39:27 -07:00
bd9ab650ae fix compile error in math_hip.cc from new Im2Col/Col2Im interface (#10623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10623

Fix compile error in https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-build/10280//console

Reviewed By: ezyang

Differential Revision: D9379451

fbshipit-source-id: 67cc3964981edba1915b93c49643caa300d63c16
2018-08-17 10:24:25 -07:00
ff440b61f6 Revert D9378844: [pytorch][PR] fix python interpreter can not be found
Differential Revision:
D9378844

Original commit changeset: 022e20aab7e2

fbshipit-source-id: 962280707e84edff2a4f59b1ce2f4211a579a055
2018-08-17 10:09:27 -07:00
e190505e84 Adding support for inlining if branches (#10084)
Summary:
Inlining if branches which have constant inputs.  If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should
return a mutated set of (a) and not (a, b).

```
if cond:
  if True:
    a = a - 1
  else:
    b = b - 1
```
To calculate this we recursively update mutated variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084

Reviewed By: michaelsuo

Differential Revision: D9340429

Pulled By: eellison

fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
2018-08-17 09:48:47 -07:00
31c7a32d1c Include aten_op by default in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10603

Reviewed By: ahhegazy, dzhulgakov

Differential Revision: D9364309

fbshipit-source-id: e72d9f2b1e99cb0fb2186c737fcd925b14d42754
2018-08-17 08:39:46 -07:00
03982fb8d3 Fix subgraph cutting wrt recent external_input change in nomnigraph (#10598)
Summary:
https://github.com/pytorch/pytorch/pull/10100 recently started taking external input/output in nomnigraph. This PR makes the following adjustments:
0. Relaxes some of the conditions on external input.
1. Updates NNModule inputs/outputs when pruning the input/output.
2. Avoids copying external input/output, as nomnigraph already takes care of it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10598

Reviewed By: bwasti

Differential Revision: D9371730

Pulled By: yinghai

fbshipit-source-id: 9273be5041dc4cc8585587f47cb6721e518a06a8
2018-08-17 08:25:49 -07:00
ff3a481aee fix python interpreter cannot be found (#10543)
Summary:
A custom Python installation that has no `python` or `python3` alias can't be found by CMake's `FindPythonInterp` without an extra CMake argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10543

Differential Revision: D9378844

Pulled By: ezyang

fbshipit-source-id: 022e20aab7e27a5a56b8eb91b6026151116193c7
2018-08-17 08:25:48 -07:00
51222500e2 Add AT_CORE_EXPORT and AT_CORE_IMPORT. (#10602)
Summary:
Fix "error LNK2019: unresolved external symbol" from "CAFFE_KNOWN_TYPE" in tests where we should use dllexport instead of AT_CORE_API(=dllimport).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10602

Differential Revision: D9377394

Pulled By: Yangqing

fbshipit-source-id: 993062a461ffce393f2321c5391db5afb9b4e7ba
2018-08-17 02:09:38 -07:00
cc53807be5 group conv with NHWC layout (#10585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10585

group conv with NHWC layout

Reviewed By: BIT-silence

Differential Revision: D7547497

fbshipit-source-id: da0ec5a4512c15a0a0d7b79e6ce00c1f8f77f661
2018-08-17 00:39:23 -07:00
0aefb9f26c Update onnx to onnx/onnx@7848f1e (#10613)
Summary:
https://github.com/onnx/onnx/commit/7848f1e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10613

Reviewed By: houseroad

Differential Revision: D9376224

Pulled By: bddppq

fbshipit-source-id: ce8a53255ba24f0f8f989570e8b015837f8442fb
2018-08-16 23:39:37 -07:00
6667d55e73 Disallow input filler for GatherRangesOp (#10592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10592

Filter out GatherRanges ops

Reviewed By: highker

Differential Revision: D9365220

fbshipit-source-id: e21ab00dc9e553c9aaf172e1241206e0c0a7a23d
2018-08-16 21:39:09 -07:00
3578909671 Remove unused code base for distributed training (#10282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10282

This diff removes the unused/deprecated features from the code base.

Reviewed By: manojkris

Differential Revision: D9169859

fbshipit-source-id: d6447b7916a7c687b44b20da868112e6720ba245
2018-08-16 20:10:17 -07:00
f1d40ef280 build_pytorch_libs.sh: use MAX_JOBS rather than NUM_JOBS (#10600)
Summary:
MAX_JOBS is set by our jenkins setup
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10600

Differential Revision: D9375317

Pulled By: anderspapitto

fbshipit-source-id: 25416d5ee12372f7610baa78cb7b423806b26aa2
2018-08-16 20:10:15 -07:00
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a `torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py` (see the sketch below). This step makes use of our C++ extension setup that we already have. No work, yay!

The pure C++ and the Python builds are separate and not coupled in any way.
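
A hedged sketch of the `setup.py` binding step, reusing the existing C++ extension machinery (names and file paths are placeholders):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="custom_op",
    ext_modules=[CppExtension("custom_op", ["op.cpp"])],
    cmdclass={"build_ext": BuildExtension},
)
```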

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
67c6d93634 Tune minimal work size (#10599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599

Not spawning threads with spin-lock synchronization is bad because they will switch to `condvar` wait, which increases wake-up latency next time they are needed.

Reviewed By: ajtulloch

Differential Revision: D9366664

fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
2018-08-16 17:39:57 -07:00
afd7477eaa Add `buffers(), named_buffers()` methods. (#10554)
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
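
A quick usage sketch (the buffer names shown are typical for BatchNorm):

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))   # e.g. running_mean (4,), running_var (4,)
```
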
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554

Reviewed By: SsnL

Differential Revision: D9367762

Pulled By: jma127

fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
2018-08-16 16:26:48 -07:00
342517e6e7 Back out "Add aten_op to caffe2 onnx (python) backend" (#10589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10589

Original commit changeset: 2cc6fedbaf08

Reviewed By: houseroad

Differential Revision: D9365208

fbshipit-source-id: 3871d8e70f0d8e48c8af9593c78587d16c45afc2
2018-08-16 15:15:27 -07:00
488ea824ed Additional changes to make GPU builds work (#10507)
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.

I was testing with

```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```

I'll rebase on master when Yangqing's changes in 10504 land, but putting up for some testing.

cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507

Reviewed By: Yangqing

Differential Revision: D9359606

Pulled By: orionr

fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
2018-08-16 13:25:27 -07:00
ef15bb8787 remove implicit conversion from gpu to cpu (#10553)
Summary:
Resubmit #10416 with fixed tests. This removes the implicit conversion from GPU to CPU when calling numpy, so the behavior matches other methods.

It requires users to move the tensor back with cpu() before calling numpy functions on it.
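
In short (assuming a CUDA device is available):

```python
import torch

t = torch.randn(3, device="cuda")
t.cpu().numpy()   # explicit host copy is now required
# t.numpy() on a CUDA tensor now raises instead of converting silently
```
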
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10553

Differential Revision: D9350212

Pulled By: ailzhang

fbshipit-source-id: 9317d8fea925d4b20ae3150e2c1b39ba5c9c9d0a
2018-08-16 12:10:39 -07:00
d6f3c88418 Revert D9076734: Split storage from tensor
Differential Revision:
D9076734

Original commit changeset: ea9e1094ecf8

fbshipit-source-id: 3fa9b65b7265fce6207d9e1d9ef4707dbb29704b
2018-08-16 11:25:32 -07:00
40a070422d Adding new allreduce bcube routines to ops supported by gloo (#10494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10494

Adding the AllreduceBcube routines as they are now available in gloo.

Reviewed By: wesolwsk

Differential Revision: D8269473

fbshipit-source-id: 6a3a32291bbf1fbb328b3ced0f2a753dc5caf4e5
2018-08-16 10:56:26 -07:00
4be4b4c8b5 Remove weight from input of onnxifi backend op (#10575)
Summary:
The ONNXIFI backend will absorb the constant weight in Conv, so we should not add it as an input; this input was just a test artifact. Note that the Onnxifi transformer will do the right thing when cutting the graph to absorb the weights.

rdzhabarov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10575

Reviewed By: houseroad

Differential Revision: D9357339

Pulled By: yinghai

fbshipit-source-id: a613fa3acafa687295312f5211f8e9d7f77b39cd
2018-08-16 10:56:25 -07:00
319fefe9e6 Support benchmark on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10564

Reviewed By: llyfacebook

Differential Revision: D9356389

Pulled By: sf-wind

fbshipit-source-id: f6c58e68d3eaf3a39c9f89b8f04e6039c75b4cd9
2018-08-16 10:56:23 -07:00
00f2731112 Merge THTensor into TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10479

Differential Revision: D9315800

Pulled By: gchanan

fbshipit-source-id: b13ef0de3342600b02b54e0700eb02021a9d1a9e
2018-08-16 08:10:06 -07:00
130881f0e3 Delete build_caffe2.sh, replace with build_libtorch.py (#10508)
Summary:
delete build_caffe2.sh, replace with build_libtorch.py as suggested by peter (and copy-pasted from his draft PR).  This ensures that all consumers of the torch CMake file go through as unified a path as possible.

In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508

Differential Revision: D9354398

Pulled By: anderspapitto

fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
2018-08-16 08:10:04 -07:00
c6facc2aaa Add conversions between DataType and ScalarType.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10472

Reviewed By: gchanan

Differential Revision: D9298048

fbshipit-source-id: c58efa582eab64c58d0771d90d90862911c168d1
2018-08-16 07:55:31 -07:00
fdd2b9baee Add DataType alias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10547

Reviewed By: soumith

Differential Revision: D9346040

fbshipit-source-id: 1069a44182ccff68b1694086c8b709ba2046b22b
2018-08-16 07:55:29 -07:00
8fdba4ec35 Move all operator<< overloads out of the global namespace. (#10546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10546

Have you ever written an operator<< overload in the caffe2 namespace
in a core Caffe2 header, and then been stunned when some completely
unrelated code started breaking?  This diff fixes this problem!

The problem looks like this:
1. You're building against a really old version of glog (think 0.3.2,
   or something like that)
2. This version of glog defines operator<< overloads for std containers
   in the global namespace
3. You add a new overload in your current namespace (e.g., caffe2).
   Congratulations: this overload is *preferentially* chosen over
   the global namespace one for all calls to << in that namespace.
   And since it doesn't actually have std::vector overloads, unrelated
   Caffe2 code breaks.

Newer versions of glog have a fix for this: they have the line:

  namespace std { using ::operator<<; }

in their header.  So let's help old versions of glog out and do this ourselves.

In our new world order, operator<< overloads defined in the global namespace
won't work (unless they're for std containers, which work because of ADL).
So this diff also moves all those overloads to the correct namespace.

Reviewed By: dzhulgakov

Differential Revision: D9344540

fbshipit-source-id: 6246ed50b86312668ebbd7b039fcd1233a3609cf
2018-08-16 07:55:27 -07:00
238b4b9236 Resolve error C2370 "redefinition; different storage class" by adding dllimport. (#10571)
Summary:
For #10568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10571

Differential Revision: D9357987

Pulled By: Yangqing

fbshipit-source-id: 6726f0a1d31a225375a0ddc0e05284f3eb89dda8
2018-08-16 00:39:33 -07:00
84427d26db Add aten_op to caffe2 onnx (python) backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10579

Reviewed By: houseroad

Differential Revision: D9357837

fbshipit-source-id: 2cc6fedbaf088df7e11b52a91dfe3b8f0d7fd599
2018-08-16 00:39:30 -07:00
76da0b34c2 Remove an unused variable found by linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10578

Differential Revision: D9357880

Pulled By: bddppq

fbshipit-source-id: 6b56c2dbd02258124b5a4656cdf44d14a59e1b71
2018-08-16 00:25:44 -07:00
7487ee55f1 Resolving error C2487 "member of dll interface class may not be declared with dll interface" by removing nested CAFFE2_API. (#10572)
Summary:
For #10570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10572

Differential Revision: D9357984

Pulled By: Yangqing

fbshipit-source-id: a8f74e384eb3219fb6ac71ada4a45e6bce9199eb
2018-08-16 00:25:41 -07:00
abf85bf0ef Perform CSE across block boundaries. (#10105)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10105

Differential Revision: D9186678

Pulled By: resistor

fbshipit-source-id: 87b63d4fc0c7d394edb4777acdefa8f022a8bf8d
2018-08-16 00:25:36 -07:00
2e0dd86903 Make torch::Tensor -> at::Tensor (#10516)
Summary:
This PR removes the `using Tensor = autograd::Variable;` alias from `torch/tensor.h`, which means `torch::Tensor` is now `at::Tensor`. This PR fixes up some last uses of `.data()` and tidies up the resulting code. For example, I was able to remove `TensorListView` such that code like

```
auto loss = torch::stack(torch::TensorListView(policy_loss)).sum() +
    torch::stack(torch::TensorListView(value_loss)).sum();
```

is now

```
auto loss = torch::stack(policy_loss).sum() + torch::stack(value_loss).sum();
```

CC jgehring

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10516

Differential Revision: D9324691

Pulled By: goldsborough

fbshipit-source-id: a7c1cb779c9c829f89cea55f07ac539b00c78449
2018-08-15 21:25:12 -07:00
8013dac43d Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756.
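
The fixed corner case, in short:

```python
import torch

empty = torch.tensor([], dtype=torch.long)
print(torch.bincount(empty))   # an empty LongTensor instead of an error
```
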
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Reviewed By: Yangqing

Differential Revision: D9348485

Pulled By: soumith

fbshipit-source-id: e13afadf8dbea20ee6ee595383c522dcbaf8796a
2018-08-15 20:55:59 -07:00
05dcf00644 fixed c10d test (#10557)
Summary:
fixed NCCL test, which is not run in CI. We should enable it soon.
```
~/new_pytorch/pytorch/test$ python test_c10d.py
...............
----------------------------------------------------------------------
Ran 15 tests in 13.099s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10557

Reviewed By: ailzhang

Differential Revision: D9353286

Pulled By: teng-li

fbshipit-source-id: 5a722975beaa601203f51c723522cc881f2d2090
2018-08-15 17:22:38 -07:00
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all APIs for the CPU front end. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
87cac4c2f1 Update Im2Col-related code in preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col-related code in preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
579962f2a8 reroute tensor feature in core.Net and generate one net feature in model_helper (#10528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10528

Adding two features to core and model_helper:

- reroute_tensor, which supports op insertion at the net level
- model_helper complete net and cut net, used for full-graph analysis

Differential Revision: D9330345

fbshipit-source-id: 56341d3f500e72069ee306e20266c8590ae7985a
2018-08-15 16:40:15 -07:00
523bdc8ec1 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in Pytorch 1.0 will have
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff

Reviewed By: dzhulgakov

Differential Revision: D9076734

fbshipit-source-id: ea9e1094ecf8c6eaeaa642413c56c6a95fb3d14e
2018-08-15 16:40:14 -07:00
03e9ea5ef0 Fix leaking of Storages (not StorageImpls) (#10552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10552

Fix leaking of Storages (not StorageImpls)

Reviewed By: li-roy

Differential Revision: D9349824

fbshipit-source-id: 31f14951020a63189bebda25a3bf8bf195cd227f
2018-08-15 16:10:00 -07:00
4c49da34a9 Add new MKLDNN fallback operators (#10526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10526

Resubmitting these changes. Previously they caused issues with multifeed, which I fixed with D9280622

Reviewed By: yinghai

Differential Revision: D9327323

fbshipit-source-id: ec69428039b45c6221a5403b8fe9a83637857f04
2018-08-15 15:55:22 -07:00
a129f9ad3b Revert D9332335: [pytorch][PR] Implements volumetric (5d) affine grid generation.
Differential Revision:
D9332335

Original commit changeset: 1b3a91d078ef

fbshipit-source-id: 3dcce680257a6da121f5d67918ed4236e0c5bfec
2018-08-15 15:25:11 -07:00
151e7de893 varargs for einsum (#10067)
Summary:
Implemented via a wrapper, thank you Richard for the suggestion!
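
Both calling conventions, side by side (the tuple form going through the wrapper is my reading of the implementation):

```python
import torch

a, b = torch.randn(2, 3), torch.randn(3, 4)
c1 = torch.einsum('ij,jk->ik', a, b)     # operands as varargs (new)
c2 = torch.einsum('ij,jk->ik', (a, b))   # old list/tuple form still works
assert torch.allclose(c1, c2)
```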

Fixes: #9929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10067

Differential Revision: D9083388

Pulled By: soumith

fbshipit-source-id: 9ab21cd35278b01962e11d3e70781829bf4a36da
2018-08-15 15:13:25 -07:00
fb45ec5ac3 Don't set DEBUG=1 in ASAN build (#9902)
Summary:
This should make ASAN tests run faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9902

Differential Revision: D9032986

Pulled By: yf225

fbshipit-source-id: 3d2edec2d7ce78bc995d25865aa82ba6d3f971d0
2018-08-15 14:39:57 -07:00
26c764a1db Update FP16 submodule. Close #10523 (#10548)
Summary:
Pulls in a fix in FP16 for a compilation bug when using the Intel Compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10548

Differential Revision: D9349469

Pulled By: Maratyszcza

fbshipit-source-id: 43e6dc5c3c18319d31eca23426770c73795feec5
2018-08-15 14:26:56 -07:00
021b4888db Remove setup_requires and tests_require from setup.py for FULL_CAFFE2 (#10530)
Summary:
In my environment, it looks like setup.py hangs when running

```
FULL_CAFFE2=1 python setup.py build_deps
```

Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530

Differential Revision: D9349597

Pulled By: orionr

fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
2018-08-15 14:26:53 -07:00
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space-efficient varint uint32_t encoding, we are adding a new operator that reads a byte string back into a uint8_t tensor.

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
6f14202acd Revert D9276252: [pytorch][PR] remove implicit conversion to cpu
Differential Revision:
D9276252

Original commit changeset: ea7d9d4f9390

fbshipit-source-id: 5977bf90d4c84b47e15bc8266cc3ce5602c4e05f
2018-08-15 13:55:18 -07:00
5adcac3dce Cuda half macros cleanup (#10147)
Summary:
This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are trying to move things from TH* to ATen. This PR is part of the THCNumerics deprecation that I have been working on, following up on mruberry's https://github.com/pytorch/pytorch/pull/9318. I am separating these two commits to see whether the removal of these macros upsets the pytorch public CI or internal builds.

- Commit 1248de7baf removes the code paths guarded by `CUDA_HALF_INSTRUCTIONS` macro. Since the macro was removed in commit 2f186df52d, `ifdef CUDA_HALF_INSTRUCTIONS` would return false and hence the code path that is kept after this change is for the false case of `ifdef CUDA_HALF_INSTRUCTIONS`

- Commit 520c99b057 removes the code paths guarded by `CUDA_HALF_TENSOR` macro. Since Pytorch now provides support for only CUDA 8.0 and above, `CUDA_HALF_TENSOR` is always true since CUDA 8.0 satisfies `CUDA_HAS_FP16` and hence, the code path that is kept after this change is for the true case of `ifdef CUDA_HALF_TENSOR`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10147

Differential Revision: D9345940

Pulled By: soumith

fbshipit-source-id: c9392261dd432d304f1cdaf961760cbd164a59d0
2018-08-15 13:25:42 -07:00
86363e1d8e Move RNN implementations to C++ (#10481)
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.

Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).

zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481

Reviewed By: ezyang

Differential Revision: D9341113

Pulled By: apaszke

fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
2018-08-15 13:25:41 -07:00
484395edfb Fix corner case with torch.multinomial (#9960)
Summary:
In the shortcut for n_sample=1, when category 0 has 0 weight,
we should not map the (uniform) sample 0 to category 0.
The conversion uniform->multinomial was apparently written to work on
a (0,1] range (like curand uses), but PyTorch uses a [0,1) range.

Fixes: #4858. Thank you, Roy Fejgin for reporting.
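
A small check stating the fixed semantics:

```python
import torch

w = torch.tensor([0.0, 1.0])
# category 0 has zero weight; with a [0, 1) uniform source, the old n=1
# shortcut could map a sample of exactly 0.0 to category 0
assert torch.multinomial(w, 1).item() == 1
```
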
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9960

Reviewed By: soumith

Differential Revision: D9341793

Pulled By: ailzhang

fbshipit-source-id: 6b1a96419a7bc58cc594f761f34c6408ff6354cf
2018-08-15 13:25:39 -07:00
fb09292020 Increase tolerance in ConvBN test
Summary: reduce flakiness of test

Reviewed By: Maratyszcza

Differential Revision: D9344877

fbshipit-source-id: 24d5e1b873f94d816c980f3b7db93248cf10aca5
2018-08-15 13:14:35 -07:00
254dedf604 Propagate NaN through threshold (#10277)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10238
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10277

Reviewed By: SsnL

Differential Revision: D9199825

Pulled By: soumith

fbshipit-source-id: 8ee7f9a72d9546d429f311c3f6028461d3c93fe2
2018-08-15 12:59:31 -07:00
0bbcc7b534 Don't assume curl version in Windows build script (#10476)
Summary:
Since we can't specify a version number to `choco install curl`, we should not assume that `7.57.0` is the curl version in the Windows AMI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10476

Differential Revision: D9303129

Pulled By: yf225

fbshipit-source-id: 198544be68330860fbcf93c99bc995f4e280bda7
2018-08-15 12:59:23 -07:00
85408e744f Move filler interface to operator schema (#10522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10522

Move filler interface to operator schema to avoid extra code for
caffe2 mobile.

Reviewed By: dzhulgakov

Differential Revision: D9312940

fbshipit-source-id: 77fb2406f0c6b171a1912a207e05e36da50c6966
2018-08-15 12:40:18 -07:00
9646d68962 support broadcasting in _kl_categorical_categorical (#10533)
Summary:
Support broadcasting in _kl_categorical_categorical

This makes it possible to do:
```
import torch.distributions as dist
import torch
p_dist = dist.Categorical(torch.ones(1,10))
q_dist = dist.Categorical(torch.ones(100,10))
dist.kl_divergence(p_dist, q_dist)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10533

Differential Revision: D9341252

Pulled By: soumith

fbshipit-source-id: 34575b30160b43b6c9e4c3070dd7ef07c00ff5d7
2018-08-15 12:40:17 -07:00
05a260da43 Bump gloo to latest master (#10545)
Summary:
Needed by the Gloo development team. Verifying nothing breaks in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10545

Reviewed By: Maratyszcza

Differential Revision: D9344413

Pulled By: orionr

fbshipit-source-id: 207edb71170870bacec47a635a12d7f55b6c1275
2018-08-15 12:25:44 -07:00
5d27d68779 remove implicit conversion to cpu (#10416)
Summary:
Fixes #9934
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10416

Differential Revision: D9276252

Pulled By: ailzhang

fbshipit-source-id: ea7d9d4f9390edefcd0865a98498f6c4307c291d
2018-08-15 12:25:42 -07:00
9cffe783f1 relax tolerance for two torch.half (float16) tests (#10519)
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions used in the tests
intend to allow slight floating point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.

Relax the tolerances so that slight float16 imprecision won't
cause test failures.

The affected tests are:

- test_variable_sequence_cuda
- test_Conv2d_groups_nobias

For more information, see issue:

https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519

Differential Revision: D9343751

Pulled By: soumith

fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
2018-08-15 12:11:20 -07:00
d93e8ab343 Nomnigraph - Refactor SubtreeMatchCriteria to become a Graph of MatchNode (#10512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10512

SubtreeMatchCriteria now becomes a graph of MatchNode

MatchNode consists of NodeMatchCriteria, nonTerminal and count. This is a cleaner internal representation of the data structure and will bring us much closer to DAG matching.

Note that I still keep the debugString method because convertToDotGraph doesn't currently work with Subgraph.

Reviewed By: bwasti

Differential Revision: D9321695

fbshipit-source-id: 58a76f007a9a95d18cf807d419c2b595e9bc847f
2018-08-15 12:11:18 -07:00
f59bcea2c3 parallel max and min for ATen on CPU (#10343)
Summary:
Optimize max and min reductions for the ATen CPU path; the current code path from the TH module runs sequentially on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10343

Differential Revision: D9330799

Pulled By: ezyang

fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
2018-08-15 11:41:01 -07:00
44b029f5b8 move matrix formation for dot products to precompute/request-only (#10531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10531

Fixed a naming issue in pairwise_similarity.

Reviewed By: huayuli00

Differential Revision: D9331716

fbshipit-source-id: d7de36f20504c08b1c7871ccdffa343221a3da0c
2018-08-15 11:02:10 -07:00
f5a4dd89b5 Implements volumetric (5d) affine grid generation. (#8322)
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based off of the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.

I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.

Diff probably best viewed with whitespace changes ignored.

Thanks for considering!
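
For reference, a minimal usage sketch of the volumetric case (the sizes here are arbitrary):

```
import torch
import torch.nn.functional as F

# Identity affine transform for a 5-D (volumetric) input: theta is N x 3 x 4.
theta = torch.eye(3, 4).unsqueeze(0)          # shape (1, 3, 4)
grid = F.affine_grid(theta, (1, 1, 4, 8, 8))  # size is (N, C, D, H, W)
print(grid.shape)                             # torch.Size([1, 4, 8, 8, 3])
```
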
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322

Differential Revision: D9332335

Pulled By: SsnL

fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
2018-08-15 11:02:08 -07:00
d8ff7ad6f8 generalize order switch ops for 1-3d (#10395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395

Order switch ops (NCHW2NHWC and NHWC2NCHW) were only supporting 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.

Reviewed By: protonu

Differential Revision: D9261177

fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
2018-08-15 10:09:31 -07:00
0f05f5fb07 ATen layer norm symbolic (#10513)
Summary:
We can't rely on the ATen fallback pathway here because we need to parse out the constant attributes explicitly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10513

Reviewed By: dzhulgakov

Differential Revision: D9322133

Pulled By: jamesr66a

fbshipit-source-id: 52af947e6c44532ef220cb4b94838ca838b5df06
2018-08-15 08:28:52 -07:00
ce8e8feceb Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified. (#10390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390

Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the threshold for the boxes at the 'detections_per_im' position, and filters out boxes scoring lower than that threshold.
* In some cases where multiple boxes share the same score as the threshold, the op will return more boxes than 'detections_per_im'.
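
The tie-at-threshold failure mode can be sketched in plain tensor code (an illustration only, not the operator's implementation):

```
import torch

scores = torch.tensor([0.9, 0.8, 0.5, 0.5, 0.5])
detections_per_im = 3

# Threshold taken from the k-th highest score...
thresh = scores.sort(descending=True)[0][detections_per_im - 1]  # 0.5
keep = scores >= thresh
print(int(keep.sum()))  # 5, not 3: every box tied at 0.5 survives the filter
```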

Reviewed By: wat3rBro

Differential Revision: D9252726

fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
2018-08-14 23:54:23 -07:00
e41528a5cc Also set stdin to subprocess pipe in FindCUDA windows popen call (#10379)
Summary:
Background: we run pytorch in embedded C++ pipelines, running in C++ GUIs in https://github.com/Kitware/VIAME and without this addition, the call was failing with the below error, but only on certain windows platforms/configurations:

OSError: [WinError 6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379

Differential Revision: D9330772

Pulled By: ezyang

fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
2018-08-14 23:10:20 -07:00
f1631c3106 Modify build.sh and test.sh scripts for ppc64le jenkins build and test (#10257)
Summary:
Initial jenkins builds / test scripts for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10257

Differential Revision: D9331278

Pulled By: ezyang

fbshipit-source-id: 6d9a4f300a0233faf3051f8151beb31786dcd838
2018-08-14 21:54:44 -07:00
19ad55cc02 set coalesced=false at sparse transpose() and removed transpose invariants (#10496)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/6219
- removed invariants at https://github.com/pytorch/pytorch/pull/4707
- assume a sparse tensor with coalesced=true when:
1. its elements are unique and
2. the indices are in sorted order
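
A minimal sketch of the resulting behavior (using the `torch.sparse_coo_tensor` constructor for brevity):

```
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()

t = s.t()
print(t.is_coalesced())  # False: transposing may leave indices out of sorted order
```
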
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10496

Differential Revision: D9311214

Pulled By: weiyangfb

fbshipit-source-id: 167fa5a8e9e5f9c800db02f728a1194029f7e4f3
2018-08-14 21:25:37 -07:00
964e30de1d Workaround for Cuda9.2 and GCC7 compilation errors (#10510)
Summary:
Breaking out of #8338

This PR is a workaround for a bug with CUDA9.2 + GCC7.

Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
   BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510

Reviewed By: orionr

Differential Revision: D9319742

Pulled By: mingzhe09088

fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
2018-08-14 20:54:52 -07:00
b6cc65afea Send, Recv, RecvAnysource, Barrier Op for MPI PG and Python Bindings (#10227)
Summary:
Based on: https://github.com/pytorch/pytorch/pull/10199
Added:
(1) send, recv, recvanysource, and barrier for MPI process group.
(2) python binding
(3) testing

Please review: 2e64f5d675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10227

Reviewed By: ailzhang

Differential Revision: D9327138

Pulled By: teng-li

fbshipit-source-id: 80496714550a3ca498eb474465ddbd1b8d657d49
2018-08-14 20:10:11 -07:00
26e40fa665 Tensor.accessor now fails on rvalue reference (#10518)
Summary:
Previously, it was easy to write `x[0].accessor<float, 2>()`. However, `x[0]` is a temporary, so the accessor would point to invalid strides/sizes and probably segfault. With this change, such unsafe code is a compile error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10518

Reviewed By: goldsborough

Differential Revision: D9329288

Pulled By: ebetica

fbshipit-source-id: d08763bee9a19a898b9d1ea5ba648f27baa1992f
2018-08-14 19:41:31 -07:00
17ecc06b65 static casting TIndex (#10514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10514

Fix the bug that breaks the Windows build in fused_rowwise_random_quantization_ops.h.

Reviewed By: ezyang, jspark1105

Differential Revision: D9322291

fbshipit-source-id: a6a27e87423b6caa973414ffd7ccb12076f2e1e4
2018-08-14 18:42:44 -07:00
60aa416a6d Re-purpose setup_caffe2.py for faster caffe2 build iterations (#10520)
Summary:
setup.py is the official install script; setup_caffe2.py is not used anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10520

Reviewed By: yinghai

Differential Revision: D9325548

Pulled By: bddppq

fbshipit-source-id: 3dda87f3dff061b574fd1d5c91859044f065ee33
2018-08-14 18:13:19 -07:00
32bb4040dd Unified type annotation parsing for script frontends (#10279)
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.

Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279

Differential Revision: D9319726

Pulled By: jamesr66a

fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
2018-08-14 18:13:15 -07:00
b69b1c477b Adding python binding for MPI process group (#10199)
Summary:
Based on https://github.com/pytorch/pytorch/pull/10159

Please review ProcessGroupMPI.cpp/hpp and init.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10199

Reviewed By: yf225

Differential Revision: D9324027

Pulled By: teng-li

fbshipit-source-id: 2dd524bee0c7ca8f9594ec3b4f3ebbbb608df337
2018-08-14 15:56:33 -07:00
39bfc2d0d4 Nomnigraph - add diagnostic ability for Subgraph matching API (#10267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10267

isSubtreeMatch now returns a SubtreeMatchResult which contains a match flag and a debugMessage string that contains the reason why a subtree is not matched (if requested).

Reviewed By: bwasti

Differential Revision: D9182429

fbshipit-source-id: 530591fad592d02fb4c31fc398960a14ec90c86a
2018-08-14 15:56:31 -07:00
3c39e857ca Python binding for reduce,allgather,scatter,gather ops and python tests (#10159)
Summary:
Provided python binding for these four ops. Also provided nccl binding test.

Based on https://github.com/pytorch/pytorch/pull/10058

Please only review init.cpp, and test file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10159

Reviewed By: yf225

Differential Revision: D9323192

Pulled By: teng-li

fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
2018-08-14 14:24:57 -07:00
16ecd6f99c Fix Debug Build On Windows (#10359)
Summary:
Compile files in torch/csrc with the /MDd runtime library option for debug builds on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10359

Differential Revision: D9316946

Pulled By: SsnL

fbshipit-source-id: c84bfad81d61cd49f39b7bce7177edd2b1e8bd69
2018-08-14 13:24:14 -07:00
3f3a30f79c Added Reduce,AllGather,Gather,Scatter Ops for NCCL and MPI process groups (#10058)
Summary:
Added
- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10058

Reviewed By: yf225

Differential Revision: D9316312

Pulled By: teng-li

fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
2018-08-14 13:10:21 -07:00
13814d6744 Remove use of data() in optimizers (#10490)
Summary:
After talking to users of the C++ API we found that having the tensor type be `autograd::Variable` causes more complications than having it be `at::Tensor`. It used to be a problem because `at::Tensor` didn't have the "autograd API" of variable (e.g. `detach()` or `grad()` methods), but those methods are now on `at::Tensor`. As such, we want to make a last big breaking change to have the tensor type be `at::Tensor`, while factory methods like `torch::ones` will return `Variable`s disguised as `at::Tensor`. This will make many things easier, like calling functions in ATen that take vectors of tensors.

This PR makes a small step in this direction by updating the optimizer classes to not use `.data()` on `Variable` to access the underlying `at::Tensor`. Using `.data()` is effectively a hack to work around our modification rules for tensors that require grad. The proper way of doing things is to use `with torch.no_grad` or equivalently `NoGradGuard` in C++ to guard in-place operations.
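
In Python terms, the idiom the optimizers now follow looks like this (a sketch of the pattern, not the C++ implementation):

```
import torch

param = torch.randn(3, requires_grad=True)
param.sum().backward()

# Instead of mutating param.data, guard the in-place update explicitly:
with torch.no_grad():
    param.add_(-0.1 * param.grad)
```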

The next step can then simply redefine `torch::Tensor` to be `at::Tensor`. This transition should be smooth, since all methods available on `Variable` are at this point available on `at::Tensor`.

For this PR I:

1. Modified the implementations of optimizers to not use `.data()`. This means the implementations are now different from PyTorch, which still uses the legacy method of using `.data`.
2. To properly verify (1), I added more fine-grained test cases to our optimizer tests, e.g. `SGD` with and without `weight_decay`, then with `nesterov` etc. Generally more tests = more happy!
3. Minor cleanup of the optimizer codebase

ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10490

Differential Revision: D9318229

Pulled By: goldsborough

fbshipit-source-id: fb386700f37840542bc5d323f308ea88fe5ea5c5
2018-08-14 13:10:19 -07:00
bdb11e716a Split the dependence of ONNX from test_operators.py (#10151)
Summary:
Now, when running `python test/onnx/test_operators.py --no-onnx`, we won't introduce any onnx Python dependency. (No onnx/protobuf Python packages need to be installed.)

The major changes:
- output pbtxt from C++ exporter directly, so the floating format may be slightly different. (This should be fine, since it's just to guard ONNX exporting.)
- ONNX python packages are only imported if we run the ONNX related checks. Those checks are disabled when using `--no-onnx` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10151

Reviewed By: jamesr66a

Differential Revision: D9130706

Pulled By: houseroad

fbshipit-source-id: ea28cf5db8399929179698ee535137f209e9ce6f
2018-08-14 12:54:44 -07:00
eea8ab1861 Move common code to RNNCellBase. (#10399)
Summary:
There are three classes `RNNCell`, `LSTMCell`, `GRUCell` inheriting from `RNNCellBase`, all defining an identical initialization function `reset_parameters`. Let's move it to the common base.
Another option is to have different initialization for RNN, LSTM and GRU. Maybe those weights whose output is processed with sigmoid (i.e. gain=1) should be initialized differently from those going to tanh (gain=5/3)?
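
For reference, a sketch of the duplicated initializer being hoisted (essentially the body shared by the three cells; the `__init__` here is trimmed down):

```
import math
import torch

class RNNCellBase(torch.nn.Module):
    def __init__(self, hidden_size):
        super(RNNCellBase, self).__init__()
        self.hidden_size = hidden_size

    def reset_parameters(self):
        # identical across RNNCell, LSTMCell and GRUCell today
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            weight.data.uniform_(-stdv, stdv)
```
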
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10399

Differential Revision: D9316978

Pulled By: SsnL

fbshipit-source-id: a2d9408f0b5c971a3e6c3d42e4673725cf03ecc1
2018-08-14 12:39:59 -07:00
bd497809e2 CAFFE_ENFORCE -> CAFFE_ENFORCE_EQ for error with more information (#10244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10244

Use CAFFE_ENFORCE_EQ(x, y) instead of CAFFE_ENFORCE(x == y) in conv_op_impl.h for error messages with more information.

Reviewed By: viswanathgs

Differential Revision: D9177091

fbshipit-source-id: cf8d10afec1ce6793d3ae0b62f05648722a4130b
2018-08-14 12:24:44 -07:00
2400512a08 Remove unnecessary include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10486

Reviewed By: ml7

Differential Revision: D9305283

fbshipit-source-id: 0d1316f9a72670ddbe8d95ead93603d00ad0f63b
2018-08-14 12:10:04 -07:00
d1442b36f3 add a rebuild_libtorch command for speedier iteration. (#10036)
Summary:
It just calls into `ninja install`. For iterative work on libtorch.so/_C.so,
`python setup.py rebuild_libtorch develop` should provide quick iteration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036

Differential Revision: D9317869

Pulled By: anderspapitto

fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
2018-08-14 12:10:02 -07:00
520f4f6cb9 Added some unit test for box_with_nms_limit_op. (#10389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389

Added some unit test for box_with_nms_limit_op.

Reviewed By: wat3rBro

Differential Revision: D9237860

fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
2018-08-14 11:55:03 -07:00
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Test only for existence for now. I had to skip a lot of them, so there is a FIXME in the test.

Also, I'm not testing torch.* because of a namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
b4462511fd Add LSTMCell backward pass expect tests (#10506)
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
  forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
  are intended to prevent regressions

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506

Differential Revision: D9316590

Pulled By: zou3519

fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
2018-08-14 11:39:44 -07:00
e5811becdd Add tags for onnx tensor descriptors (#10502)
Summary:
We missed two places where tags need to be added when we create tensor descriptors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10502

Reviewed By: Maratyszcza

Differential Revision: D9312075

Pulled By: yinghai

fbshipit-source-id: 329e83ec5470b0a778d2eda525dd6f2143facbdf
2018-08-14 11:25:52 -07:00
9497383706 Fix some warnings (#10297)
Summary:
Fixing some compiler warnings while looking at symbol visibility.

cc smessmer ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10297

Reviewed By: soumith

Differential Revision: D9195336

Pulled By: orionr

fbshipit-source-id: 04cbfd3549984caec7bdd1a5b39a6d25e80348e9
2018-08-14 10:40:08 -07:00
61bedc96f0 Schema-based creation of graph nodes (#10198)
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema based, it will discover and use the right overload.
Constructors to `NamedValue` enable it to be constructed using `IValue` constants so it is possible to use constant values in the input list as well:

```
g.insert(aten::add, {v, 3});
```

Keyword arguments are also supported:

```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198

Differential Revision: D9307252

Pulled By: zdevito

fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
2018-08-14 10:25:38 -07:00
3a40baa15c fix a grammatical error: accelerate compute (#10204)
Summary:
"accelerate compute"
a verb shouldn't go with another verb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10204

Differential Revision: D9316699

Pulled By: fmassa

fbshipit-source-id: f1126c594905c3236ffd6b7e57a92552d3d4c1f1
2018-08-14 10:11:15 -07:00
ef44faece2 check attribute existence in torch.legacy.nn.SpatialFullConvolution in method type (#8740)
Summary:
This is related to #5255
When adding cuda support for the model, this error appears:
```
AttributeError: 'SpatialFullConvolution' object has no attribute 'finput'
```
here is my short code for test.
https://gist.github.com/kaleaht/26518c3deea5d1d3dda722fbf1f3ecdc

I converted torch7's model also from here.
https://github.com/art-programmer/FloorplanTransformation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8740

Differential Revision: D8872735

Pulled By: SsnL

fbshipit-source-id: 8d97f8b59cdf4049e87be14b78c4608fd973d149
2018-08-14 10:11:13 -07:00
329d901a91 Fold AffineChannel to Conv, the same way as BN (for Detectron models) (#10293)
Summary:
AffineChannel is being used by public Detectron models, e.g. Mask-RCNN and Faster-RCNN. This PR folds this op into convolution the same way as BN to speed up inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10293

Differential Revision: D9276789

Pulled By: yinghai

fbshipit-source-id: fbf6dd2c1be05f5713f760752e7245b1320a122b
2018-08-13 22:43:37 -07:00
c618df154e Add intrinsic support for external_input/output to nomnigraph (#10100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10100

nomnigraph has until this point tried to ignore external input and output, as they aren't very well defined (does order matter?). But for DCE and some of Keren's work they are becoming necessary, so I went ahead and added this to the core nomnigraph converter.

Reviewed By: yinghai

Differential Revision: D9105487

fbshipit-source-id: a2e10e3cc84515611d6ab7d4bc54cf99b77729c0
2018-08-13 21:39:17 -07:00
7d16e87f14 Fix byte ordering issue in from_numpy (#9508)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3671 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9508

Differential Revision: D9307186

Pulled By: soumith

fbshipit-source-id: 39dcaa6fd2d330d7085802acd6f63c19270164fa
2018-08-13 21:39:16 -07:00
facb293aad Fix FindMKL.cmake for Windows (#10453)
Summary:
Targets the issue discussed at https://github.com/pytorch/pytorch/pull/7399#issuecomment-400788971.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10453

Differential Revision: D9311591

Pulled By: soumith

fbshipit-source-id: ac0712e10bdac4ea3f76d6fbad2178ec958b3a31
2018-08-13 21:09:27 -07:00
fed05cf4cf Fix prim::FusedConcat bug (#10466)
Summary:
Fixes #10456

The graph fuser was fusing together groups with prim::FusedConcat (the producer) with other ops (the consumer) if the consumer is fusable. For example,

```
import torch
torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
      %5 : Float(2, 2)
      %7 : Float(2, 2)
      %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```

this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup

This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466

Differential Revision: D9296686

Pulled By: zou3519

fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
2018-08-13 21:09:25 -07:00
099a545376 Hipify Caffe2 binaries (#10468)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10468

Reviewed By: yinghai

Differential Revision: D9301178

Pulled By: bddppq

fbshipit-source-id: 5da88aa4d79a5142f8e744cdcd8ae85951bc387c
2018-08-13 20:56:28 -07:00
9a9224e5c1 Remove "locally" from CONTRIBUTING.md (#10495)
Summary:
A bootcamper was confused by the word "locally" and thought it meant on his macbook as opposed to his FB dev machine. Besides the confusion in the FB context, the word "locally" isn't really necessary at all.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10495

Reviewed By: soumith

Differential Revision: D9311480

Pulled By: goldsborough

fbshipit-source-id: 2779c7c60f903a1822a50d140ed32a346feec39e
2018-08-13 20:56:26 -07:00
f6eb966fd2 Fix TanhGradientOperator linker errors (#10426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10426

We were seeing linker errors for TanhGradientOperator in multifeed. Since we only use the float specialization, we might as well define it that way.

Reviewed By: yinghai

Differential Revision: D9280622

fbshipit-source-id: d2ffb698c73a84bb062de5e1f3bda741330e4228
2018-08-13 17:57:10 -07:00
ffb59e5f20 adding stochastic quantization caffe2 operators (encoder and decoder on CPU are implemented; GPU mode is pending)
Summary:
This operator implements b-bit (b = 1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b quantized values are packed into a byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629
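
To make the packing concrete, here is a sketch of the byte-packing arithmetic for b = 2 (an illustration only; the operator's actual bit order is an implementation detail):

```
b = 2                       # bits per value
per_byte = 8 // b           # 4 quantized values fit in one byte
codes = [3, 0, 2, 1]        # already-quantized 2-bit codes

packed = 0
for k, c in enumerate(codes):
    packed |= c << (k * b)  # pack the codes into a single byte
print(bin(packed))          # 0b1100011, one uint8 element
```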

Reviewed By: harouwu

Differential Revision: D8493264

fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
2018-08-13 16:39:23 -07:00
c6fc3ab557 fixes printing non-contiguous tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10405

Differential Revision: D9302794

Pulled By: soumith

fbshipit-source-id: e4a7db8d33400a5a050d05fd1679de8bc3cbcf30
2018-08-13 16:26:20 -07:00
216961b7bf Remove is_zero_dim_ bool in THTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10415

Reviewed By: ezyang

Differential Revision: D9274954

Pulled By: gchanan

fbshipit-source-id: 353a52d91556d5b81c3510eb2bf399d102c9a0a4
2018-08-13 12:39:06 -07:00
f59cce95b4 Some symbol annotation fixes for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10369

Differential Revision: D9300187

Pulled By: ezyang

fbshipit-source-id: bf29966ad6aa221332b7232a965fb85e652f866d
2018-08-13 12:26:00 -07:00
382ff03222 Add missing #pragma once
Reviewed By: ml7

Differential Revision: D9299779

fbshipit-source-id: b5b5a1b9ead1b275d3ae54ecfad99617d2869094
2018-08-13 11:39:45 -07:00
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
cd81217f8e A single print statement in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10473

Reviewed By: ml7

Differential Revision: D9299196

Pulled By: pjh5

fbshipit-source-id: f9aa84c2859df12f9da9ac5205e1918c253e19fb
2018-08-13 11:39:42 -07:00
0b63d12db6 Don't call into Python during Storage destruction. (#10407)
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shut down before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.

This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
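
The collection scheme can be sketched as follows (hypothetical names; a simplification of the real bookkeeping):

```
import weakref

shared_cache = {}   # handle -> weakref.ref to a StorageRef
limit = 128         # sweep threshold, rescaled after every sweep

def free_dead_references():
    global limit
    live = 0
    for key in list(shared_cache.keys()):
        if shared_cache[key]() is None:
            del shared_cache[key]    # expired weak reference: collect it
        else:
            live += 1
    limit = max(128, live * 2)       # bounds dead-entry overhead at ~2x live refs
```
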
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407

Differential Revision: D9272400

Pulled By: colesbury

fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
2018-08-13 11:20:07 -07:00
64235d5c01 Rewrite TensorImpl to use TensorTypeId. (#10278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10278

Translation to Backend happens immediately before we go into the
Type universe; otherwise we use TensorTypeId.

I allocated TensorTypeId corresponding exactly to existing ATen
Backend.  Only CPUTensorId and CUDATensorId are relevant in the
Caffe2 universe.

Reviewed By: gchanan

Differential Revision: D9184060

fbshipit-source-id: 9d3989c26f70b90f1bbf98b2a96c57e2b0a46597
2018-08-13 11:20:04 -07:00
145eb330ad Back out "Back out "Move typeid.h to move to ATen/core"" (#10465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10465

Original commit changeset: 7050fe845e65

Reviewed By: jerryzh168

Differential Revision: D9296375

fbshipit-source-id: cb8161440ba809dcec5027858a29cd026d537fc3
2018-08-13 11:20:01 -07:00
b8530dc1f0 A few additions (#9837)
Summary:
This PR provides 4 fixes / features:

1. torch::nn::Cloneable inherits virtually from torch::nn::Module. We want to pass around a module with new functions, and the best way to do this is to do a diamond inheritance pattern, i.e.

```c++
struct MySuperModuleImpl : virtual public torch::nn::Module {
  virtual void myFunction() = 0;
};

template <typename T>
struct MySuperModule : public torch::nn::Cloneable<T>, public MySuperModuleImpl {};

struct MyModule : public MySuperModule<MyModule> {
  void myFunction() override;
};
```

This way, we can simply pass MySuperModuleImpl around instead of torch::nn::Module.

2. Optimizer options are public now, since there's no way to decay the LR or modify it during training otherwise
3. Serialization functions created autograd history and called copy_! Bad!
4. Optimizers did not create buffers after add_parameters was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9837

Reviewed By: goldsborough

Differential Revision: D9199746

Pulled By: ebetica

fbshipit-source-id: 76d6b22e589a42637b7cc0b5bcd3c6b6662fb299
2018-08-13 10:24:58 -07:00
0a39a9cfbc Add db directory for hipifying (#10428)
Summary:
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10428

Differential Revision: D9297115

Pulled By: bddppq

fbshipit-source-id: d7134ff24102f03f762e6a7b4340055546c9ecfd
2018-08-13 10:24:56 -07:00
56267cc97b gflags improvement to allow CAFFE2_EXPORTS (#10444)
Summary:
Explanation copied from code:

// Motivation about the gflags wrapper:
// (1) We would need to make sure that the gflags version and the non-gflags
// version of Caffe2 are going to expose the same flags abstraction. One should
// explicitly use caffe2::FLAGS_flag_name to access the flags.
// (2) For flag names, it is recommended to start with caffe2_ to distinguish it
// from regular gflags flags. For example, do
//    CAFFE2_DEFINE_BOOL(caffe2_my_flag, true, "An example");
// to allow one to use caffe2::FLAGS_caffe2_my_flag.
// (3) Gflags has a design issue that does not properly expose the global flags,
// if one builds the library with -fvisibility=hidden. The current gflags (as of
// Aug 2018) only deals with the Windows case using dllexport, and not the Linux
// counterparts. As a result, we will explciitly use CAFFE2_EXPORT to export the
// flags defined in Caffe2. This is done via a global reference, so the flag
// itself is not duplicated - under the hood it is the same global gflags flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10444

Differential Revision: D9296726

Pulled By: Yangqing

fbshipit-source-id: a867d67260255cc46bf0a928122ff71a575d3966
2018-08-13 09:54:48 -07:00
64a6f17177 Fix ATen/core header installation. (#10463)
Summary:
Fixes #10353 and fixes #10397.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10463

Differential Revision: D9296491

Pulled By: ezyang

fbshipit-source-id: f825c2a21a113e44a6f5c1c5ec17814d9deac366
2018-08-13 09:25:49 -07:00
fa5d95a00c Bump onnx to onnx/onnx@0d250de (#10452)
Summary:
0d250dea76
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10452

Reviewed By: houseroad

Differential Revision: D9288037

Pulled By: bddppq

fbshipit-source-id: 206be3ee2b8ebca26f3d8af0597078363ed6d168
2018-08-13 00:09:15 -07:00
3cbe8f0c3e Detect system RocksDB installation with CMake config files. (#7315)
Summary:
On Windows, the FindRocksDB script doesn't detect rocksdb installation built by cmake.
And it doesn't include/link the RocksDB dependencies either, like:
  * `Snappy`
  * `Shlwapi.lib`
  * `Rpcrt4.lib`

This PR try to detect in config mode first before using private find module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7315

Differential Revision: D9287587

Pulled By: Yangqing

fbshipit-source-id: 314a36a14bfe04aa45013349c5537163fb4c5c00
2018-08-12 18:24:10 -07:00
82d11b847e Use CUDA_LINK_LIBRARIES_KEYWORD instead of hacking. (#10437)
Summary:
There's no need to hack.
Using `CUDA_LINK_LIBRARIES_KEYWORD` is the normal way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10437

Differential Revision: D9287579

Pulled By: Yangqing

fbshipit-source-id: d3d575ea8c3235576ba971e4b7493ddb435f92f3
2018-08-12 18:09:20 -07:00
508de8109f Added missing "AT_" prefix to macro. (#10436)
Summary:
For issue #10435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10436

Differential Revision: D9287578

Pulled By: Yangqing

fbshipit-source-id: b07de3a2d7fa6f980a189b5e8f7ce05dfa1bef50
2018-08-12 18:09:19 -07:00
1756daaa75 Use FULL_CAFFE2 to build caffe2 and python in one shot (#10427)
Summary:
Building caffe2 and pytorch separately ends up with duplicated symbols, as they now share some basic libs, and that's especially bad for the registry. This PR fixes our CI and builds them in one shot with shared symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10427

Reviewed By: bddppq

Differential Revision: D9282372

Pulled By: yinghai

fbshipit-source-id: 0514931ea88277029a68fa5368ff4336472f132e
2018-08-12 15:39:12 -07:00
51f154e072 Fix Python lint errors. (#10441)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10441

Reviewed By: Yangqing

Differential Revision: D9285502

Pulled By: ezyang

fbshipit-source-id: 12c94b28bee9cade930c8f260577e81ea1915269
2018-08-11 21:08:50 -07:00
cd53b78bd0 Remove caffe namespace GetEmptyStringAlreadyInited (#10438)
Summary:
A followup cleanup of #10380 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10438

Differential Revision: D9285692

Pulled By: Yangqing

fbshipit-source-id: c73defbef00d3b563240d0b69d85bd0a6e3eb504
2018-08-11 17:39:58 -07:00
ab6afc2b23 Optimize max_pooling for inference for MKL-DNN/IDEEP device (#10156)
Summary:
Optimize the max_pooling operation for the inference path by passing the "inference" flag to the underlying MKL-DNN, saving the computation and storage of max indices, which are only needed for training. To keep the API compatible, training mode is still the default and inference mode is set in the optimizeForIdeep path.
Tests show the speed-up of a single max_pooling operation is up to 7X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10156

Differential Revision: D9276755

Pulled By: yinghai

fbshipit-source-id: ad533d53aabb8ccb3b592da984d6269d9b794a8a
2018-08-10 23:14:05 -07:00
d3ccc836de Fix warning in Nomnigraph (#10425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10425

`const size_t` as return value doesn't make sense.

Reviewed By: duc0

Differential Revision: D9281442

fbshipit-source-id: c3d9c94f5dbe516476f0c74f63c35e60893c8140
2018-08-10 22:40:26 -07:00
1dbdc5a93d Back out "Move typeid.h to move to ATen/core"
Summary: Original commit changeset: 21f2c89e58ca

Reviewed By: yinghai

Differential Revision: D9282171

fbshipit-source-id: 7050fe845e6524b965bdd45794a6fa1665b83e34
2018-08-10 21:39:25 -07:00
31646edfff Increase GLOO rendezvous timeout
Summary: Increase GLOO rendezvous timeout

Reviewed By: teng-li

Differential Revision: D9273544

fbshipit-source-id: 5c22c1d18df3032f019ff12e2a720aea7c390f15
2018-08-10 18:40:18 -07:00
767687835e Replace sudo with --user in CI caffe2 install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10328

Reviewed By: pjh5

Differential Revision: D9275809

Pulled By: ezyang

fbshipit-source-id: c22cb1570c67199b74b2188ad83b1e4828e11911
2018-08-10 15:11:43 -07:00
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
5b0be9de59 Remove TH compatibility calls for strides. (#10414)
Summary:
This should just work now that sizes/strides are unified between TH and ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10414

Differential Revision: D9274681

Pulled By: gchanan

fbshipit-source-id: 69eb766f4e3a5b6c57b15837cffdef513b6d7817
2018-08-10 13:54:58 -07:00
674f7a9778 Correctly share CUDA Parameters. (#10220)
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Previously, the following was true:

    - If you put a Parameter for a CUDA tensor
      in multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996.

    - If you put a leaf Tensor that requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (It should have come out the other end
      as requires_grad=True).  Similarly, backwards hooks were
      lost.

    - If you put a non-leaf Tensor that requires_grad=True through
      the multiprocessing queue, it would come out the other end
      as requires_grad=False.

    The root cause for the first issue was that implementation of
    reductions for Parameter used the superclass implementation
    (tensor) in __reduce_ex__, but this always picks up the
    non-ForkingPickler reduction, which doesn't work with CUDA tensors.
    So, we registered a new ForkingPickler specifically for Parameter,
    and adjusted the code to correctly rewrap a Tensor in a Parameter
    if it was originally a parameter.

    While working on this, we realized that requires_grad and backwards
    hooks would not be preserved in the ForkingPickler reduction
    implementation.  We fixed the reducer to save these parameters.
    However, Adam Paszke pointed out that we shouldn't allow sending
    requires_grad=True, non-leaf Tensors over a multiprocessing
    queue, since we don't actually support autograd over process
    boundary.  We now throw an error in this case; this may cause
    previously working code to fail, but this is easy enough to fix;
    just detach() the tensor before sending it.  The error message says
    so.

    Fixes #9996.
```
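
From user code, the new restriction looks like this (a sketch; queue plumbing and process setup omitted):

```
import torch
import torch.multiprocessing as mp

x = torch.randn(2, 2, requires_grad=True)
y = x * 2                 # non-leaf, requires_grad=True

q = mp.Queue()
q.put(x)                  # OK: leaf tensor; requires_grad now survives the trip
q.put(y.detach())         # OK: detach non-leaf tensors before sending
# q.put(y)                # error: autograd does not cross process boundaries
```
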
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
2018-08-10 13:54:56 -07:00
0b8a0125ab Fixes torch.log after torch.expand giving incorrect results (#10269)
Summary:
fixes #10241
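
A sketch of the pattern the fix targets, going by the issue title (a unary op applied to an expanded, stride-0 input):

```
import torch

x = torch.rand(3, 1).add(0.5).expand(3, 4)            # stride-0 columns share storage
print(torch.allclose(x.log(), x.contiguous().log()))  # True once fixed
```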
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10269

Differential Revision: D9272472

Pulled By: cpuhrsch

fbshipit-source-id: cd1afbb4386a0d0956ee21b24f0d529755b986ca
2018-08-10 13:39:38 -07:00
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
closes #9702 .

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulating out-of-boundary indices. Fortunately, this
	isn't manifested in either of the currently supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
        ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they fall within bounds.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation.

4. Add test cases

5. Add better input checking
  Discussed with goldsborough moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size  > 0)` if using `GET_BLOCKS`
   + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test if cudnn descriptor should accept a
      tensor, but it ended up unused. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
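
A usage sketch of the two new options added here:

```
import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 4, 4)
# Grid values outside [-1, 1] exercise the new reflection padding.
grid = torch.rand(1, 8, 8, 2) * 3 - 1.5

out = F.grid_sample(inp, grid, mode='nearest', padding_mode='reflection')
print(out.shape)  # torch.Size([1, 1, 8, 8])
```
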
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
40109b16d0 Remove caffe1 specific proto (#10380)
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.

Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380

Differential Revision: D9267981

Pulled By: Yangqing

fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
2018-08-10 11:10:26 -07:00
018790cd4b thread BUILD_SHARED_LIBS through build_pytorch_libs.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10272

Differential Revision: D9239337

Pulled By: anderspapitto

fbshipit-source-id: 187b3acb7e85635d9b45a3dd82c98d86a2b51e70
2018-08-10 10:39:31 -07:00
9b8a036873 Fix basic.cpp, which compared equality between a size [1] tensor with… (#10404)
Summary:
… a size [] tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10404

Differential Revision: D9268467

Pulled By: gchanan

fbshipit-source-id: 92bb387358f4030519c6883c12ea69312185446e
2018-08-10 10:39:29 -07:00
e524a8994b Make lengths_host_.CopyFrom synced in LengthsCosineCoherenceOp and LengthsTileOp (#10360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10360

It seems `lengths_host_.CopyFrom(lengthsInput, &context_);` is asynchronous w.r.t. the host while `lengths_host_.CopyFrom(lengthsInput);` is synchronous.

However, according to jerryzh168,  `lengths_host_.CopyFrom(lengths, &context_); context_.FinishDeviceComputation();` is the safest way to guarantee synchronization.

Reviewed By: jerryzh168

Differential Revision: D9197923

fbshipit-source-id: 827eb63d9d15c1274851e8301a793aed39d4fa6b
2018-08-10 10:39:28 -07:00
be5fb8f6fd Move fused RNN kernels into ATen (#10305)
Summary:
As in the title. I also did a small refactor that let us drop almost 400 lines of code. This is a first step in moving the RNN code to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10305

Reviewed By: ezyang

Differential Revision: D9196227

Pulled By: apaszke

fbshipit-source-id: 54da905519aade29baa63ab1774a3ee1db5663ba
2018-08-10 09:12:05 -07:00
e221791afc Fix typo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10387

Differential Revision: D9255840

Pulled By: gchanan

fbshipit-source-id: 97b52d4e349c1e2d1970abde7dc6b25e7cf668a0
2018-08-10 08:55:30 -07:00
1e3e26e3e8 Use nDimensionLegacyNoScalars in THTensorDimApply. (#10388)
Summary:
This issue was exposed in https://github.com/pytorch/pytorch/pull/10383.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10388

Differential Revision: D9255836

Pulled By: gchanan

fbshipit-source-id: 88c5a6415c27d56ff54d00a8957fdc1617cfbde7
2018-08-10 08:55:28 -07:00
3667d029b4 Move typeid.h to move to ATen/core (#10163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10163

- Remove dependency on caffe2/core/common.h for ATen/core/typeid.h
  Unfortunately, Windows seems to rely on typeid.h including this
  header, so it is still included from the forwarding header
  caffe2/core/typeid.h
- Deduplicate Demangle/DemangleType with their ATen equivalents

Reviewed By: smessmer

Differential Revision: D9132432

fbshipit-source-id: 21f2c89e58ca1e795f1b2caa316361b729a5231b
2018-08-10 08:45:44 -07:00
e9ad74357e Use serialization container in ir import export (#10394)
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394

Differential Revision: D9260816

Pulled By: li-roy

fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
2018-08-10 00:09:30 -07:00
0950d7a98d support list slicing (#10318)
Summary:
As title.
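
A small sketch of what this enables (assuming a tensor-list literal in script, as below):

```
import torch

@torch.jit.script
def middle(x):
    xs = [x, x + 1, x + 2, x + 3]
    return xs[1:3]   # list slicing inside script

print(len(middle(torch.zeros(1))))  # 2
```
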
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10318

Differential Revision: D9254351

Pulled By: michaelsuo

fbshipit-source-id: be891a584dc295b5e353f7f5257d64a356fb9586
2018-08-09 17:25:13 -07:00
b1e3239ec8 Fix some backwards definitions wrt keepdim. (#10382)
Summary:
Before we had 0-dim tensors in TH, we were flexible in what we accepted w.r.t. the difference between size [] and size [1] tensors in backwards functions, because they were identical in TH. So, we had backwards definitions that were technically incorrect, but happened to work. This often masks shape issues, adds greatly to code complexity, and thus IMO isn't worth keeping. The shape distinction is illustrated below.
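
The shape distinction in question, concretely:

```
import torch

x = torch.randn(3)
print(x.sum().shape)                     # torch.Size([])  -> 0-dim scalar
print(x.sum(dim=0, keepdim=True).shape)  # torch.Size([1]) -> 1-dim, size 1
```
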
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10382

Differential Revision: D9244618

Pulled By: gchanan

fbshipit-source-id: 2c29c53a8ffe8710843451202cad6b4323af10e8
2018-08-09 15:11:55 -07:00
209af45614 Back out "[pytorch][PR] Fix bincount for empty input"
Summary: Original commit changeset: 6c4c66c23679

Reviewed By: SsnL

Differential Revision: D9253403

fbshipit-source-id: bf5ee669ed095c06ff58a2871f7350e879261076
2018-08-09 14:25:33 -07:00
18d2fcde7a Fix performance of DistributedSampler per #8958
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10361

Differential Revision: D9240798

Pulled By: ezyang

fbshipit-source-id: dc4cfe79612f711bbcff34a147877df6a5f7b89f
2018-08-09 12:54:37 -07:00
64a60030a6 Don't copy on clamp, clamp_out (#10352)
Summary:
This makes clamp and relu faster (fixes #10276).

The extra copying was introduced when clamp moved to ATen and
the _th_clamp_ wrapper was used to forward to TH/THC;
we remove that and add _th_clamp(_out) instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10352

Reviewed By: ezyang

Differential Revision: D9233590

Pulled By: SsnL

fbshipit-source-id: 4f86a045498e5e577fb22656c71f171add7ed0ac
2018-08-09 12:40:47 -07:00
b43beec070 Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756.
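
After the fix, the expected behavior is roughly:

```
import torch

empty = torch.tensor([], dtype=torch.long)
print(torch.bincount(empty))  # tensor([], dtype=torch.int64), instead of crashing
```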
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Differential Revision: D8966879

Pulled By: soumith

fbshipit-source-id: 9f08a9d5d5d037db16319141d7a227a5efa23869
2018-08-09 12:40:45 -07:00
cc5b47ff47 Fix the logic for PATH guess on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10372

Differential Revision: D9240207

Pulled By: soumith

fbshipit-source-id: 0933f6fde19536c7da7d45044efbdcfe8ea40e1f
2018-08-09 12:40:44 -07:00
3fa1c1022a Avoid std::thread ctor "cannot resolve" error (#10381)
Summary:
If an `at::test` function is added, gcc can't figure out the `std::thread(test, -1)` resolution.

It is not a problem for current code. I bumped into this when playing with native functions. But I think it is good to just prevent it from happening in the future by removing `using namespace at;`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10381

Differential Revision: D9241614

Pulled By: SsnL

fbshipit-source-id: 972ac3cecff3a50602b3fba463ae1ebd3f53d036
2018-08-09 11:55:40 -07:00
99b10adc01 Fix compile flags for MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10368

Differential Revision: D9240791

Pulled By: ezyang

fbshipit-source-id: 536b093b5c800cc1cf02cbbde9ae341e25d083d1
2018-08-09 09:39:58 -07:00
7d53c876dc Move maybeZeroDim to TH, change condition so it doesn't turn off scal… (#10333)
Summary:
…ars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10333

Differential Revision: D9206091

Pulled By: gchanan

fbshipit-source-id: 492c50189edc2056aa2acce98d49234d2a54ce39
2018-08-09 09:28:57 -07:00
e967fa9757 Fix THTensor_nElement for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10332

Differential Revision: D9206039

Pulled By: gchanan

fbshipit-source-id: 0bc7c15050a6a602f621d3e9ecc3a6ea35481a6a
2018-08-09 09:28:55 -07:00
52d85bedb7 Deal with undefined tensors in unbind backward (#9995)
Summary:
When only part of the outputs of unbind are used in a backward,
the gradients for the others are undefined. This sets those
to zero in to_tensor_list.
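
Concretely, a minimal reproduction of the case being fixed:

```
import torch

x = torch.randn(3, 2, requires_grad=True)
a, b, c = torch.unbind(x)
a.sum().backward()   # only a's gradient is defined by the graph

print(x.grad)        # row 0 is ones; rows 1 and 2 are filled with zeros
```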

Fixes: #9977
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9995

Differential Revision: D9239610

Pulled By: soumith

fbshipit-source-id: eb8d1b3f2b4e615449f9d856e10b946910df9147
2018-08-09 08:54:28 -07:00
b70b7066f7 Keep kEps in one place to make sure they are consistent (#10334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10334

Keep kEps in one place to make sure they are consistent

Reviewed By: xianjiec

Differential Revision: D9202280

fbshipit-source-id: 35d173ce1d1a361b5b8cdbf1eac423e906e7c801
2018-08-09 08:27:42 -07:00
04f381650e Resubmit: Fix dataloader hang when it is not completely iterated (#10366)
Summary:
https://github.com/pytorch/pytorch/pull/9655
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366

Differential Revision: D9237393

Pulled By: SsnL

fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a
2018-08-09 00:10:24 -07:00
037d8d1bab Order Loss functions alphabetically in nn.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10365

Differential Revision: D9237287

Pulled By: SsnL

fbshipit-source-id: 28e9de76b9cfd8f63c8df561ff1531ea8d0803ea
2018-08-08 22:39:55 -07:00
9dfc4edc68 Update NNPACK and cpuinfo submodules (#8564)
Summary:
Bring in extra optimizations in Winograd-based convolution on NEON
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8564

Reviewed By: hlu1

Differential Revision: D9088140

Pulled By: Maratyszcza

fbshipit-source-id: 2089191416db98bdad8f0e4848b1435fcf74a88b
2018-08-08 22:39:52 -07:00
6e49f933ad Check that result is on CPU for CPU unary ops kernels (#10358)
Summary:
Fixes: #10270
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10358

Differential Revision: D9233066

Pulled By: soumith

fbshipit-source-id: 39b7524fe55ddb899fb27e2c0ef504ce54dbad35
2018-08-08 21:11:53 -07:00
783f2c60b2 nomnigraph - Enhancements to subgraph matching APIs (#10218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10218

SubtreeMatchCriteria now supports:
- nonTerminal flag: if this is set, it means we only match the root of the subtree and do not care about the children. Example use case: to match an "input" node without caring how the input is produced.
Additional tests for these new logic are added to subgraph_matcher_test.cc.

Subgraph matching APIs for NNGraph is also added.

(Further enhancements to make the SubgraphMatching API construct a Subgraph object and surface more diagnostic information will come later.)

Reviewed By: bwasti

Differential Revision: D9156092

fbshipit-source-id: 3f28ac15d9edd474b3e0cd51fd7e6f973299d061
2018-08-08 14:56:23 -07:00
69760e2840 update torch.eig() doc (#10315)
Summary:
This fixes #9383

Update torch.eig() doc, the complex part is written based on https://scc.ustc.edu.cn/zlsc/sugon/intel/mkl/mkl_manual/GUID-16EB5901-5644-4DA6-A332-A052309010C4.htm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10315

Reviewed By: yf225

Differential Revision: D9200723

Pulled By: ailzhang

fbshipit-source-id: d2e186fd24defbc4fdea6c2cf3dc4f7e05e1d170
2018-08-08 06:43:41 -07:00
0d03219a42 Remove hack as integrated builds use FULL_CAFFE2 now (#10320)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10320

Reviewed By: jerryzh168

Differential Revision: D9198902

Pulled By: ezyang

fbshipit-source-id: 8af28d607735e5f4450c40127c1f8c262ea602ce
2018-08-07 21:40:07 -07:00
7d6d7bef6a Enable docker image build for PyTorch using specific python version (#10317)
Summary:
The current Dockerfile builds pytorch using the default python within miniconda, which happens to be Python 3.6.

This patch allows users to specify which python should be installed in the default miniconda environment used by the pytorch dockerfile. I have tested the build for python 2.7, 3.5, 3.6 and 3.7. Python 2.7 required typing and cython.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10317

Differential Revision: D9204401

Pulled By: ezyang

fbshipit-source-id: 11355cab3bf448bbe8369a2ed1de0d409c9a2d6e
2018-08-07 16:13:33 -07:00
66b3bae47c Add sizesLegacyNoScalars/stridesLegacyNoScalars analog of sizeLegacyN… (#10323)
Summary:
…oScalars,strideLegacyNoScalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10323

Differential Revision: D9200567

Pulled By: gchanan

fbshipit-source-id: 5580d6f92eef0acb04132f1978436cc31cdf563a
2018-08-07 15:41:28 -07:00
b7bc327180 Remove new_Tensor and generated components
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10194

Differential Revision: D9160559

Pulled By: cpuhrsch

fbshipit-source-id: 133185b3d4258c154dc43f7572dbef6bfa6786f3
2018-08-07 15:09:38 -07:00
5390476297 Add tracing to custom op and simplify tracer overall (#10212)
Summary:
This PR adds tracing infrastructure for custom operators. It also simplifies the tracer overall, and changes the codegen to do more metaprogramming there instead of via C++ (which was necessary for the custom op tracing).

To give an example of the tracer/metaprogramming change, what used to look like this in `VariableType.cpp`:

```
jit::tracer::PreTraceInfo trace_info;
  if (jit::tracer::isTracing()) {
    trace_info = jit::tracer::preRecordTrace(jit::aten::index_select, "self", self, "dim", dim, "index", index);
  }
```

is now simply the inlined version of `preRecordTrace`, minus C++ metaprogramming:

```
torch::jit::Node* node = nullptr;
  if (jit::tracer::isTracing()) {
    auto& graph = jit::tracer::getTracingState()->graph;
    node = graph->create(jit::aten::index_select_out, /*outputs=*/0);
    jit::tracer::recordSourceLocation(node);
    jit::tracer::addInputs(node, "result", result);
    jit::tracer::addInputs(node, "self", self);
    jit::tracer::addInputs(node, "dim", dim);
    jit::tracer::addInputs(node, "index", index);
    graph->appendNode(node);
  }
```

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10212

Differential Revision: D9199615

Pulled By: goldsborough

fbshipit-source-id: cd4b603c1dc01340ead407228e109c99bdba2cfc
2018-08-07 13:54:15 -07:00
5bb21493fd add fused dropout kernels (#9666)
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make inplace work, because the generator always emits `return self` for inplace functions, and I need to return both the original tensor and the mask, so inplace stays on the existing path. Even with the non-inplace version, memory use is only a little larger than for inplace dropout, thanks to the savings on the mask, which is now a ByteTensor.
Once dropout is moved to ATen, these kernels can still be used for an efficient implementation.
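Nothing changes at the call site; the efficient path is picked up by the existing API (sketch, assuming a CUDA device is available):

```
import torch
import torch.nn.functional as F

x = torch.randn(1024, 1024, device="cuda")
y = F.dropout(x, p=0.5, training=True)  # non-inplace dropout now takes the fused path
```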
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666

Reviewed By: SsnL

Differential Revision: D8948077

Pulled By: ezyang

fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
2018-08-07 13:34:53 -07:00
74979495f0 Optional input lengths in CTC op (#10228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10228

Sometimes, in test mode, the input length for all items in the minibatch is
equal to the max number of time steps. Making the lengths input optional avoids having to pass in an external tensor in that case.

Differential Revision: D9174378

fbshipit-source-id: 22f7d5c311c855d9c3ac59f2a5e773279bd69974
2018-08-07 13:34:51 -07:00
9b1a65bec3 Extends type and shape tracing with device (#9796)
Summary:
This PR extends the existing type and shape metadata tracing and verification done in autograd with device information. This expansion of tracing is required for #8354, is likely useful in other scenarios, and is a healthy sanity check, just like type and shape tracing.

The precise changes are:

- TypeAndShape -> InputMetadata, now includes device()
- Creating InputMetadata is simplified to just require a tensor, and callers were updated to use this simpler invocation wherever possible
- The gradient accumulator of a variable is now reset when set_data() is called if either the type or device changes, and this reset now locks to avoid contention with acquiring the gradient accumulator
- Mismatched devices during backward() will throw a runtime error, just like mismatched type and shape
- (Bonus!) Two uninitialized pointers in THCReduce are now initialized (to nullptr) to prevent build warnings

fyi colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9796

Reviewed By: goldsborough

Differential Revision: D9119325

Pulled By: ezyang

fbshipit-source-id: 76d1861b8d4f74db0575ff1f3bd965e18f9463de
2018-08-07 12:25:17 -07:00
2993c42ee4 Squash some 'invalid escape sequence' warnings. (#10310)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10310

Differential Revision: D9196254

Pulled By: ezyang

fbshipit-source-id: 63bb8e52ac6970fe8e11a2d3c491ab58250dc467
2018-08-07 12:25:15 -07:00
db7a2b1f0d fix doc for as_tensor (#10309)
Summary:
- fixes #9914
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10309

Differential Revision: D9196427

Pulled By: weiyangfb

fbshipit-source-id: c9a01e42c2e9dbfe2bd94ad14651d9f578751de2
2018-08-07 11:24:45 -07:00
dcaafdd04b fix doc of sparse_coo_tensor (#10308)
Summary:
- fixes #9998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10308

Differential Revision: D9196423

Pulled By: weiyangfb

fbshipit-source-id: 23b4ed96e354ac9aa7c268aad105818a2c6d3bd8
2018-08-07 11:24:44 -07:00
20a549b101 Start using a newer version of rocRand that's PyTorch compatible.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10280

Differential Revision: D9196349

Pulled By: Jorghi12

fbshipit-source-id: 4147f2e6e3fdd641b026f3761d684437591405be
2018-08-07 11:09:59 -07:00
fe68879832 Fix dir(torch) for python 3.7 (#10271)
Summary:
fixes #10160.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10271

Differential Revision: D9188031

Pulled By: li-roy

fbshipit-source-id: a3620553a8ba2b7391acdf78dbe58afcdb6c5f7f
2018-08-07 09:57:51 -07:00
ad76fc8807 s/DISABLE_COPY_AND_ASSIGN/AT_DISABLE_COPY_AND_ASSIGN/ (#10275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10275

Remove forwarding declaration in caffe2/core/common.h

```
codemod -d caffe2 --extensions cc,cpp,cu,cuh,h \\bDISABLE_COPY_AND_ASSIGN AT_DISABLE_COPY_AND_ASSIGN
```

Reviewed By: mingzhe09088

Differential Revision: D9184809

fbshipit-source-id: 958cf5162b0d92b83ea9c2597abb77320ca57ce8
2018-08-07 08:54:26 -07:00
66f7b8abbe Better macro name hygiene prefixing. (#10274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274

Good C++ libraries don't take up un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN.  Re-prefix this.

Follow up fix: codemod Caffe2 to use the new macro, delete
the forwarding definition

Reviewed By: mingzhe09088

Differential Revision: D9181939

fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
2018-08-07 08:54:24 -07:00
18e298305e Increase TCP listen queue size from 64 to 1024 (#10268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10268

Running torch.distributed.init_process_group fails with more than ~64 processes, with various errors like connection refused or connection reset by peer. After some digging, it looks like the root cause is that all workers have to connect to master via TCP (both in Zeus init and in DataChannelTCP - look for `connect()`), and the listening socket only has a backlog of 64.

I increased the backlog to 1024, which seems like enough for reasonable purposes (the hard limit is 65535 in /proc/sys/net/core/somaxconn). There's probably a more correct way to do this that involves retrying when a connection is refused.
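The concept in a minimal Python sketch (the actual change is in the C++ TCP init code; 29500 is just the port commonly used in init_process_group examples):

```
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 29500))
# With a backlog of 64, connection attempts beyond 64 pending workers get refused;
# 1024 matches the new listen queue size.
s.listen(1024)
```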

Reviewed By: soumith

Differential Revision: D9182216

fbshipit-source-id: 2f71c4995841db26c670cec344f1e3c7a80a7936
2018-08-07 08:26:06 -07:00
1a797ec810 Revert "clean up the build a bit. We no longer need the separate buil… (#10285)
Summary:
…d_libtorch entrypoint (#9836)"

This reverts commit 62e23a1ee47eb66056e6695cefef4e42599f8bd0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10285

Differential Revision: D9193107

Pulled By: ezyang

fbshipit-source-id: de96dce12fdf74410413ae18feee5caf0bed0025
2018-08-07 07:40:20 -07:00
b6402648f4 fix off-by-one bug in open-ended slicing (#10286)
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
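A quick illustration of the off-by-one (plain Python slicing, same semantics):

```
t = [0, 1, 2, 3]
i = 1
t[i:]    # correct: [1, 2, 3]
t[i:-1]  # what the old transform produced: [1, 2] -- last element dropped
```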
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286

Differential Revision: D9193292

Pulled By: michaelsuo

fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
2018-08-07 00:39:42 -07:00
5a7c710548 Support some basic list operations (#10225)
Summary:
Support a few basic operators (see the sketch after this list):
- eq
- add
- len
- select (indexing)
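A small TorchScript sketch exercising these ops (hypothetical example; the exact surface syntax is an assumption):

```
import torch

@torch.jit.script
def list_ops(x):
    xs = [x, x + 1]   # list construction
    ys = xs + xs      # add (concatenation)
    n = len(ys)       # len
    first = ys[0]     # select (indexing)
    return first * n
```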
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10225

Differential Revision: D9172338

Pulled By: michaelsuo

fbshipit-source-id: 6e75ec1453b9589b0fb4698598ecdba5a5fccff9
2018-08-07 00:39:40 -07:00
1bae6e24c9 Change empty list literal compiler error to match actual builtin name (#10265)
Summary:
I changed the name of this builtin to match Python's native style, but forgot to change the compiler error to match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10265

Differential Revision: D9192963

Pulled By: michaelsuo

fbshipit-source-id: 225ca4cd50fbbe3b31c369deeb3123a84342aab1
2018-08-07 00:39:39 -07:00
fa9ea5bde9 Move CoreAPI.h to Macros.h, to give it a more accurate name. (#10264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264

Since we now have DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.

Reviewed By: dzhulgakov

Differential Revision: D9181687

fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
2018-08-06 22:27:44 -07:00
da44cf6101 Move TensorTypeId, TensorTypeIdRegistration and flat_hash_map to ATen/core (#10263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263

Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file
  now)

Reviewed By: dzhulgakov

Differential Revision: D9181321

fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
2018-08-06 22:27:40 -07:00
f1cf3105de Revert D9169049: [pytorch][PR] Add new mkldnn fallback operators
Differential Revision:
D9169049

Original commit changeset: 3bc30250d734

fbshipit-source-id: 65a91594bda699ff9535b27dccd0d1e5d1a8036a
2018-08-06 20:39:30 -07:00
f47bec821e Add new mkldnn fallback operators (#10162)
Summary:
Add new ideep fallback operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10162

Reviewed By: yinghai

Differential Revision: D9169049

Pulled By: wesolwsk

fbshipit-source-id: 3bc30250d7340fea2c442f36d16b85241ceee6e7
2018-08-06 16:56:00 -07:00
25b2e88750 Stop propagating std flags to downstream gcc/nvcc (#10098)
Summary:
When we directly use -std=c++11, it propagates to the downstream applications.

Problems:
1. Gcc flags propagating to nvcc.
2. nvcc flags propagating to nvcc (which throws an error like a redeclaration of the std flag).

This PR will fix these propagation issues!

Similar problem:
https://github.com/FloopCZ/tensorflow_cc/pull/92
https://github.com/CGAL/cgal/issues/2775

Requires: CMake 3.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10098

Differential Revision: D9187110

Pulled By: ezyang

fbshipit-source-id: 0e00e6aa3119c77a5b3ea56992ef3bbfecd71d80
2018-08-06 15:30:27 -07:00
8b08eca203 Move ScalarType to ATen/core, splitting out Backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10262

Reviewed By: dzhulgakov

Differential Revision: D9157408

fbshipit-source-id: 11631a35dfc6cb1f73f61ea08d3115f8ef4cb034
2018-08-06 15:30:25 -07:00
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations, among other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests as broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failure of the elementwise kernel by removing a non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
e0d43572c1 Cleaner semantics for Reserve (#10261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10261

1. Reserve
Currently, Reserve allocates new memory while also preserving the old data in the tensor,
and Resize relies on this behavior at some call sites, e.g. https://github.com/pytorch/pytorch/blob/master/caffe2/operators/reservoir_sampling.cc#L103, where we should be using Extend.
We want to bring the semantics of Reserve more in line with std::vector, i.e. we want it to be
an optimization for memory allocation and remove the semantics of preserving the data. We'll remove the guarantee that data is preserved after Reserve, and Extend will be the only API that preserves old data when we do in-place extension of memory. This also helps with the later refactoring to split Storage from Tensor.
Also, we'll only pass the outer dimension to Reserve, which means the later dimensions should be set before we call Reserve.
2. Extend/Shrink
Previously, Extend actually meant ExtendBy and Shrink meant ShrinkTo. I would like to add an ExtendTo for convenience, and change Shrink to ShrinkTo.
The old functions calling Extend are still there; although Extend actually means ExtendBy, I think it still makes sense to keep it.
3. Usage Patterns

The expected usage patterns right now is:
```
t->Resize({0, 32, 32, 32});
t->template mutable_data<T>(); // set meta_
t->Reserve(100);
auto* t_data = t->template mutable_data<T>();
// feed data to tensor using t_data
for (int i = 0; i < 100; ++i) {
  t->Extend(1, 50, &context_);
  // you can continue to use t_data if you have reserved enough space
  // otherwise, you should call t->template mutable_data<T> again to
  // get the new data pointer since Extend will allocate new memory even
  // though the original data is preserved.
}
```

Reviewed By: ezyang

Differential Revision: D9128147

fbshipit-source-id: e765f6566d73deafe2abeef0b2cc0ebcbfebd096
2018-08-06 14:40:16 -07:00
a13a53c151 Optimize group_norm on cpu (#10246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10246

Optimize group_norm on cpu

Reviewed By: houseroad

Differential Revision: D9177878

fbshipit-source-id: 41f7aadc6336317c338c75daccef6cb98e9de9de
2018-08-06 14:26:09 -07:00
0c848f4179 Python integration for custom operators (#10149)
Summary:
Adds the Python path to custom operators, including dynamically loading operations into Python.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10149

Reviewed By: ezyang

Differential Revision: D9158380

Pulled By: goldsborough

fbshipit-source-id: 3edffa639e8d2959e9e80d1bd4f20ab4a1b3ca02
2018-08-06 13:54:48 -07:00
62e23a1ee4 clean up the build a bit. We no longer need the separate build_libtorch entrypoint (#9836)
Summary:
the new entrypoint is `./tools/build_pytorch_libs.sh caffe2`

this will also speed up CI builds a bit, since we will no longer be compiling all of libtorch twice
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9836

Differential Revision: D9182634

Pulled By: anderspapitto

fbshipit-source-id: 0b9a20ab04f5df2d5c4e7777e4dc468ab25b9ce2
2018-08-06 13:41:51 -07:00
d1a0c2eaf8 Add back THTensor_nDimension. (#10259)
Summary:
Turns out some people are using this via the C-API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10259

Differential Revision: D9180135

Pulled By: gchanan

fbshipit-source-id: 68f59beabf7f8093e67581d7e7ebfe8dff9e6b69
2018-08-06 11:09:41 -07:00
6ac35b35d1 Stop using THLongStorage for sizes/strides, remove THLongStorageView.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10219

Reviewed By: cpuhrsch

Differential Revision: D9159550

Pulled By: gchanan

fbshipit-source-id: 745a6d335613688ed41b32369ee4938907ce8cbb
2018-08-06 09:25:32 -07:00
835a5d4f49 Add cost inference of fwd sparse operators and sparse adagrad (#9314)
Summary:
We should also add cost inference for sparse operators in backward pass later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9314

Reviewed By: orionr

Differential Revision: D8789240

Pulled By: jspark1105

fbshipit-source-id: 68c2170f294fe13bcc409276f599b5fa8a98bcd3
2018-08-06 08:39:16 -07:00
506142ac8a Add warning for building PyTorch using Python 2.7 on Windows (#10247)
Summary:
Fixes #9232.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10247

Differential Revision: D9178257

Pulled By: SsnL

fbshipit-source-id: cc553335a5a918b6d77fe1064460cb66114859ca
2018-08-05 21:24:02 -07:00
267c397c5b Add the ocr_det model for benchmarking (#10245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10245

as title

Reviewed By: sf-wind

Differential Revision: D9176654

fbshipit-source-id: 3339d2aa6a0ceb0e751745c06dcfd025ccbf5449
2018-08-05 16:45:35 -07:00
7f2e43a084 Add the ocr_rec model json (#10240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10240

as title

Reviewed By: sf-wind

Differential Revision: D9176522

fbshipit-source-id: 5b92c0b4ed24f96fe7b1321a3ab5ad26dcd3318d
2018-08-05 16:45:23 -07:00
df23bdc82d add BEGIN NOT-CLEAN-FILES marker to .gitignore. (#10233)
Summary:
Visual Studio Code and Visual Studio store their configurations in `FOLDER/.vscode` and `FOLDER/.vs`,
but "setup.py clean" deletes these folders because they are listed in the `.gitignore` file.

To prevent this, add a "BEGIN NOT-CLEAN-FILES" marker to the `.gitignore` file; "setup.py clean" now ignores lines after this marker.

Discussed in #10206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233

Differential Revision: D9175515

Pulled By: ezyang

fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
2018-08-05 15:55:44 -07:00
f57e4ce1d5 Update broadcast with alpha to reduce num of launching kernels. (#10235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10235

Update broadcast with alpha to reduce num of launching kernels.

Reviewed By: houseroad

Differential Revision: D9175824

fbshipit-source-id: 7a463833350a2c84dcfb82f73cf40da403dd59a0
2018-08-04 19:54:20 -07:00
ab293924bb support generic feature in DPER2 (#10197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10197

Support generic feature in DPER2

For now, since we only have one generic type (type 1), we directly add the parsed feature record to the embedding feature.

New feature types with specific structure will require corresponding code changes.

Reviewed By: itomatik

Differential Revision: D8788177

fbshipit-source-id: 9aaa6f35ece382acb4072ec5e57061bb0727f184
2018-08-04 15:25:13 -07:00
57d2d4bcff Optimize reduce ops for 2d and 3d (#9992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9992

Optimize reduce ops for 2d and 3d

Reviewed By: houseroad

Differential Revision: D9042505

fbshipit-source-id: 62af2125aa6439106293e59bdf6a2b920792fd2d
2018-08-04 13:53:58 -07:00
29406a2c4c Fix shared_ptr refcycle in graph executor (#10222)
Summary:
Fixes #10032

When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87

Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)

The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.

Test Plan

Turn on CUDA memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu", the CUDA memory check runs.

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222

Reviewed By: ezyang

Differential Revision: D9162995

Pulled By: zou3519

fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
2018-08-04 11:39:10 -07:00
2141cb7d53 Update OnnxifiOp to reflect onnx/onnx#1256
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10230

Reviewed By: yinghai

Differential Revision: D9174527

Pulled By: Maratyszcza

fbshipit-source-id: 753493e67446b528d65b146e89ea9f874b469ead
2018-08-04 08:09:19 -07:00
5df8547ff9 Fix ONNX LogSoftmax export. (#9576)
Summary:
This fixes an issue with incorrect `axis=-1` in the exported ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9576

Reviewed By: yinghai

Differential Revision: D9125463

Pulled By: houseroad

fbshipit-source-id: 6f4cb1067d1aa6bb0a9f56690fc21816c98eebfa
2018-08-03 22:09:42 -07:00
36939417b2 Introduce at::DeviceType, which subsumes at::Device::Type and (partially) caffe2::DeviceType (#10175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10175

Previously, we had at::Device::Type and caffe2::DeviceType (from protobuf),
intended to help us distinguish between CPU, CUDA, etc. devices.

This replaces at::Device::Type entirely with at::DeviceType, which in turn
is a direct, 'enum class' version of the protobuf generated caffe2::DeviceType
'enum'.  We can't eliminate the 'enum' because this would be a pretty drastic
API change (enum is interconvertible with integers, enum class is not), but
we can make the two line up exactly and share code for, e.g., printing.

Reviewed By: Yangqing

Differential Revision: D9137156

fbshipit-source-id: 566385cd6efb1ed722b25e6f7849a910b50342ab
2018-08-03 19:25:06 -07:00
98d60ad43d Replace caffe2::EnforceNotMet with at::Error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10184

Reviewed By: dzhulgakov

Differential Revision: D9140095

fbshipit-source-id: 3beead825609cec5054347e59903b0b78ef150f8
2018-08-03 19:25:05 -07:00
e2976ea519 Make at::Error look more like caffe2::EnforceNotMet (#10183)
Summary:
- New concept of a message stack; you can add messages
  using AppendMessage
- New concept of a caller; it's just a way to pass along
  some arbitrary extra information in the exception

Coming soon is changing Caffe2 to use at::Error instead of
EnforceNotMet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10183

Differential Revision: D9139996

Pulled By: ezyang

fbshipit-source-id: 6979c289ec59bc3566a23d6619bafba2c1920de9
2018-08-03 19:25:03 -07:00
c7c6e93312 Use target_compile_definitions for AT_CORE_STATIC_WINDOWS (#10213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10213

nvcc only respects definitions, not options.

Reviewed By: dzhulgakov

Differential Revision: D9154388

fbshipit-source-id: 04c4809154df1c61108b65f1115fccdeb336952e
2018-08-03 19:25:02 -07:00
02a64b183c Move ATenGeneral back out of core. (#10224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10224

It doesn't work with Caffe2; use AT_CORE_API from ATen/core/CoreAPI.h
instead.

Reviewed By: smessmer

Differential Revision: D9162467

fbshipit-source-id: 3c7d83c1ccb722ebac469296bdd7c3982ff461e5
2018-08-03 19:25:01 -07:00
41dce17e22 Delete TensorImpl::type_, replace with backend_/scalar_type_/is_variable_ (#10210)
Summary:
The basic game plan is to stop accessing the type_ field directly,
and instead use the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I fixed this by
  reordering the static initializers. Fixes #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10210

Differential Revision: D9150697

Pulled By: ezyang

fbshipit-source-id: 89e2006c88688bcfab0dcee82dc369127c198c35
2018-08-03 18:25:19 -07:00
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C) to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid in multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return shape (N,) instead of (N, C) to match the behavior of MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00
7bc87172ea Kill Tensor::shares_data (#10217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10217

It's only used in debug printing and is not that reliable anyway. If we want to implement it later, we should do it with proper accounting for shared storages.

Reviewed By: jerryzh168

Differential Revision: D9155685

fbshipit-source-id: 48320d41a0c4155645f3ba622ef88730a4567895
2018-08-03 17:40:39 -07:00
3b3aff2ed6 IsType<TensorCPU> -> IsType<Tensor>(CPU) (#10135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10135

att

Reviewed By: yinghai

Differential Revision: D9121892

fbshipit-source-id: 4a4a3bfc450896b619bf92c92ef218aaaefc3081
2018-08-03 17:24:59 -07:00
4aa7469d1f Implement c10 ops needed for benchmark (#9360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9360

This implements a first set of c10 operators, namely the ones needed for the multithread predictor benchmark.

All implementations are CPU-only and experimental. They're not meant to be used in production.

They can be used, however, to test calling simple c10 MLPs from Caffe2 or PyTorch when working on these integration paths.

Reviewed By: dzhulgakov

Differential Revision: D8811698

fbshipit-source-id: 826789c38b2bfdb125a5c0d03c5aebf627785482
2018-08-03 16:09:27 -07:00
08e7af20d3 Implement calling of c10 ops from c2 (#9369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9369

This adds the capability for caffe2 to call c10 operators and adds a dummy c10 sigmoid op as a proof of concept.

I used this test script to make sure it works:

    from caffe2.python import workspace, model_helper
    import numpy as np

    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    m = model_helper.ModelHelper(name="my net")
    sigmoid1 = m.net.C10Sigmoid_DontUseThisOpYet("data1", "sigmoid1")
    sigmoid2 = m.net.Sigmoid("data1", "sigmoid2")

    workspace.RunNetOnce(m.param_init_net)
    workspace.CreateNet(m.net)
    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    workspace.RunNet(m.name, 1)

    print(workspace.FetchBlob("data1"))
    print(workspace.FetchBlob("sigmoid1"))
    print(workspace.FetchBlob("sigmoid2"))

(and check that both sigmoid outputs are the same)

Reviewed By: ezyang

Differential Revision: D8814669

fbshipit-source-id: eeb0e7a854727f1617a3c592a662a7e5ae226f40
2018-08-03 16:09:23 -07:00
c5abe8844a Add IDEEP fallbacks for Resnet50 training ops (#8541)
Summary:
1. Add fallback gradient ops
2. In fallback ops, set the output Tensor as a CPUTensor instead of an IDEEPTensor if ndim = 0, because IDEEPTensor doesn't support 0-dim tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8541

Reviewed By: yinghai

Differential Revision: D9115233

Pulled By: wesolwsk

fbshipit-source-id: 163e6a76f02bd781c95d1060ccbacf2cab90055e
2018-08-03 15:54:17 -07:00
4680ab4d44 Generalize intrusive_ptr comment (#10216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10216

-

Reviewed By: ezyang

Differential Revision: D9155601

fbshipit-source-id: 154de2e6ad747134413a3ab3ae0b7507b8284d49
2018-08-03 14:25:28 -07:00
97cbcb7d67 Allow releasing/retaining weak_intrusive_ptr (#10214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10214

Seems we're passing weak pointers over C API boundaries. Need this API there too.

Reviewed By: ezyang

Differential Revision: D9154505

fbshipit-source-id: c9889689b87dad5d918f93ba231e01704b8d2479
2018-08-03 14:25:24 -07:00
6456b944fd ctc_loss odds and ends (#10112)
Summary:
- Add convenience wrapper to pass tensors as input_lengths, target_lengths
- Fix documentation example
- Check BLANK >= 0

Thank you, Simon and Soumith for the suggestions!
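A sketch of the convenience wrapper in use (shapes are illustrative):

```
import torch
import torch.nn.functional as F

T, N, C = 50, 16, 20
log_probs = F.log_softmax(torch.randn(T, N, C), dim=2)
targets = torch.randint(1, C, (N, 30), dtype=torch.long)
# lengths can now be passed as tensors instead of Python lists
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, 31, (N,), dtype=torch.long)
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
```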
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10112

Differential Revision: D9130737

Pulled By: SsnL

fbshipit-source-id: f9a0022a969788bda3db9f360e2564b519ebf2e6
2018-08-03 13:25:18 -07:00
65d32b1705 Remove unused substitutions (#10187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10187

These substitutions don't actually occur in the target file. Remove them.

Reviewed By: ezyang

Differential Revision: D9141567

fbshipit-source-id: fcfddee0b4d31e21763b39d852577d2dbb9ce843
2018-08-03 12:25:59 -07:00
f51f15bb27 Update include paths for ATen/core (#10130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130

Update some include paths to make them internally consistent

Reviewed By: ezyang

Differential Revision: D9119906

fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
2018-08-03 11:57:02 -07:00
f77b62c3e1 Add documentation for margin arg in Caffe2 MarginRankingCriterionOp (#10186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10186

The MarginRankingCriterionOp margin argument was undocumented.

Reviewed By: jerryzh168

Differential Revision: D9141228

fbshipit-source-id: 724d45dc8e555fbe9d3e8afc7b6bf8ed17bbbdb1
2018-08-03 11:45:51 -07:00
cb0e72e00d Add registerOperator overloads that infer the schema (#10048)
Summary:
This PR adds a way to infer the JIT/script schema of a function from its signature, and then create an operator from the schema and implementation. The implementation function is wrapped into another function, which pops values from the stack into an argument tuple, then invokes the function and pushes the return value back onto the stack, sometimes unpacking the return value if it is a tuple.

Currently the method is called `createOperator`. We may want to think of a nicer way of registering ops in tandem with `RegisterOperators`. It might be very cumbersome to add a template constructor to `Operator`, so maybe we can come up with a chaining method on `RegisterOperators` like `RegisterOperators(schema, func).op(schema, func).op(schema, func)` -- it has to work at startup time (for a static variable) though. We can solve this in another PR.

zdevito apaszke smessmer dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10048

Differential Revision: D9125975

Pulled By: goldsborough

fbshipit-source-id: de9e59888757573284a43787ae5d94384bfe8f9a
2018-08-03 11:45:49 -07:00
7a377b9a53 Add torch.argsort mirroring similar functionality in numpy. (#9600)
Summary:
Per issue #9542
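A brief usage sketch, mirroring numpy.argsort:

```
import torch

x = torch.tensor([3.0, 1.0, 2.0])
idx = torch.argsort(x)  # indices that sort x: tensor([1, 2, 0])
x[idx]                  # tensor([1., 2., 3.])
```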
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9600

Differential Revision: D8952338

Pulled By: resistor

fbshipit-source-id: c3f69d62858ad9458ec5ae563e3ff24b1c9283a7
2018-08-03 11:45:47 -07:00
c91af1202a Make release_resources non-const (#10192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10192

- release_resources() method must be non-const because it modifies the object
- for intrusive_ptr<const MyClass>, this needs to be const_cast :(

Reviewed By: ezyang

Differential Revision: D9143808

fbshipit-source-id: 9203ff7a7ff3bec165931279371c6e75d4f0ca8c
2018-08-03 11:24:45 -07:00
39476d79a2 Allow releasing/reclaiming intrusive_ptr (#10133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10133

This is useful for C APIs where we want to give owning pointers to/from other languages.

Reviewed By: ezyang

Differential Revision: D9121493

fbshipit-source-id: f903f5830f587b2ba69c0636ddcf1a066bbac2e0
2018-08-03 11:24:43 -07:00
5753746d29 Enable static initializer order ASAN. (#10211)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10211

Differential Revision: D9150687

Pulled By: ezyang

fbshipit-source-id: 4cd458d19a34788c8897905a87d1b52229f67f90
2018-08-03 11:24:42 -07:00
4a6fbf03c6 Make StorageImpl member variables largely private and use getters and setters
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10074

Differential Revision: D9086887

Pulled By: cpuhrsch

fbshipit-source-id: d2dd0d6a1b71d0f864aefb64cd1daefd11dcfb91
2018-08-03 11:10:02 -07:00
50cf326158 Allow type cast between int and float in Script (#10168)
Summary:
The PR allows int→float and float→int casts. Currently we only allow `tensor→int` and `tensor→float` casts.
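A minimal TorchScript sketch of the newly allowed casts (hypothetical example):

```
import torch

@torch.jit.script
def ratio(x):
    n = x.size(0)          # an int in script
    half = float(n) / 2.0  # int -> float cast, newly allowed
    return int(half)       # float -> int cast, newly allowed
```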
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10168

Differential Revision: D9141163

Pulled By: wanchaol

fbshipit-source-id: 5e5591a98b4985a675641dfc9a385b2a0bf8e208
2018-08-03 10:56:05 -07:00
5d3782b655 Fix IDEEP Copys (#10104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10104

.

Reviewed By: yinghai

Differential Revision: D9109638

fbshipit-source-id: 319cc5711132314dfba0f09ac403522f21ad532b
2018-08-03 10:31:32 -07:00
656bb320b7 EnforceFinite test (#10143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10143

att

Reviewed By: xianjiec

Differential Revision: D9122444

fbshipit-source-id: 010abcc1eb64f084c00890e8de5f5d422b4b8d02
2018-08-03 10:31:29 -07:00
13de6e8dfa Make list literals construct ListType (#10193)
Summary:
Previously, `foo = [bar, baz]` would construct a TupleType of fixed arity. This would cause code like:
```
foo = [2]
if True:
    foo = [2, 2]
```
to fail to compile, since `(int)` is not the same as `(int, int)`.

This PR changes things so that list literals construct ListTypes, which can be resized.

Potentially breaking changes introduced:
- Empty list literals are now disallowed; `_constructEmptyFooList()` builtins are required to replace them.
- Iterable variable unpacking where the rhs is a list is now disallowed. (Tuples still work)
- Lists must have a single type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10193

Differential Revision: D9147166

Pulled By: michaelsuo

fbshipit-source-id: bbd1b97b0b6b7cb0e6f9d6aefa1ee9c731e63039
2018-08-03 00:55:23 -07:00
ab0ac6391b fix padding doc not rendered correctly (#10196)
Summary:
somehow sphinx doesn't like the previous wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10196

Differential Revision: D9146817

Pulled By: SsnL

fbshipit-source-id: 2140859bc363af556a021658def946d7afbdb245
2018-08-02 23:26:45 -07:00
4778afb8bb In Expand support using -1 to indicate preserving original size (#10174)
Summary:
zrphercule

https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand
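For reference, the torch.Tensor.expand semantics being mirrored:

```
import torch

x = torch.randn(3, 1)
y = x.expand(-1, 4)  # -1 keeps the original size of that dim; y has shape (3, 4)
```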
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10174

Differential Revision: D9136467

Pulled By: bddppq

fbshipit-source-id: 825c489899097acda8d43706964d78a104cdf583
2018-08-02 22:09:47 -07:00
dd527db711 Skip TestConvolution.test_convolution_sync on ROCM which caused random segfaults (#10179)
Summary:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/4701/console

petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10179

Differential Revision: D9139657

Pulled By: bddppq

fbshipit-source-id: 9b1bb2ad185ed16fff696ce026a5ee5fcf9cbaee
2018-08-02 21:09:27 -07:00
1f78e06f63 Add g.insertConstant and clean up dead attributes code (#10177)
Summary:
* Changes `insertConstant(g, val)` to `g.insertConstant(val)`.
* Moves SourceRange to its own file to enable it.
* Cleans up dead attribute code in schema matching and graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10177

Differential Revision: D9137789

Pulled By: zdevito

fbshipit-source-id: 8a73cfb01a576f02e7e4dce019be9c0a0002989d
2018-08-02 20:45:31 -07:00
798b530361 weak_intrusive_ptr (#10038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10038

Add weak_ptr ability to intrusive_ptr.

Reviewed By: ezyang

Differential Revision: D9039980

fbshipit-source-id: dd504d6e0d7acf5914cd45845355e28f9df201fb
2018-08-02 17:25:14 -07:00
2bd709a7c8 intrusive_ptr (#9897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9897

Add an IntrusivePtr class to do intrusive refcounting with a shared_ptr-like interface.

Reviewed By: ezyang

Differential Revision: D9018619

fbshipit-source-id: 5de8706aab8eea2e30bead0f59bd6a7ca4d20011
2018-08-02 17:25:12 -07:00
0e9c6898cb Export modules in ir with google protobuf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9746

Differential Revision: D9110006

Pulled By: li-roy

fbshipit-source-id: 8b9744c042f822fdfe959a7a7fef3d0baff4f639
2018-08-02 15:54:51 -07:00
e2ecf3914a Change default CUDA block size from 512 to 128 (#10090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10090

Decreasing the block size improves GPU utilization for use cases with small input sizes (e.g. 10000)

Reviewed By: pjh5

Differential Revision: D9093573

fbshipit-source-id: c8f995b773a00b1bea3a3809c0f6557133efd9dd
2018-08-02 15:40:13 -07:00
7dc870bd7b Delete invalid 'template' keyword (#10173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10173

With D9024330, the `Extend` function is no longer a template, which makes
the `template` keyword here invalid. For some reason the current version of LLVM
doesn't catch this, but the latest one does.

Reviewed By: jerryzh168

Differential Revision: D9133462

fbshipit-source-id: 54ac9aad01f81b9b4e7b6e2864b8961478d2d860
2018-08-02 14:50:11 -07:00
dad6e8bb6c Remove capture specifiers in register_aten_ops when they're not needed. (#9669)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9669

Differential Revision: D8952335

Pulled By: resistor

fbshipit-source-id: 8fbbec7a7f55fbeeda3509cb3d339e1db90a53e6
2018-08-02 13:40:31 -07:00
94c67f1454 Replace storageimpl type with scalar_type and backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10097

Differential Revision: D9124287

Pulled By: cpuhrsch

fbshipit-source-id: c976abeeaaa085b972812c1a3270eb6aef0c0dca
2018-08-02 13:31:30 -07:00
538b15d13c Use PYTORCH_PYTHON to call generate_code.py (#10171)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/8373#issuecomment-409994847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10171

Differential Revision: D9135607

Pulled By: SsnL

fbshipit-source-id: 72f535875658c857621e41fd25c2174052714557
2018-08-02 12:54:14 -07:00
9e85a7a9de Back out "[pytorch][PR] [TENSOR MERGE] Delete type_ field from TensorImpl, replaced with backend_/scalar_typ…" (#10169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10169

Original commit changeset: 2b4d867abfdc

Reviewed By: pjh5, SsnL

Differential Revision: D9135216

fbshipit-source-id: d5c9f12c3a0f75df224c781e1cd1e323cdfbb0d5
2018-08-02 12:39:01 -07:00
7be071a829 Update onnx to onnx/onnx@2a3a226 (#10167)
Summary:
2a3a226a96
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10167

Reviewed By: houseroad

Differential Revision: D9134738

Pulled By: bddppq

fbshipit-source-id: 9d3fd3c04a584d5626146f174ac78cabfa0e5934
2018-08-02 12:25:19 -07:00
6e85112f12 Adding katex rendering of equations, and required edits to equations. (#8848)
Summary:
This fixes issue #8529.

- Adds Katex extension to conf.py and requirements.txt
- Fixes syntax differences in docs
- Should allow documentation pages to render faster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8848

Reviewed By: soumith

Differential Revision: D8677702

Pulled By: goodlux

fbshipit-source-id: c4a832c5879e0eebcb14763b35a41663331ba23f
2018-08-02 12:25:17 -07:00
ee98533746 Fix compiler warnings on ignored const qualifiers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10142

Reviewed By: yinghai

Differential Revision: D9125502

Pulled By: bddppq

fbshipit-source-id: 8043b2a05507a4707220fa820ab6cc486760a93e
2018-08-02 12:10:37 -07:00
5765549155 codemod -d caffe2 --extensions cc,h CaffeTypeId TypeIdentifier (#10166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10166

TypeIdentifier is still easy to codemod away from

Reviewed By: smessmer

Differential Revision: D9132840

fbshipit-source-id: bc83a8b17b2e7c19c9d2c9cfe5c7ce6ec1d8cec5
2018-08-02 11:54:30 -07:00
4a2f3cc45f Improve lars operator by applying clipping (#9905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905

This diff improves the lars operator in Caffe2 by applying clipping to the computed learning rate.

Reviewed By: pjh5

Differential Revision: D9020606

fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
2018-08-02 11:54:28 -07:00
a243e517fa Guard sizes/strides in TH/THC for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10145

Differential Revision: D9125791

Pulled By: gchanan

fbshipit-source-id: d0b8c88c49d7af85971a4531a63fd85a97bfbec7
2018-08-02 11:24:36 -07:00
170d29769b Strings lexing, parsing, implementation in print (#9324)
Summary:
This PR adds strings to the AST and implements them for print statements. Strings are lifted as attributes onto the print node. They must be arguments to print itself, not arguments to an object that is passed to print. If they are encountered elsewhere, a NYI exception will be thrown.
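A hypothetical sketch of the newly supported form (the exact surface syntax is an assumption):

```
import torch

@torch.jit.script
def show(x):
    print("x is: ", x)  # string literal allowed as a direct argument to print
    return x
```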
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324

Reviewed By: jramseyer

Differential Revision: D8807128

Pulled By: eellison

fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
2018-08-02 11:09:03 -07:00
230ca98d4b Remove THTensor_isSize. (#10146)
Summary:
This is part of the process of removing the use of THLongStorage to represent sizes/strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10146

Differential Revision: D9126611

Pulled By: gchanan

fbshipit-source-id: b0d995a4c51dfd54bf76dcfee9a69f37f9d01652
2018-08-02 10:39:43 -07:00
9c818bfbc7 Refactor PythonValue types + use tryMatchSchema for PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10132

Differential Revision: D9121327

Pulled By: jamesr66a

fbshipit-source-id: 6d8bcf6b0dca54106cf9ed740bcff857062a03da
2018-08-02 10:26:58 -07:00
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the build to avoid out-of-memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
70d47f92db Add support for rand_like op in fusion compiler (#9795)
Summary:
Enabled support for generating random numbers in the fusion compiler. Currently a Philox RNG implemented by TensorFlow is used, as NVRTC couldn't resolve the curand.h header correctly. The two implementations should have exactly the same behavior according to our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9795

Differential Revision: D8999029

Pulled By: SsnL

fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8
2018-08-02 08:55:25 -07:00
4a5cd4f6ab nomnigraph - new utility for graph transformation (#10081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10081

Add a new utility that makes it easier to write graph transformations. Callers now only need to take care of the actual transformation logic. The subgraph matching is simplified because callers only need to specify a simple construct for the subtree matching criteria.

The utility is SubgraphMatcher::replaceSubtree.

Some notes:
- replaceSubtree takes a subtree matching criteria and a lambda that takes a subtree root. It doesn't handle any transformations itself. Callers are responsible for the transformation part, including deleting all nodes in the matched subtree(s). We could enhance this to also handle the deletion part if it turns out to be useful.
- Only subtree matching is supported for now, but we can add general DAG subgraph support later if needed.

Reviewed By: bwasti

Differential Revision: D9073297

fbshipit-source-id: 465a0ad11caafde01196fbb2eda2d4d8e550c3b6
2018-08-01 23:09:41 -07:00
acbc2744d8 fix bug in 3d group convolution (#9860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9860

For 3D group convolution, in the case of cuDNN 7 and NCHWD order, the filter dim is (M, C/group_, k_h, k_w, k_d).

According to the cuDNN doc (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#grouped-convolutions), the existing implementation is incorrect and will crash 3D video model training with group convolution.

In the implementation, `filter.dims(1)` is already `C/group_`, so we don't need to divide it by `group_` again.

Reviewed By: BIT-silence

Differential Revision: D9008807

fbshipit-source-id: 2f0d6eb47f4e16d7417a7e3baeba709e3254154f
2018-08-01 22:55:38 -07:00
57061d600a Auto-batching IR transformation for control flow (#9392)
Summary:
Implement IR transformation for control flow

- `prim::Constant`: clone to new graph directly
- `prim::NumToTensor`: create a `BatchTensor` from output tensor with `batch_size = 1`
- `prim::TensorToNum`: clone to new graph
- `prim::ListConstruct`: clone to new graph
- `prim::If`: execute both `if_block` and `else_block` and combine results from them using `cond`
- `prim::Loop`:
  - for loop
  - while loop: change while `cond` to `cond_any`, use `cond` to update outputs

test case: hand-written LSTM, greedy search, beam search
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9392

Differential Revision: D8822369

Pulled By: ChunliF

fbshipit-source-id: 8f03c95757d32e8c4580eeab3974fd1bc429a1e5
2018-08-01 22:24:35 -07:00
8a25acbba5 Use angle brackets instead of quotes for includes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10153

Reviewed By: smessmer

Differential Revision: D9123768

fbshipit-source-id: 0970552ba4d5772fb3cef2db3af3181d98f85140
2018-08-01 22:02:51 -07:00
5699250acc Move IdWrapper to ATen/core (#10152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10152

- Moved from namespace c10::guts to at
- I fixed the use sites, since there were only three of them
- Macro renamed from C10_ to AT_

Reviewed By: smessmer

Differential Revision: D9123652

fbshipit-source-id: bef3c0ace046ebadb82ad00ab73371f026749085
2018-08-01 22:02:50 -07:00
8cc7d33656 Renumber typeid.h so that the number lines up with ScalarType (#10139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10139

We want CaffeTypeId to be interconvertible with at::ScalarType, and
this means we should have the numbers line up exactly.  Fortunately
this is not too hard to do.

Reviewed By: smessmer

Differential Revision: D9123058

fbshipit-source-id: 7e9bd59ca25a552afe9d2d0a16cedc4f6311f911
2018-08-01 22:02:46 -07:00
6b338c8026 Implement torch.broadcast_tensors (#10075)
Summary:
This exposes expand_outplace to python. Fixes #8076. Fixes #10041.

I didn't name it torch.broadcast because numpy.broadcast does something
slightly different (it returns an object with the correct shape
information).
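A short usage sketch:

```
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
x, y = torch.broadcast_tensors(a, b)
# x.shape == y.shape == torch.Size([3, 4])
```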
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10075

Differential Revision: D9125816

Pulled By: zou3519

fbshipit-source-id: ebe17c8bb54a73ec84b8f76ce14aff3e9c56f4d1
2018-08-01 19:18:34 -07:00
191482fa39 Distinguish TupleLiteral from ListLiteral (#10128)
Summary:
Previously, the parser was emitting list literals for tuples, but the IR was representing list literals internally with TupleTypes.

For implementing most list operations, I think it will be helpful to distinguish between lists (dynamic size, homogeneous types) and tuples (fixed arity, heterogeneous types).

This diff modifies the parser logic to emit tuple literals. This frees us to represent lists as ListType in the IR, while still properly mapping tuple literals to TupleTypes.

A following diff will actually switch over list literals to emit ListTypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10128

Differential Revision: D9121305

Pulled By: michaelsuo

fbshipit-source-id: e0cad07ae8bac680f7f8113d10e5129d5a1a511d
2018-08-01 19:18:31 -07:00
a44d9d6eb4 Fix tensor check logic in logging (#10138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10138

Note that `TensorCPU` and `TensorGPU` are all refined to be `Tensor` now. Basically they are the same thing. So check like `blob.IsType<TensorCPU>()` is no longer safe as `TensorGPU` can pass the check too.

We need to systematically weed out the such usage in our codebase... @[100008320710723:jerryzh]

Reviewed By: houseroad

Differential Revision: D9115273

fbshipit-source-id: 13b293c73691002eac34e095cdcd96c27183e875
2018-08-01 18:09:19 -07:00
24bb8cecbe Move ATen/Half to ATen/core, and apply lint (#10137)
Summary:
This rewrites checked_convert to use stringstreams, eliminating the use of to_string, which is not available in Android's stdc++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10137

Reviewed By: smessmer

Differential Revision: D9122340

fbshipit-source-id: b7c1bff70e36217305f2b3333c51543ef8ff3d9c
2018-08-01 17:54:58 -07:00
806854a3c5 Pin AMD gpu id in Caffe2 CI (#10144)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10144

Differential Revision: D9125707

Pulled By: bddppq

fbshipit-source-id: 8ef8f3da6ceb1855f28fc24be621b9b4854ff7f9
2018-08-01 17:39:21 -07:00
59c355c870 Move halfbits2float and float2halfbits conversions to ATen. (#10134)
Summary:
This will be needed soon because I want to move Half.h into
ATen/core, and then I cannot have a TH dependency.

I also took the liberty of making the code more strict-aliasing
safe (this is not actually useful, since we will never build Torch
with strict aliasing) by replacing pointer casts between
float and unsigned with a memcpy instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10134

Differential Revision: D9121920

Pulled By: ezyang

fbshipit-source-id: 3b1f86a7c5880e8ac1a589a51f0635bb72e1fd40
2018-08-01 17:09:12 -07:00
4ed5b9267c #8518 Support for empty tuples (#10027)
Summary:
Fixing #8518

Sorry for the pile of commits; I forgot to rebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10027

Reviewed By: ezyang

Differential Revision: D9070028

Pulled By: jramseyer

fbshipit-source-id: 49729c9755ab8a586711e9f6d6a574f3035a7e75
2018-08-01 16:10:00 -07:00
1f6888b70a Allow mobile exporter to export string arrays (#10017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10017

Allow mobile exporter to export string arrays

Reviewed By: pjh5

Differential Revision: D9061213

fbshipit-source-id: b6c5257eb2f0f964dba255b97dc5d32af8ce15a7
2018-08-01 16:09:58 -07:00
1d427fd6f6 Delete type_ field from TensorImpl, replaced with backend_/scalar_typ… (#9787)
Summary:
…e_/is_variable_

The basic game plan is to stop accessing the type_ field directly,
and instead use the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I didn't fix
  it in this PR; it's tracked in #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9787

Reviewed By: cpuhrsch

Differential Revision: D8980971

Pulled By: ezyang

fbshipit-source-id: 2b4d867abfdc3999a836a220c638c109053145a8
2018-08-01 15:34:56 -07:00
edb90387b2 Lint ArrayRef.h (#10129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10129

-

Reviewed By: ezyang

Differential Revision: D9119933

fbshipit-source-id: dd13c6d2a0ab72d943acff5cb02b3278ca8c7ba6
2018-08-01 15:34:54 -07:00
080ae5ea1f Remove implicit ArrayRef -> vector conversion (#9740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9740

- Remove implicit ArrayRef -> vector conversion
- Fix 4 call sites that accidentally did an implicit expensive vector conversion but wouldn't have needed to
- Remove explicit vector conversion from 4 call sites that also didn't need to do that

Reviewed By: ezyang

Differential Revision: D8961693

fbshipit-source-id: 980da9f988083c0072497f9dbcbbf6f516fa311c
2018-08-01 15:34:52 -07:00
e2846c365a Improve ArrayRef (#9610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9610

Mostly making some stuff in ArrayRef constexpr to give it better perf.

Reviewed By: ezyang

Differential Revision: D8926785

fbshipit-source-id: af6d4b05fbc69d20855a80f3edc2b501577a742b
2018-08-01 15:34:50 -07:00
ad6d62250a Add torch.compiled_with_cxx11_abi(). (#10071)
Summary:
It returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1.

Fixes #8385
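Usage is a single call:

```
import torch

torch.compiled_with_cxx11_abi()  # True iff built with _GLIBCXX_USE_CXX11_ABI=1
```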
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10071

Differential Revision: D9088946

Pulled By: zou3519

fbshipit-source-id: b00fd92ee340ef34f60bdd6027ceaf46dd7442c0
2018-08-01 15:34:48 -07:00
1b1c47dfe5 Update onnx to onnx/onnx@32ac71b (#10126)
Summary:
32ac71b1b9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10126

Reviewed By: houseroad

Differential Revision: D9120544

Pulled By: bddppq

fbshipit-source-id: 4fbe1f16e3b712c092f2f188324173ba1ecc1062
2018-08-01 14:28:54 -07:00
fb24c52dc3 Prepare TH for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10123

Differential Revision: D9121068

Pulled By: gchanan

fbshipit-source-id: 1cdc6e4b327cf158729cbb4026315be63b159f9d
2018-08-01 14:28:53 -07:00
2d56b5cf8b Prepare THC for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072

Differential Revision: D9082421

Pulled By: gchanan

fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd
2018-08-01 14:28:51 -07:00
59af5b928a Move UniqueVoidPtr to ATen/core and apply lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10131

Reviewed By: smessmer

Differential Revision: D9121096

fbshipit-source-id: a6861429f06302e3e279ff669961bba34a9fb7a1
2018-08-01 13:25:23 -07:00
2d6738e89e Fix lint in ATen/core (but not ArrayRef)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10124

Reviewed By: smessmer

Differential Revision: D9119768

fbshipit-source-id: c0a56d27401b730956945146d4f48d4d5a9b77a6
2018-08-01 13:25:19 -07:00
f908b2b919 Use google protobuf in pytorch onnx import/export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8469

Reviewed By: houseroad

Differential Revision: D9102041

Pulled By: li-roy

fbshipit-source-id: 805c473745d181b71c7deebf0b9afd0f0849ba4f
2018-08-01 12:54:41 -07:00
5a44be50ab Minor nit in comment in CMakeLists.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10125

Reviewed By: smessmer

Differential Revision: D9119766

fbshipit-source-id: 290b804bc552b1c3f68e5129ff60ef7f34307714
2018-08-01 12:39:38 -07:00
e8f27311aa fix a couple problems with libtorch cmake file (#10091)
Summary:
in particular, make not building tests actually work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10091

Differential Revision: D9121366

Pulled By: anderspapitto

fbshipit-source-id: d7d38cf759aa46bff90d3b4f695c20f29039ae75
2018-08-01 11:39:33 -07:00
f126687fbc Add a dump() method to IR Node's. (#10106)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10106

Differential Revision: D9119891

Pulled By: resistor

fbshipit-source-id: 5f41d8890007c639f8f0cdc92d11b128433ad6b8
2018-08-01 11:09:53 -07:00
4070005081 Move C++17.h to ATen/core (#10107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10107

This header is needed for ATen/core stuff

This diff also fixes an issue in C++17.h when compiled with C++17-enabled compilers.

Reviewed By: ezyang

Differential Revision: D9095209

fbshipit-source-id: d45947956019a7095875f48746b88c414e8865bc
2018-08-01 09:54:59 -07:00
87d57dc5f5 Simplified Operator (#10080)
Summary:
zdevito explained that the attributed versions of `Operator`s are no longer necessary. This PR does two things:

1. Removes all code associated with attributed operators,
2. Adds a second kind of state to `Operator` where it is constructed with an `Operation` directly instead of an `OperationCreator`. This will be useful to test custom operators which don't require a node (you can just retrieve it directly).

Now rebased on top of https://github.com/pytorch/pytorch/pull/9801

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10080

Differential Revision: D9113668

Pulled By: goldsborough

fbshipit-source-id: 1276a191c7cf89da1c38488769f2105ce2664750
2018-08-01 09:41:08 -07:00
f1964c43fd Update eigen submodule to fix BUILD_ATEN issue (#10095)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

Updating Eigen submodule to fix an issue we saw with BUILD_ATEN and BUILD_CAFFE2 removal.

cc mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10095

Reviewed By: mingzhe09088

Differential Revision: D9109877

Pulled By: orionr

fbshipit-source-id: 90e36c298d8a22398558d70dc5f68a95a7687d6b
2018-08-01 09:41:06 -07:00
a2a7b0c01a Initial documentation for building libtorch (#10087)
Summary:
It's not a particularly pretty process right now, but it may as well
be documented.  I'm not aware of an ideal location for this, so I'm
just dropping it in the docs/ folder for now as recommended by
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10087

Differential Revision: D9119681

Pulled By: anderspapitto

fbshipit-source-id: cd4afb642f3778c888d66a501bc697d0b0c88388
2018-08-01 09:41:02 -07:00
ee964c51f4 NegativeBinomial distribution (#9345)
Summary:
- [x] implement distribution
- [x] add tests
- [x] docs
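
A minimal usage sketch of the new distribution (parameter semantics per the torch.distributions convention):

```python
import torch
from torch.distributions import NegativeBinomial

# Counts of successes before `total_count` failures, each trial
# succeeding with probability `probs`.
d = NegativeBinomial(total_count=10, probs=torch.tensor(0.25))
samples = d.sample((5,))
log_probs = d.log_prob(samples)
```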

cc ingmarschuster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9345

Differential Revision: D8807023

Pulled By: ezyang

fbshipit-source-id: 7bf7f352dd455e0909c58dd94e1bdebba0e8b5c8
2018-08-01 08:39:25 -07:00
2f848ec8ec Use new PyTorch API to make code simpler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9968

Differential Revision: D9088316

Pulled By: li-roy

fbshipit-source-id: 2658fe0c1734d8b064cbad24d8f0d6c341400b4e
2018-08-01 08:39:23 -07:00
fa6b28bf40 Move ArrayRef, Backtrace, Error, SmallVector, optional to ATen/core; add CoreAPI (#10092)
Summary:
This also makes Backtrace more portable, by disabling its functionality for
mobile builds as well.

It also handles Caffe2 static Windows builds by introducing a new variable,
AT_CORE_STATIC_WINDOWS, which must be set if you're building
ATen on Windows as part of a static library.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10092

Reviewed By: gchanan, smessmer

Differential Revision: D9094393

Pulled By: ezyang

fbshipit-source-id: 93281f9302bd378605a26589ae308faf1dac7df4
2018-08-01 08:39:22 -07:00
b503109f20 Guard sizes/strides in THCUNN for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10083

Differential Revision: D9093572

Pulled By: gchanan

fbshipit-source-id: a5c27571ec06f8ed30e6b3b492c743444b58d9fe
2018-08-01 08:10:33 -07:00
43b151224e Move grid sampler to ATen (#9961)
Summary:
Spatial version benchmark

|                           | CPUFloat THNN | CPUFloat ATen | CPUDouble THNN | CPUDouble ATen | CUDAHalf THNN | CUDAHalf ATen | CUDAFloat THNN | CUDAFloat ATen | CUDADouble THNN | CUDADouble ATen |
|---------------------------|---------------|---------------|----------------|----------------|---------------|---------------|----------------|----------------|-----------------|-----------------|
| [1024x1x28x28] zero pad   | 2.19281888s   | 0.21280479s   | 2.52922535s    | 0.23944831s    | 0.17494774s   | 0.06242800s   | 0.31270599s    | 0.03706479s    | 0.40542483s     | 0.07391024s     |
| [1024x1x28x28] border pad | 3.04329610s   | 0.24705672s   | 2.29205394s    | 0.22336411s    | 0.17980361s   | 0.06212497s   | 0.31415701s    | 0.03847790s    | 0.43020391s     | 0.07540464s     |
| [32x3x244x244] zero pad   | 18.29301333s  | 2.18566656s   | 19.01662397s   | 3.51552224s    | 1.72487235s   | 0.28933954s   | 2.02466702s    | 0.18178749s    | 2.63671613s     | 0.41391206s     |
| [32x3x244x244] border pad | 18.72205329s  | 2.02600884s   | 20.13017297s   | 3.25979590s    | 1.96455693s   | 0.33070564s   | 2.18666625s    | 0.19546938s    | 2.91268897s     | 0.38465047s     |

For #9702

basics:
+ grid tensors have dimensions `[N, H, W, 2]` (or `[N, D, H, W, 3]` for 3d).
+ input/output tensors have dimensions `[N, C, H, W]` (or `[N, C, D, H ,W]` for 3d)
+ grid sampler maps `input([N, C, inp_H, inp_W]), grid([N, H, W, 2])` to `output([N, C, H, W])` (3d case is similar).

variable naming:
+ `tensor_sH` means the stride of `tensor` at the dimension of `H`.
+ `tensor_ptr_NCH` is a data pointer that always points to the beginning of the `tensor[n][c][h]` slice in the loop.
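
A shape-level sketch of the Python entry point (default bilinear mode assumed):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 3, 8, 8)           # input:  [N, C, inp_H, inp_W]
grid = torch.rand(1, 4, 4, 2) * 2 - 1   # grid:   [N, H, W, 2], values in [-1, 1]
out = F.grid_sample(inp, grid, padding_mode='zeros')
print(out.shape)                        # output: [1, 3, 4, 4]
```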
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9961

Differential Revision: D9057175

Pulled By: SsnL

fbshipit-source-id: 9ed8f1dc376ed10229f047fdcf3c90dbd250bee6
2018-08-01 07:54:46 -07:00
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
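
A minimal sketch of the new module; CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1)):

```python
import torch
import torch.nn as nn

m = nn.CELU(alpha=1.0)  # alpha defaults to 1.0
x = torch.randn(4)
y = m(x)
```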
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
6f6a1f2d63 fix test_load_error_msg failure (Network is unreachable) (#10021)
Summary:
- fixes the test_load_error_msg failure (Network is unreachable)
- removed use of urlopen in test_load_error_msg

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10021

Differential Revision: D9068108

Pulled By: weiyangfb

fbshipit-source-id: a9484d4a913508d54731b6a1eef3cddff66604f2
2018-08-01 00:24:01 -07:00
5bd43a7af8 Refactor Seq2SeqModelCaffe2EnsembleDecoder (#10035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10035

This is an initial diff which refactors some of the components in the Seq2SeqModelCaffe2EnsembleDecoder class.

Reviewed By: jmp84

Differential Revision: D9026372

fbshipit-source-id: 449635208f24494209ae2fb78a19fca872970ea8
2018-07-31 23:09:09 -07:00
3d247041e4 Force sync device when ops are sampled for observation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10054

Reviewed By: xw285cornell

Differential Revision: D9071097

fbshipit-source-id: 44357cdf79148e81db86c5350122a1a320a923fb
2018-07-31 21:09:00 -07:00
ec807f2a91 Bail out if netdef has disable_nomnigraph argument
Summary: allow models to override nomnigraph opts

Reviewed By: ajtulloch

Differential Revision: D9035729

fbshipit-source-id: 2b30208263c14ce7039f27c618a3b232bf11ee33
2018-07-31 20:54:46 -07:00
fcd567ed15 Enable Optimization on mobile by default
Summary: Re-enable opt by default

Reviewed By: Maratyszcza

Differential Revision: D8525434

fbshipit-source-id: a61253907251a44cfc59e0b50fb1906c5eb20558
2018-07-31 20:54:44 -07:00
7d2bda7588 Move DDP broadcast coalesced to C++ (#9729)
Summary:
This PR depends on the tests added in #9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if `torch/csrc/distributed/c10d/ddp.h` is a good place to put these rewritten functions.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9729

Differential Revision: D8985308

Pulled By: goldsborough

fbshipit-source-id: dc459fe9040273714044152063585e746974752f
2018-07-31 19:54:21 -07:00
294c065384 Changed serialization mechanism of LambdaLR scheduler (#9927)
Summary:
I opened an issue explaining some of my frustrations with the current state of schedulers.
While most points that I raised in [that issue](https://github.com/pytorch/pytorch/issues/8741#issuecomment-404449697) need to be discussed more thoroughly before being implemented, there are some that are not so difficult to fix.

This PR changes the way the LambdaLR scheduler gets serialized:
> The lr_lambda functions are only saved if they are callable objects (which can be stateful).
> There is no point in saving plain functions/lambdas, as you need their definition before unpickling and they are stateless.

This has the big advantage that the scheduler is serializable, even if you use lambda functions or locally defined functions (aka a function in a function).
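
A minimal sketch of the pattern this enables: a stateful callable object pickles, while a plain lambda would not:

```python
import pickle
import torch
from torch.optim.lr_scheduler import LambdaLR

class WarmUp(object):
    def __init__(self, warmup_epochs):
        self.warmup_epochs = warmup_epochs
    def __call__(self, epoch):
        return min(1.0, (epoch + 1) / float(self.warmup_epochs))

opt = torch.optim.SGD([torch.nn.Parameter(torch.randn(2))], lr=0.1)
sched = LambdaLR(opt, lr_lambda=WarmUp(5))
blob = pickle.dumps(sched)  # works; lr_lambda=lambda e: 0.95 ** e would fail here
```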

Does this functionality need any unit tests?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9927

Differential Revision: D9055505

Pulled By: soumith

fbshipit-source-id: 6c1cec588beedd098ec7d2bce6a9add27f29e48f
2018-07-31 19:39:06 -07:00
aae37324cc fixed a newly introduced regression in softmax (#10066)
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); however, in v0.4.1 it is implemented as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
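
The fix in one line, as described above (a sketch, not the exact diff):

```python
import torch.nn.functional as F

def softmin(x, dim):
    # Negate the *input*, not the result: softmin(x) == softmax(-x).
    return F.softmax(-x, dim=dim)
```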
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
2018-07-31 19:28:30 -07:00
f2412fbafc Allow multiple ops.def and clean up code gen in general
Summary:
This is a cleanup and refactoring.
In its original form (changeset 6fdf915c057a) this diff caused a 5% regression
on ads CPU.  The root cause was an omission of link_whole = True, causing
symbols to be stripped in mode/opt and forcing the converter to fall back,
leaving patterns unmatched in the graph transform logic. This version of
the diff tests for link_whole by including a C++ test of the transform

Reviewed By: yinghai

Differential Revision: D9040511

fbshipit-source-id: 3e19b89989aa68b021762d12af2d0b4111280b22
2018-07-31 19:28:28 -07:00
799c947cf3 add .gitattributes for EOL conversion. (#9813)
Summary:
`.bat` files' EOL is LF, so the build fails on some Windows machines.
To fix this, add a `.gitattributes` file and set batch files' EOL to CRLF.
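
A minimal sketch of the entry (exact pattern assumed):

```
*.bat text eol=crlf
```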

Discussion is in #9677.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9813

Differential Revision: D9026486

Pulled By: soumith

fbshipit-source-id: 341eaa677c35f8476a7eda1bac9827385072eb29
2018-07-31 18:38:43 -07:00
9c0f65fc87 Remove While op stuff (#10102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10102

these codepaths are unused, deleting them

Reviewed By: yinghai

Differential Revision: D9109764

fbshipit-source-id: 8ace42a399806632bfbcada96b383268f0a8ae89
2018-07-31 17:56:25 -07:00
c54d71ba60 Upgrade old transform passes to newer APIs (#10046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10046

stampable

Reviewed By: duc0

Differential Revision: D9075830

fbshipit-source-id: dc65be1d39625ef24ad319b5ce0263ecfe7a10c9
2018-07-31 17:39:35 -07:00
ceb0f14176 Fix SpatialBN Fusion (#10044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10044

The test was subtly broken! This transform wasn't writing to the correct blob and the test did not catch that because it was looking at the old version.

thanks kerenzhou for catching this

Reviewed By: Jokeren

Differential Revision: D9075520

fbshipit-source-id: c31ff0afcd78dd2dc7ffc240e2e89eeda87f1fb4
2018-07-31 17:39:34 -07:00
bf744bea94 Parse and register schema declarations lazily (#9801)
Summary:
This should prevent slow startup times, and will not report as many
errors during static initialization, which are hard to debug.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9801

Reviewed By: goldsborough

Differential Revision: D8986603

Pulled By: zdevito

fbshipit-source-id: 440d43ab5e8cffe0b15118cb5fda36391ed06dbc
2018-07-31 17:24:24 -07:00
34c7c56c73 Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077)
Summary:
This is a combination of https://github.com/pytorch/pytorch/pull/9947 (this was reverted) and https://github.com/pytorch/pytorch/pull/10076.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10077

Differential Revision: D9087491

Pulled By: gchanan

fbshipit-source-id: 9fe9905628000f2ff3e47df32533cd7d1f25a354
2018-07-31 16:43:45 -07:00
ba5d33bede Re-Enable ATen in C2 in integration builds to test ONNX ATen conversions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10060

Differential Revision: D9081387

Pulled By: bddppq

fbshipit-source-id: 13cbff63df5241e013d4ebacfcd6da082e7196f6
2018-07-31 15:27:05 -07:00
e04f8bbfa6 Add virtual dtor for ideep context (#10059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10059

Without virtual dtor, it could induce incorrect sized deallocation, messing up the memory. And unfortunately, sized deallocation cannot be detected by ASAN, yet.

Reviewed By: jerryzh168

Differential Revision: D9080526

fbshipit-source-id: c136cf653134e75b074326be2bc03627da42446f
2018-07-31 15:27:02 -07:00
d2178562a4 Remove some unnecessary includes. (#10085)
Summary:
The affected files are all files that are planned to be moved
to ATen/core; the includes are for headers which are NOT slated
for movement.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10085

Differential Revision: D9093746

Pulled By: ezyang

fbshipit-source-id: 2beeffdae26d03d631d2d51b40bf6303759a2f50
2018-07-31 15:13:37 -07:00
1f13453b4d Slightly relax the constraints on argument and return types to script functions (#9969)
Summary:
This lays out initial support for taking and returning a richer set
of types than only tensors. Floats and ints are already valid, lists are
straightforward to add, tuples need some discussion.
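
A sketch of a script function taking a non-tensor argument; the MyPy-style `# type:` signature comment is an assumption about the annotation syntax:

```python
import torch

@torch.jit.script
def scaled_add(x, y, alpha):
    # type: (Tensor, Tensor, float) -> Tensor
    return x + y * alpha
```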

Based on top of #9948. Review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969

Reviewed By: zdevito

Differential Revision: D9076973

Pulled By: apaszke

fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0
2018-07-31 14:25:29 -07:00
58fd6e1dd6 Also add ATen/core tests to oss CI (#10029)
Summary:
-
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10029

Reviewed By: ezyang

Differential Revision: D9070030

Pulled By: smessmer

fbshipit-source-id: b5ae79a383dc14e7d79e6a82c5d70e951c9f5168
2018-07-31 13:54:39 -07:00
ee17ed672b Add missing dependencies (#10086)
Summary:
Fix the master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10086

Differential Revision: D9093741

Pulled By: houseroad

fbshipit-source-id: 65e42994ae7d8e0b449d10a8116a7609434aad04
2018-07-31 13:54:38 -07:00
2422801625 fix _pointwise_loss for target gradients (#10018)
Summary:
_pointwise loss has some python special casing, we converted reduction to aten enums too early.

fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018

Differential Revision: D9075489

Pulled By: li-roy

fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
2018-07-31 13:39:58 -07:00
56d1a82b31 Add shape inference when converting from onnx to caffe2 (#10037)
Summary:
Otherwise, some RNN case conversion may fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10037

Reviewed By: orionr

Differential Revision: D9072298

Pulled By: houseroad

fbshipit-source-id: 080f589eba8618719453feb15a7a494fe5380dd0
2018-07-31 12:42:02 -07:00
371a786b18 Errors out when Openmpi < 2.x.x with distributed. (#10015)
Summary:
This PR fixes #9418.
OpenMPI 1.10 segfaults in MPI_Bcast with CUDA buffers, and it is a retired OpenMPI version.
I've tested on 2.1.1 and 3.0.0 and they work well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015

Reviewed By: soumith

Differential Revision: D9088103

Pulled By: ailzhang

fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045
2018-07-31 12:24:40 -07:00
1ae520c704 Add AT_CHECK for null storage. (#9823)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9823

Differential Revision: D9029433

Pulled By: ezyang

fbshipit-source-id: 6101556305593c66f618b20d8c2a084ae2558ea8
2018-07-31 12:09:25 -07:00
685224aa14 Add CTC loss (#9628)
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the
modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.

This could eventually fix #3420

I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during the testing. Also, I want to add some more code comments.

I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions, and anything else I'm not even aware of.

Thank you for looking!

In terms of performance it looks like it is superficially comparable to WarpCTC (though I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log-space, but also because its gathering step is much, much faster (I avoided trying tricky things there; they seem to contribute to warpctc's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:

```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```

Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.

Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.

My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length  = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, cugra= torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()

    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628

Differential Revision: D8952453

Pulled By: ezyang

fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
2018-07-31 11:09:48 -07:00
430e44480f Delete some obsolete steps in the ROCm build. (#10005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10005

Differential Revision: D9066107

Pulled By: ezyang

fbshipit-source-id: 346f654214cff1c956a4022173347d95657ee9d4
2018-07-31 11:09:46 -07:00
f779202711 Correctly set CAFFE2_DISABLE_NUMA when USE_NUMA=OFF in cmake (#10061)
Summary:
previously https://github.com/pytorch/pytorch/blob/master/caffe2/core/numa.cc still gets compiled even when USE_NUMA=OFF
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10061

Reviewed By: houseroad

Differential Revision: D9081385

Pulled By: bddppq

fbshipit-source-id: ad28b647e0033727839770b1da0fba341b1b7787
2018-07-31 11:01:51 -07:00
cba03e2ebe Handle dynamic repeats in onnx symbolic (#10052)
Summary:
ONNX Tile can takes the `repeats` as dynamic input
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10052

Differential Revision: D9076841

Pulled By: bddppq

fbshipit-source-id: ddd692c5f5846c8fdba019baa9fad83ef9638da4
2018-07-31 10:39:50 -07:00
0c11101eca Prepare THNN/THCUNN for first class scalars. (#10023)
Summary:
I previously did some transformations, e.g. _nDimension,_dim -> nDimensionLegacyAll, nDimension -> nDimensionLegacyNoScalars.
But this didn't touch dim(), which needs to be updated to support scalars.  Instead of doing an (ugly) move, I audited the call sites and updated the cases that could be size 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10023

Differential Revision: D9068996

Pulled By: gchanan

fbshipit-source-id: c63820767dd1496e908a5a96c34968482193f2c5
2018-07-31 10:39:48 -07:00
c2d9d2888b Fix typo in tensors.rst (#10073)
Summary:
An tensor -> A tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10073

Differential Revision: D9087421

Pulled By: soumith

fbshipit-source-id: 6713f5a5e11fb11dff0ab5d2d6274f7837c6625f
2018-07-31 10:13:40 -07:00
68cbe37c6a fix the reference link path
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9240

Reviewed By: SsnL

Differential Revision: D8764196

Pulled By: ezyang

fbshipit-source-id: 3efc70714406d801ed74f52313beca61129593c7
2018-07-31 09:09:46 -07:00
5e5c15dd42 Add (constant size) TensorLists to JIT, use them in cat and stack nodes (#9948)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9948

Reviewed By: ezyang

Differential Revision: D9033666

Pulled By: apaszke

fbshipit-source-id: 02d75e391ed6dee62500842df50f0b6ee5e38846
2018-07-31 07:39:52 -07:00
6fb9acfc16 Revert empty n-dim and ATen in C2 integration builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10064

Differential Revision: D9082082

Pulled By: gchanan

fbshipit-source-id: ae49470f5b4c89b13beb55fd825de1ba05b6a4fa
2018-07-31 07:25:56 -07:00
78b806c861 Fix the onnx symbolic for upsample (#10001)
Summary:
We missed the upsample symbolic when bumping up the opset to 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10001

Reviewed By: bddppq

Differential Revision: D9067212

Pulled By: houseroad

fbshipit-source-id: 3e285d2800a32cb04fa82f8e7f261bdd010a8883
2018-07-30 21:39:48 -07:00
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
aa36a5d01c Add typing into caffe2 requirements.txt for USE_ATEN (#10047)
Summary:
I was dumb lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10047

Differential Revision: D9076023

Pulled By: bddppq

fbshipit-source-id: 10587875d04ac2aed2e015846fc73ce9e4717a4f
2018-07-30 20:09:21 -07:00
51539fa383 Add pyyaml into caffe2 requirements.txt for USE_ATEN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10039

Reviewed By: houseroad

Differential Revision: D9074261

Pulled By: bddppq

fbshipit-source-id: 26df516633d5a4ec539a03a62cf9e7839e1e1964
2018-07-30 18:11:25 -07:00
8f0a229078 Fix HPTT path for 0-sized inputs.
Reviewed By: Maratyszcza

Differential Revision: D9068091

fbshipit-source-id: 4aeac45f9732a86979a08488637bf0ba6cc79b34
2018-07-30 17:54:57 -07:00
788b2e996d nomnigraph - minor cleanup of Graph.h (#9890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9890

Minor cleanups for Graph.h to make it more consistent with our style guide

Also fix opt/device.cc and binary_match_test.cc to not access subgraph.nodes_ which is now private

Reviewed By: bwasti

Differential Revision: D9017108

fbshipit-source-id: 9f5cba4a2cd2a452a955005f4704f6c120bbc1d5
2018-07-30 16:24:03 -07:00
e0a0234018 Remove C++14 feature (#10022)
Summary:
Which test should I look at, bddppq?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10022

Reviewed By: bddppq

Differential Revision: D9068732

Pulled By: yinghai

fbshipit-source-id: 241ef72c7fac0ed0b8c58ecdffbb5e24eb956217
2018-07-30 16:24:02 -07:00
3e3f40aeeb Update onnx to latest master (#10024)
Summary:
df01dbc005
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10024

Reviewed By: houseroad

Differential Revision: D9069464

Pulled By: bddppq

fbshipit-source-id: 751328352cd495e27b6bd533f4632d3d6d06c4a6
2018-07-30 15:54:34 -07:00
e57cb4a1b2 Add a Constant Propagation Pass to the JIT (#8808)
Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.

There are a couple of special cases which have not been implemented here: IF nodes with constant conditions can be inlined with the correct block, and WHILE nodes can be removed if the condition is false. I have added a test for each case in test_jit.py as an expected failure.

To be consistent with DCE, python ops & CPP ops are treated as not having side-effects.
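
A sketch of the kind of subexpression the pass can fold into a single constant:

```python
import torch

@torch.jit.script
def f(x):
    y = 2 * 3 + 1   # folded to the constant 7 at compile time
    return x + y
```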
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808

Reviewed By: wanchaol

Differential Revision: D8906770

Pulled By: eellison

fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
2018-07-30 15:54:31 -07:00
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
9987282134 Use Retainable as base class for StorageImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9956

Reviewed By: gchanan

Differential Revision: D9066103

Pulled By: cpuhrsch

fbshipit-source-id: 1a5a2ace306308707add3d0e0c1fc861f5c79705
2018-07-30 15:08:56 -07:00
7214754663 Check and return when numel() == 0 in Loops.cuh.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10031

Reviewed By: colesbury

Differential Revision: D9070346

Pulled By: gchanan

fbshipit-source-id: d6ad4e6ca43d334f5be42fea35915270dd8f405e
2018-07-30 15:01:28 -07:00
57750bd638 Enable ATen in C2 in integration builds to test ONNX ATen conversions (#10014)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10014

Reviewed By: houseroad

Differential Revision: D9061842

Pulled By: bddppq

fbshipit-source-id: 1e1c2aeae62dd2cc5c6a8d5e1d395ea5cf882734
2018-07-30 15:01:13 -07:00
6c7fb1582f Introduce __array_priority__ on torch.Tensor (#9651)
Summary:
This causes numpy to yield to the torch functions,
e.g. instead of numpy array/scalar __mul__ converting the tensor to
an array, it will now arrange for the Tensor __rmul__ to be called.

Fixes case 2 of #9468.
It also makes cases 3 and 4 equivalent, but does not fix them.
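
A sketch of the behavior change described above:

```python
import numpy as np
import torch

t = torch.ones(3)
s = np.float64(2.0)
# NumPy now defers to torch, so Tensor.__rmul__ runs and a Tensor comes back
# instead of an ndarray.
out = s * t
print(type(out))
```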
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9651

Differential Revision: D8948079

Pulled By: ezyang

fbshipit-source-id: bd42c04e96783da0bd340f37f4ac3559e9bbf8db
2018-07-30 14:39:43 -07:00
ea3c36b822 NumPy Scalar to PyTorch Scalar (#9225)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4985 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9225

Differential Revision: D8769317

Pulled By: ezyang

fbshipit-source-id: eeaeaf0749c9dc9e372634da68b4bd23e6e3ad28
2018-07-30 14:39:40 -07:00
c9eab34e63 Fix Caffe2 with ATen conda build failure (#10020)
Summary:
Extracted from 627624627e and in support of https://github.com/pytorch/pytorch/pull/10019

cc pjh5 mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10020

Reviewed By: pjh5

Differential Revision: D9068124

Pulled By: orionr

fbshipit-source-id: 4dd4910136a312b6517c65ce8802837108475f89
2018-07-30 14:10:02 -07:00
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
40a8239984 Fix a bug in argument spec (#9958)
Summary:
Non-tensor types did not set the running total_dims count, causing corrupted data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9958

Reviewed By: jamesr66a

Differential Revision: D9065621

Pulled By: zdevito

fbshipit-source-id: 0ac1fcdf6da076a9c9ebd5d70ce9126e3f8e722e
2018-07-30 13:08:59 -07:00
faa96c1c47 Deal with spaces in einsum equation string (#9994)
Summary:
Fixes #9930
Thank you, vadimkantorov for the report.
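
A minimal sketch (the list-of-operands calling convention is used here):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
c1 = torch.einsum('ij,jk->ik', [a, b])
c2 = torch.einsum('i j , j k -> i k', [a, b])  # spaces are now ignored
assert torch.allclose(c1, c2)
```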
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9994

Differential Revision: D9042876

Pulled By: ezyang

fbshipit-source-id: 3bbd1aaaf1b432be40a7652b6a746d80934a216b
2018-07-30 12:57:56 -07:00
ce5f0d40b6 Enable n-dimensional empty tensors. (#9947)
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
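
A minimal sketch of what this enables:

```python
import torch

x = torch.empty(0, 3)   # any dimension may now be zero-sized
print(x.shape)          # torch.Size([0, 3])
print((x + 1).shape)    # elementwise ops preserve the empty shape
```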
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947

Reviewed By: ezyang

Differential Revision: D9032778

Pulled By: gchanan

fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
2018-07-30 12:33:17 -07:00
73a60efccc Fix Caffe2CTScan error (#9962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9962

att

Reviewed By: hlu1

Differential Revision: D9036869

fbshipit-source-id: 3155af00c62d489f998cbfba07121c4fd20e1c6f
2018-07-30 12:33:15 -07:00
b4f8c60931 Don't use the XML reporter for Catch2. (#10012)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10012

Differential Revision: D9057766

Pulled By: ezyang

fbshipit-source-id: 12148a8cf3061423c61b3e7b36864dfcdb1138a1
2018-07-30 11:25:09 -07:00
9a9a7325c6 Remove the generation of storage files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9954

Reviewed By: gchanan

Differential Revision: D9035947

Pulled By: cpuhrsch

fbshipit-source-id: 9b56c7a68e3f562ea11b9265a5fa234838f2b4e0
2018-07-30 09:53:57 -07:00
432ca747b0 Don't seed GPUs if there are none available. (#9931)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9931

Differential Revision: D9051375

Pulled By: ezyang

fbshipit-source-id: 1721f6217e07f80adc107d95e897cd7dd488659a
2018-07-30 08:23:53 -07:00
3609977d7f Update onnx to onnx/onnx@c761845 (#9964)
Summary:
c761845c7f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9964

Reviewed By: houseroad

Differential Revision: D9038133

Pulled By: bddppq

fbshipit-source-id: 6ce740944e636175d2de4602edb92cc4d7e8e5ac
2018-07-29 23:10:12 -07:00
5ff1551eb9 ATen's emscripten support (#9803)
Summary:
Not sure if anybody is interested, but I managed to run GRU inference fine in `wasm` using ATen compiled with emscripten. It was quite trivial to fix the configuration.
It also passes most of the tests, especially all scalar tensor tests.

The command line to configure was, but could be simplified:
```
emconfigure cmake -DAT_LINK_STYLE=STATIC -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DCMAKE_C_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_CXX_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_INSTALL_PREFIX=/home/sugar/aten-wasm ../
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9803

Differential Revision: D9004610

Pulled By: ezyang

fbshipit-source-id: db26c59f27162ed80f6aee2973c4cb9252d3d1e4
2018-07-29 20:39:00 -07:00
3d6015db0e Add essential PATH for the Windows PyTorch loading process (#9920)
Summary:
Fixes #9818.
It seems the original Python installation doesn't add `[PYTHONPATH]\Library\bin` to `PATH`. We try to add it before the DLL loading process.
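
A sketch of the idea (the exact code in torch/__init__.py may differ):

```python
import os
import sys

# Prepend the conda-style Library\bin directory so dependent DLLs
# resolve before torch's own DLLs are loaded.
lib_bin = os.path.join(sys.exec_prefix, 'Library', 'bin')
os.environ['PATH'] = lib_bin + os.pathsep + os.environ.get('PATH', '')
```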
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9920

Differential Revision: D9040825

Pulled By: soumith

fbshipit-source-id: c07fff71b2aea254a396042ab677696f6829aac7
2018-07-29 08:23:59 -07:00
56974a06b5 Revert D8909766: [caffe2] Simplify order switch operators
Differential Revision:
D8909766

Original commit changeset: 17a302d5bf4a

fbshipit-source-id: 56c75a8ce27873ed1d5f194b9d6bf0049d8f21ba
2018-07-28 18:40:13 -07:00
eee01731a5 Adds the default value for the amsgrad arg to the Adam docstring (#9971)
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default-value description for the `amsgrad` argument for consistency.
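
For reference, a minimal sketch of the default being documented:

```python
import torch

w = torch.randn(3, 3, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-3, amsgrad=False)  # amsgrad defaults to False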
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971

Differential Revision: D9040820

Pulled By: soumith

fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c
2018-07-28 09:23:45 -07:00
b99492a507 Fix BlobStatRegistry HIP BlobStatGetter registration issue (#9973)
Summary:
This was introduced in #9826, following the corresponding cuda file context_gpu.cu; tests passed in that PR, at which point master was 94439d7df. However, during the long landing process a new master commit aebf3b4 came in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in context_hip.cc, which broke the HIP BlobStatGetter. We did NOT run tests again during the merge, so when #9826 later landed on master the rocm tests started breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
2018-07-28 02:23:40 -07:00
46d8002800 Fix bug that always uses the same blob when repeating poolings
Reviewed By: houseroad

Differential Revision: D9027902

fbshipit-source-id: 957702ad9736812ec5aa32066d286c2c3adffc49
2018-07-28 00:09:16 -07:00
47c1badf90 Fix the clamp special case and gradient problem on None, add None to JIT (#9596)
Summary:
Supersedes #8925

This PR fixes #8502, it fixes the gradients problem for clamp when passing None to the function, and add support for the NoneLiteral and NoneType in script to enable clamp tests. Now we could have corner cases like:

```python
@torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```

In both JIT and Aten, we use Scalar(NAN) as a sentinel value when passing None type to function clamp, this is the current way we used to support None type in JIT and to solve the gradient problem when user explicitly passing None into clamp.

In JIT side, we create a tensor(NAN) and undefinedTensor if we encounter None when matching the function schema, and later in the interpreter, it will translate to Scalar(NAN) if needed.

Ideally we don't need clamp_min and clamp_max in ATenNative/Autograd and could only support clamp after this change, but since bunch of other operators (e.g. Activation.cpp, Loss.cpp) is using clamp_min in several places, we will still have the functions available, but all python invocations will only call clamp instead of clamp_min/max (with calling underlying th_max/th_min in clamp).

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596

Reviewed By: zdevito

Differential Revision: D8940839

Pulled By: wanchaol

fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
2018-07-27 22:54:33 -07:00
851c18dd20 PyTorch File Format API (#9900)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/9794 that contains only the serialization library and exposes a cleaner API. This should later be incorporated into the module export code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9900

Reviewed By: zdevito

Differential Revision: D9021057

Pulled By: jamesr66a

fbshipit-source-id: 01af74a7fdd1b90b2f5484644c3121d8ba9eb3b3
2018-07-27 22:24:57 -07:00
d913db70f2 Handle the "spatial" attribute in onnx BatchNormalization op (#9492)
Summary:
If this "spatial" attribute is present and its value equals 1, we can just remove the attribute and convert the op to caffe2 SpatialBN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9492

Differential Revision: D8988165

Pulled By: houseroad

fbshipit-source-id: a9218dc9cd5fab43deb371f290f81285f5283231
2018-07-27 22:09:15 -07:00
bcba5a50d1 Fix EnforceFiniteOp
Summary: att

Reviewed By: kennyhorror

Differential Revision: D9040248

fbshipit-source-id: 0da0f3b1ce51375731098cc86c92f35953be0861
2018-07-27 22:01:23 -07:00
ab4e209007 Back out "[caffe2][nomnigraph] Allow multiple ops.def and clean up code gen in general"
Summary: Original commit changeset: 6fdf915c057a

Reviewed By: yinghai

Differential Revision: D9040008

fbshipit-source-id: 33fd5d4ddc0ec8cae56cf86f6d63b6f666e51a3e
2018-07-27 20:09:14 -07:00
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
ee827f6ba3 Fix a testcase in logsoftmax onnx export (#9660)
Summary:
We only support the special case; the original dim attribute is not supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9660

Reviewed By: bddppq

Differential Revision: D8965507

Pulled By: houseroad

fbshipit-source-id: 021dffdf0489c2d3a50bfd1e0c4cfd00d4a3d776
2018-07-27 17:54:32 -07:00
12a1af3731 Adding conv tests with explicit algo definition
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9798

Differential Revision: D9034663

Pulled By: virtan

fbshipit-source-id: d722f25f1dd00231ccc3ad5960bbbef63af02c2d
2018-07-27 17:39:17 -07:00
9eeb4e17af Split gather op for easier smaller code size (#9916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9916

att

Differential Revision: D8961085

fbshipit-source-id: 39a9838647dc97611e77beb0607c4655de727ada
2018-07-27 17:15:33 -07:00
c3fe071483 Update hip files (#9826)
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
2018-07-27 16:54:39 -07:00
a532c1a48c Fix default argument value for CTCGreedyDecoder op (#9747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747

Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.

Reviewed By: houseroad

Differential Revision: D8963635

fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
2018-07-27 16:33:07 -07:00
eb9bb1f09a Travis CI: Run flake on Python 2.7 and 3.7 (#9953)
Summary:
Flake8 will produce different results on Python 2 and 3.  Python 3.7 has __async__ as a reserved word https://github.com/pytorch/pytorch/pull/4999.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9953

Differential Revision: D9035415

Pulled By: soumith

fbshipit-source-id: 8a46e028a2e20a7e3f6d90137020268d65a7cc64
2018-07-27 14:43:26 -07:00
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors compared to CPUApplyUtils which operated individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
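
A sketch of the user-visible invariant: broadcasting semantics are unchanged even though the gradient reduction now happens in the autograd engine:

```python
import torch

a = torch.randn(4, 1, requires_grad=True)
b = torch.randn(1, 5, requires_grad=True)
out = (a + b).sum()
out.backward()
print(a.grad.shape, b.grad.shape)  # torch.Size([4, 1]) torch.Size([1, 5])
```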
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
e3c4057b6c Eliminate an extra lookup in the hashtable during CSE. (#9668)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9668

Differential Revision: D8955185

Pulled By: resistor

fbshipit-source-id: f3f929efc11be63850bd863679cc7b297c98d679
2018-07-27 14:43:22 -07:00
ef9801f32c Merge THStorage into at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9772

Reviewed By: ezyang

Differential Revision: D9019375

Pulled By: cpuhrsch

fbshipit-source-id: d5185e29747929d648e4260db4967452cd40f563
2018-07-27 13:53:55 -07:00
6ed41adb04 Use round-to-negative division when computing output sizes for convolutions involving striding and dilation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9640

Differential Revision: D8948081

Pulled By: resistor

fbshipit-source-id: 06f2e3ad1bdb448be6f36577cb9bd27c884df595
2018-07-27 13:22:54 -07:00
8c0355c90d convert lambd directly to scalar_t at hardshrink (#9919)
Summary:
- convert lambd directly to scalar_t instead of creating a tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9919

Differential Revision: D9026708

Pulled By: weiyangfb

fbshipit-source-id: d20ab06ecc12aa972ee9d1323ee2f84abf8d5ffd
2018-07-27 13:22:52 -07:00
ce0b895a0c Fix UBSAN error in ONNX peephole pass, make it more robust.
Summary: Minor fix for a bug introduced by D9004285

Reviewed By: anderspapitto

Differential Revision: D9028762

fbshipit-source-id: 9b9c5eef30e61d7ae19784e0418fa29bad2b5564
2018-07-27 12:38:56 -07:00
c77e4bc4d5 export tensor(ArrayRef, options) on Windows (#9904)
Summary:
I hope this helps with the Windows build failure in #9628.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9904

Differential Revision: D9026715

Pulled By: soumith

fbshipit-source-id: bb97d41d060823f5a37bfc9a1659815b8b9f4eab
2018-07-27 12:14:52 -07:00
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before this change, the Caffe2 Tensor class was compile-time bound to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify the device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that, if provided, the context has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
94439d7df4 Suppress the vptr warning in ubsan (#9909)
Summary:
Unblock https://github.com/pytorch/pytorch/pull/8469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9909

Differential Revision: D9023650

Pulled By: houseroad

fbshipit-source-id: 7682a9cd7905e98c802b820ad59745672b32970d
2018-07-27 10:28:07 -07:00
c0bacc6284 Guard test_lapack_empty with has_magma. (#9936)
Summary:
CUDA lapack functions generally don't work unless has_magma is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9936

Differential Revision: D9028579

Pulled By: gchanan

fbshipit-source-id: 9b77e3b05253fd49bcabf604d0924ffa0e116055
2018-07-27 10:09:00 -07:00
bf32ea8094 Fix dimension check in 1D instance norm, allowing 2D tensors alongside 3D. (#9924)
Summary:
Fixes #9776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9924

Differential Revision: D9028328

Pulled By: soumith

fbshipit-source-id: d5f22abb2be83b34aee95ebe144c97519a6854f8
2018-07-27 09:24:07 -07:00
d3ba9a173e Handle case where THC btrifact doesn't zero info. (#9907)
Summary:
This was showing up in the n-dimensional empty tests as flaky because it's reading uninitialized cuda memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9907

Differential Revision: D9021413

Pulled By: gchanan

fbshipit-source-id: 31542b7597919df9afd6e528bb108a4a3e8eaf60
2018-07-27 09:11:44 -07:00
1af1b0c2a5 Remove THTensor::_dim, temporarily remove THTensor_nDimension. (#9895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9895

The primary goal here was to remove THTensor::_dim, which isn't part of the API moving forward.
Instead, we provide 3 options for getting the dimensionality (this is temporary although non-trivial to remove!):
```
nDimension                 corresponds to the "true" ATen dimension. TODO: implement.
nDimensionLegacyNoScalars  corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors.
nDimensionLegacyAll        corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors
                           and tensors with a dimension of size zero are collapsed to 0-dimensional tensors.
```
So in this patch, nDimension -> nDimensionLegacyNoScalars and _dim/_nDimension goes to nDimensionLegacyAll.
These are just codemods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9835

Reviewed By: ezyang

Differential Revision: D8999338

Pulled By: gchanan

fbshipit-source-id: a4d676ac728f6f36ca09604a41e888d545ae9311
2018-07-27 08:56:38 -07:00
bc66d98248 Fix narrow on empty tensors after negative size support.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9838

Differential Revision: D9002345

Pulled By: gchanan

fbshipit-source-id: 13f4bacff94d9d0ea31a3b73a75b9b3e774eabf5
2018-07-27 07:55:20 -07:00
7b375ed362 fix ParameterDict doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9918

Differential Revision: D9026402

Pulled By: soumith

fbshipit-source-id: d0459dcda631e8921ab39725b9045e03960da5c9
2018-07-27 01:10:50 -07:00
a709f23225 revise a little spell mistake in tensor.py (#9868)
Summary:
Hello! I just found a small spelling mistake while reading this source code, so I'm PRing the fix. Thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9868

Reviewed By: gchanan, ezyang

Differential Revision: D9016030

Pulled By: soumith

fbshipit-source-id: fc3877177be080adbdbda99a169e401691292ebb
2018-07-27 00:55:03 -07:00
4a192bcc3d Rename onnx integration tests file to avoid confusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9913

Differential Revision: D9026787

Pulled By: bddppq

fbshipit-source-id: a3e7e79973abc4f5fe163f3e86b24382a1efd082
2018-07-26 23:40:41 -07:00
8cb1eef7b9 Unify IR operator representation (stop using attributes in the JIT) (#9807)
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."

I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover the types of constants. This code should be removed once we gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807

Reviewed By: ezyang

Differential Revision: D9004285

Pulled By: apaszke

fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
2018-07-26 22:11:50 -07:00
2c1d9e09b8 Support UINT8 for addition data in ImageInputOp (#9901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9901

Added support for the UINT8 datatype for additional data (prefetching and
output) in ImageInputOp.

Reviewed By: ashwinb

Differential Revision: D9018964

fbshipit-source-id: f938a8a072c15c0ee521b2f16788c024b08cd37f
2018-07-26 22:11:46 -07:00
aa671ddefa Support production models with predictor benchmark (#9855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9855

Support production models with predictor benchmark
Two new flags are added:
`--update_prod`: pull production data (netdef, input types, input dims) from Hive and store locally
`--use_prod`: run benchmark with local production data with the same workload as in production.
By default, 300 models will be loaded.

production vs benchmark
avg net run time:
(collected by prod: https://fburl.com/scuba/6lb91zfx and bench: https://fburl.com/ngjj1dc8)
**prod: `408us` vs bench: `543us`**
(With prod data distribution, this should be even closer)

framework overhead (as of 2018-07-22):
prod:
```
9.111%    BlackBoxPredictor::Run
4.602%    SimpleNet::Run
2.377%    Operator::Run
1.786%    BlackBoxPredictor::AllocateMemory
1.372%    Observable::StartAllObservers
1.358%    Observable::StartObserver
1.206%    Blob::GetMutable
```

bench:
```
8.577%    BlackBoxPredictor::operator()
3.276%    SimpleNet::Run
1.954%    Operator::Run
1.697%    BlackBoxPredictor::AllocateMemory
1.477%    Tensor::ShareData
1.230%    Blob::GetMutable
1.034%    Observable::StartObserver
```

Reviewed By: yinghai

Differential Revision: D8942996

fbshipit-source-id: 27355d7bb5a9fd8d0a40195261d13a97fa24ce17
2018-07-26 21:39:29 -07:00
eb33887816 Addressed issue identified by static code analysis: potential buffer … (#9889)
Summary:
Addressed a potential buffer overrun identified by static code analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9889

Differential Revision: D9026278

Pulled By: soumith

fbshipit-source-id: ee2ee255f34731ddc581261984c3caf56faa0e12
2018-07-26 21:09:51 -07:00
e41eb43327 Remove deprecated masked_copy (#9819)
Summary:
No tests are affected by this removal.

Closes https://github.com/pytorch/pytorch/issues/1885 and closes #9817

While I was at it, I also fixed #9876 .
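
For readers migrating old code, a minimal sketch of the replacement (assuming `masked_scatter_` is the intended successor to the deprecated `masked_copy_`):

```python
import torch

x = torch.zeros(5)
mask = torch.tensor([1, 0, 1, 0, 1], dtype=torch.uint8)  # ByteTensor mask
src = torch.arange(1., 6.)
x.masked_scatter_(mask, src)  # copies consecutive src values where mask is set
# x is now tensor([1., 0., 2., 0., 3.])
```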
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9819

Differential Revision: D9018126

Pulled By: SsnL

fbshipit-source-id: a9142bf4e2403bef05779a097f61fa8b7db04b71
2018-07-26 20:55:18 -07:00
a841006353 Simplify some code by directly constructing unordered_set from nodes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9675

Differential Revision: D8952196

Pulled By: resistor

fbshipit-source-id: 5ef2308fed9f702021f650cf2d241a83d880d359
2018-07-26 19:54:38 -07:00
dfa0af093d Move predictor into caffe2/caffe2/predictor (#9548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9548

Pull Request resolved: https://github.com/pytorch/translate/pull/157

One part of refactor predictor. Move all the files into predictor dir.

Reviewed By: highker

Differential Revision: D8845276

fbshipit-source-id: 1e917464b0c8a042f025128a082c784eaa3b7013
2018-07-26 19:03:40 -07:00
c045e969b6 Use qualified name at::Half in Dispatch.h (#9848)
Summary:
This makes AT_DISPATCH_ALL_TYPES_AND_HALF valid outside of the at
namespace.

See https://github.com/pytorch/extension-cpp/issues/15
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9848

Differential Revision: D9006921

Pulled By: colesbury

fbshipit-source-id: a6e4f097a9d6fb85c921e1c9b9ea25d0f2db06dc
2018-07-26 19:03:24 -07:00
e7ab093d93 Simplify order switch operators (#9581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581

Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.

Reviewed By: viswanathgs

Differential Revision: D8909766

fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
2018-07-26 18:24:29 -07:00
b7b61a8eb4 Change expect, cast on Type to return shared pointers, make isSubtypeOf accept TypePtr (#9786)
Summary:
Follow up task of #9584.

Commit 1:

- change expect/cast to return shared pointers instead of raw pointer
- isSubtypeOf accept TypePtr instead. Use `x->isSubtypeOf(NumberType::get())` rather than `x->isSubtypeOf(*NumberType::get())`

Commit 2:

- to address enable_shared_from_this pitfalls, we make the constructor private and expose a factory method, so instances can only be created through it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9786

Reviewed By: zdevito

Differential Revision: D8980441

Pulled By: wanchaol

fbshipit-source-id: e5c923fc57a701014310e77cf29985b43bb25364
2018-07-26 18:09:45 -07:00
9df9c46992 fix loading 1dim tensor from 0.3.* to 0dim tensor (#9781)
Summary:
This PR fixes #9743 .

Adds backward-compatibility support for loading a checkpoint from 0.3.* containing 1-dim tensors, which are now 0-dim tensors in 0.4+.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9781

Differential Revision: D8988196

Pulled By: ailzhang

fbshipit-source-id: a7a1bc771d597394208430575d5a4d23b9653fef
2018-07-26 17:09:41 -07:00
d65c667f28 Avoid divide-by-zero when hamming_window window length is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9896
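
A quick sketch of the edge case (assuming the fix makes a zero-length window return an empty tensor rather than dividing by zero):

```python
import torch

w = torch.hamming_window(0)  # window_length == 0
print(w.shape)               # torch.Size([0]): empty window, no NaNs
```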

Reviewed By: ezyang

Differential Revision: D9018572

Pulled By: gchanan

fbshipit-source-id: fa314687973124165bffb3084932d8ab6d872a93
2018-07-26 15:56:44 -07:00
d1260d26fe Sleep before run (#9891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9891

Add an argument to the benchmark binary to specify the number of seconds to sleep before the run and after the warmup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9880

Reviewed By: llyfacebook

Differential Revision: D9014254

Pulled By: sf-wind

fbshipit-source-id: d5566186c8ed768f1e170e9266c5f2d6077391e0
2018-07-26 14:39:17 -07:00
18a6541b82 Create IDEEP fallback operators for ctc decoder ops (#9847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9847

CTCBeamSearchDecoder and CTCGreedyDecoder do not currently support IDEEP
execution. Add fallback operators to allow IDEEP execution of models that use
these operators.

Reviewed By: yinghai

Differential Revision: D9006234

fbshipit-source-id: fc539ba67b07d1f960d28564d8adde0be8690649
2018-07-26 14:09:11 -07:00
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
456f41301c Disable unique ops test on rocm (#9892)
Summary:
Somehow we have Unique operator tests in two places: test_unique_ops.py and hypothesis_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9892

Reviewed By: houseroad

Differential Revision: D9017631

Pulled By: bddppq

fbshipit-source-id: 1f9e40e4953afca26141ef4581202b9b9fce0ae9
2018-07-26 13:10:23 -07:00
1dc708493e Add html-stable target to docs Makefile (#9884)
Summary:
This makes it easier to build docs for a release. All of the unstable
warnings are removed in `make html-stable`.

cc soumith SsnL

Sample build:
![image](https://user-images.githubusercontent.com/5652049/43277115-05e2f720-90d5-11e8-9977-b0b4a6ee4b8e.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9884

Reviewed By: SsnL

Differential Revision: D9016001

Pulled By: zou3519

fbshipit-source-id: 5cf2dfbf886de993242db28cdac5d0c5fadbdc4d
2018-07-26 12:09:06 -07:00
0c84a5c27e Pass shape infos to ONNX -> Caffe2 C++ conversion backend (#9870)
Summary:
Also lets the Gemm conversion inspect the input `C` to try converting it to FC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9870

Reviewed By: houseroad

Differential Revision: D9013198

Pulled By: bddppq

fbshipit-source-id: b4c509cfccca238262e1c406b004e66cef256321
2018-07-26 12:00:32 -07:00
e39c8043dc Make GraphExecutors work on Stacks instead of variable_tensor_lists (#9763)
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763

Reviewed By: zou3519

Differential Revision: D8978457

Pulled By: apaszke

fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
2018-07-26 12:00:27 -07:00
6f10944f88 Re-enable rocm tests that have been fixed in rocm 1.8.2 (#9862)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9862

Differential Revision: D9012520

Pulled By: bddppq

fbshipit-source-id: cdcc184e23befa8dbd1bc44d59bd25766aac33d0
2018-07-26 10:54:57 -07:00
716f7d657d Remove Broadcast.py. (#9843)
Summary:
I don't think this file is used anywhere, I guess we'll find out!

(Weirdly this failed lint on one of my PRs even though it shouldn't).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9843

Differential Revision: D9003949

Pulled By: gchanan

fbshipit-source-id: 26d580d1e7cdd30e82e5f4176244e51fd7cd616d
2018-07-26 10:44:24 -07:00
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This changes the templating at call sites; the core implementations will change later.

Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable calling the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that, if provided, the context has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, that verifies both that the Blob contains a Tensor and that it is of the correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
2c7e7e37a6 Corrected doc in class RNNCell (#9866)
Summary:
fixes #9642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9866

Differential Revision: D9012131

Pulled By: weiyangfb

fbshipit-source-id: d2849b1a50234dbdb335dffab4835c9de85183c3
2018-07-26 09:27:05 -07:00
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
e70fc145a9 MIOpen fixes for Caffe2 (#9842)
Summary:
The PR contains:
Fixes for running MIOpen conv operator in a multi worker scenario, along with a performance fix
Fixing a typo in MIOpen pool op and adding some extra checks for MIOpen spatial BN op

bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9842

Differential Revision: D9012512

Pulled By: bddppq

fbshipit-source-id: 270e1323c20fbfbc4b725f9a4ff34cd073ddaaa8
2018-07-26 02:42:26 -07:00
3be8e4db51 Do not run ONNX integration tests in parallel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9861

Differential Revision: D9011458

Pulled By: bddppq

fbshipit-source-id: 7ab1b1763d56f1290ade7a99682ad461c97f807b
2018-07-25 21:54:29 -07:00
997f46d1e1 Disable "filter too much" health check for fc operator tests (#9865)
Summary:
makes the CI flaky
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9865

Differential Revision: D9011882

Pulled By: bddppq

fbshipit-source-id: 5124ab97d258eed7585734d64fb01e5df98abd0d
2018-07-25 21:41:14 -07:00
ba062e7da9 Update OnnxifiOp according to onnx/onnx#1224
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9844

Reviewed By: yinghai

Differential Revision: D9004222

Pulled By: bddppq

fbshipit-source-id: 1bdcefc0dfbd5e3422217b5254b2462e5a568d2a
2018-07-25 19:29:38 -07:00
5e4de0821a Set ROCm MAX_JOBS=4 (#9856)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9856

Differential Revision: D9009100

Pulled By: ezyang

fbshipit-source-id: 28f34128fcb7c3d6a115884bf28dc2a6bde5aed6
2018-07-25 19:09:41 -07:00
6cd0174ff5 Reimplement localScalar as a native function. (#9762)
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.

_local_scalar became a method on Tensor to work around a circular
include problem.

This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762

Differential Revision: D8972348

Pulled By: ezyang

fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
2018-07-25 19:09:39 -07:00
ad47228020 Test pinning Hypothesis 3.59.0 (#9830)
Summary:
We think this will band-aid some of the new Caffe2 test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9830

Differential Revision: D9008052

Pulled By: ezyang

fbshipit-source-id: 84f1c0faea429d758d760965d6cbfe9e4c72eb19
2018-07-25 18:11:10 -07:00
b84b78a69d Fix the ROCM build, and enable sccache for it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9841

Differential Revision: D9008030

Pulled By: ezyang

fbshipit-source-id: 51cac3c75fc52658b22a10a6bf8a479bcf803fb2
2018-07-25 17:55:47 -07:00
0b16b03b98 Plumb type annotations through script compilation (new) (#9547)
Summary:
Supersedes https://github.com/pytorch/pytorch/pull/9405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9547

Reviewed By: zdevito

Differential Revision: D8900327

Pulled By: jamesr66a

fbshipit-source-id: a00a94615af4fbaec98ee3ede0cb54bcfd9108dd
2018-07-25 17:10:14 -07:00
445c17d492 Update CopyMatrix in math (#9792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9792

Update CopyMatrix in math

Reviewed By: houseroad

Differential Revision: D8982421

fbshipit-source-id: da2056306cde3300124b21eba7a6c2d113111002
2018-07-25 16:10:52 -07:00
74ac5265d1 nomnigraph - make use of nodeIterator (#9831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9831

Follow up to D8980903 - replace dataIterator with nodeIterator where the data isn't used.

Reviewed By: pjh5

Differential Revision: D8998351

fbshipit-source-id: c333847ecd8b6d8075352322845839b94a63aecc
2018-07-25 15:40:44 -07:00
302adb7cc8 added torch.rot90() to ATen (#8628)
Summary:
1. fixes #6271
2. implemented torch.rot90() following [numpy.rot90()](6a58e25703/numpy/lib/function_base.py (L54-L138))
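
A short usage sketch, mirroring numpy.rot90 semantics (counter-clockwise rotation in the plane given by the dims argument):

```python
import torch

x = torch.arange(4).reshape(2, 2)  # tensor([[0, 1], [2, 3]])
torch.rot90(x, 1, [0, 1])          # rotate once, counter-clockwise
# tensor([[1, 3],
#         [0, 2]])
```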
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8628

Reviewed By: ezyang

Differential Revision: D8987860

Pulled By: weiyangfb

fbshipit-source-id: 8dac3b2a1f6d3288672977aba8b547706ce97fe9
2018-07-25 15:11:44 -07:00
2f5c0c30cd Make logsumexp work with empty tensors again. (#9825)
Summary:
https://github.com/pytorch/pytorch/pull/9755 broke this, but it was only tested if size zero dims were turned on (it can still happen even if that isn't turned on, because we support size [0] tensors).
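
A minimal repro of the case being restored (assuming the reduction over a size-[0] tensor yields -inf, the log of an empty sum):

```python
import torch

x = torch.empty(0)
torch.logsumexp(x, 0)  # tensor(-inf) instead of an error
```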
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9825

Differential Revision: D8997303

Pulled By: gchanan

fbshipit-source-id: 911dce112f73fad0f3980a7f4f9423df0f2d923d
2018-07-25 13:41:24 -07:00
4b0098f3ae Add --allow-change-held-packages to make nccl2 install in docker work (#9828)
Summary:
This was used to build Caffe2 Docker version 170.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9828

Differential Revision: D8997808

Pulled By: ezyang

fbshipit-source-id: f48938b2b71bc86578c9d9b46c281ed05478724e
2018-07-25 11:56:40 -07:00
279b836675 Add some user-friendly checks in pack padded symbolic to ensure thing… (#9731)
Summary:
Add some user-friendly checks in the pack padded symbolic to ensure things are the right type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9731

Reviewed By: soumith

Differential Revision: D8958693

Pulled By: jamesr66a

fbshipit-source-id: 7db1f86a85188fd2c84d0edaaaac6a096d64ba52
2018-07-25 11:25:42 -07:00
be163f50a3 Avoid divide-by-zero when bartlett_window size is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9788

Differential Revision: D8980951

Pulled By: gchanan

fbshipit-source-id: 429b341ac687afe4f1429bb141ef070bf315519c
2018-07-25 10:40:39 -07:00
56fbfee872 Remove ifdef __cplusplus from THTensor.h, have cpp self-contained in … (#9775)
Summary:
Remove ifdef __cplusplus from THTensor.h; the C++ parts are now self-contained in THTensor.hpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9775

Differential Revision: D8977140

Pulled By: gchanan

fbshipit-source-id: d6d2461f7cb0511ee1def52ac1032a86349a7105
2018-07-25 10:25:17 -07:00
a7f183f971 Revert "Fix dataloader hang when it is not completely iterated (#9655)" (#9804)
Summary:
This reverts commit 9ee513365121cd387e11987c66db6599ac53ded7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804

Reviewed By: ezyang

Differential Revision: D8987780

Pulled By: SsnL

fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5
2018-07-25 10:10:30 -07:00
c14e17eced Co-distillation with different archs and/or feature sets (#9793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9793

Enable co-distillation with different archs

Reviewed By: pjh5

Differential Revision: D8888479

fbshipit-source-id: eac14d3d9bb6d8e7362bc91e8200bab237d86754
2018-07-25 10:10:27 -07:00
ea67a2bd11 Allows negative index to tensor.narrow (Fixes: #9546)
Summary:
Fixes #9546
Test cases added
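
A minimal sketch of the new behavior (assuming a negative start counts from the end, as in Python indexing):

```python
import torch

x = torch.arange(10)
x.narrow(0, -3, 3)  # equivalent to x.narrow(0, x.size(0) - 3, 3)
# the last three elements: 7, 8, 9
```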

Reviewed By: ezyang

Differential Revision: D8974842

Pulled By: zou3519

fbshipit-source-id: a7707406c2a21e8e14f9c2a8ad4d64c8b08156df
2018-07-25 09:25:45 -07:00
0853d13f86 Move scalar boolean to THTensor, rename scalar in this context to zer… (#9783)
Summary:
Move the scalar boolean to THTensor, and rename "scalar" in this context to "zero dim".

Manifest:
1) The scalar boolean is now in THTensor, although it isn't hooked up at the TH level yet.
2) setScalar is gone, everything now goes through the maybeScalar equivalent (which is renamed)
3) all "scalars" in this context now refer to "zero_dim" in order to differentiate this concept from the "Scalar" class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9783

Differential Revision: D8978911

Pulled By: gchanan

fbshipit-source-id: f09254be4bebad0e4c510fefe4158b4f7e92efe1
2018-07-25 09:25:41 -07:00
8825e323b5 nomnigraph - Add way to check if a NodeRef is in a graph, and make a graph node iterator (#9790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9790

- Add way to check if a NodeRef is in a graph
- Make a nodeIterator (similar to dataIterator) but only iterate through nodes.

Reviewed By: bwasti

Differential Revision: D8980903

fbshipit-source-id: b20504a46715858752e25242303125a15a709b88
2018-07-25 09:02:13 -07:00
42a4747389 Temporarily need this to prevent sccache from breaking. (#9810)
Summary:
Temporarily need this to prevent sccache from breaking when I move sccache install to the DockerFile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9810

Differential Revision: D8991684

Pulled By: Jorghi12

fbshipit-source-id: 14cd0278f53a72372f9bbe27b228980f8d3c1d4a
2018-07-25 09:01:58 -07:00
a74a3fdeb6 typo fix, tutorials url with http protocol is not valid (#9812)
Summary:
The tutorials URL with the http protocol is not valid; replace it with https.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9812

Differential Revision: D8991344

Pulled By: ezyang

fbshipit-source-id: c12faa57905b50eadc320f9938c39c4139bd093b
2018-07-25 07:54:26 -07:00
3ef521e98a Implement backward for torch.symeig (#8586)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/6890. (backward pass for non-symmetric eigen-decomposition is not implemented in other packages, e.g. autograd, mxnet, tensorflow, presumably because the eigenvalues can be imaginary for the general case, and AFAIK we cannot support complex numbers).

This patch adds a backward function for the symmetric eigen-decomposition function `torch.symeig`. The formula used is taken from [here](http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf). Unit tests are added to verify correctness.

There is still one outstanding issue, which is how to handle the case where the `symeig` is called with `eigenvectors=False`. In this case, the eigenvectors are returned as a zero tensor, but the backward computation for the eigenvalues depends on the eigenvectors. There was a previous attempt to implement this in https://github.com/pytorch/pytorch/pull/2026, where apaszke mentioned that the `eigenvectors` argument should be overridden so that they are saved for the backwards pass. The forward code is autogenerated, though, and it isn't clear to me how that would be done. I'd appreciate any guidance. For now, there is a unit test that will fail until that issue is resolved.
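
A small sketch of what the backward enables (the input is symmetrized, and `eigenvectors=True` per the caveat above):

```python
import torch

a = torch.randn(4, 4, dtype=torch.double)
a = (a + a.t()) / 2          # symmetrize so the decomposition assumptions hold
a.requires_grad_(True)

w, v = torch.symeig(a, eigenvectors=True)  # eigenvectors are needed for backward
w.sum().backward()           # gradients flow back through the eigenvalues
```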
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8586

Reviewed By: ezyang

Differential Revision: D8872760

Pulled By: SsnL

fbshipit-source-id: 76614495d0f9c118fec163a428f32e5480b4d115
2018-07-25 07:16:10 -07:00
0262fd0f91 Delete Tensor::typeString() (#9764)
Summary:
The primary use-site of typeString was checked_cast_tensor.
I did a little more than I needed in this patch, to set
the stage for actually deleting the tensor type.

Specifically, I modified checked_cast_tensor to explicitly
take Backend and ScalarType, the idea being that once we
remove the tensor subclasses, we will delete the T template
parameter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9764

Differential Revision: D8969196

Pulled By: ezyang

fbshipit-source-id: 9de92b974b2c28f12ddad13429917515810f24c6
2018-07-24 22:26:15 -07:00
723a600ebd Update for new incremental build instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9773

Differential Revision: D8988285

Pulled By: ezyang

fbshipit-source-id: c2c3b7cefb54e4e18602b180281f22939293a383
2018-07-24 22:26:13 -07:00
bca10ad706 Implementation of Weibull distribution (#9454)
Summary:
This implements the two-parameter Weibull distribution, with scale $\lambda$ and shape $k$ parameters as described on [Wikipedia](https://en.wikipedia.org/wiki/Weibull_distribution).

**Details**
- We implement as a transformed exponential distribution, as described [here](https://en.wikipedia.org/wiki/Weibull_distribution#Related_distributions).
- The `weibull_min` variance function in scipy does not yet support a vector of distributions, so our unit test uses a scalar distribution instead of a vector.

Example of the bug:

```
>>> sp.stats.expon(np.array([0.5, 1, 2])).var() # fine
array([1., 1., 1.])
>>> sp.stats.weibull_min(c=np.array([0.5, 1, 2])).var() # buggy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 490, in var
    return self.dist.var(*self.args, **self.kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1242, in var
    res = self.stats(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1038, in stats
    if np.isinf(mu):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
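
For reference, a minimal usage sketch of the new distribution (assuming the `Weibull(scale, concentration)` parameterization):

```python
import torch
from torch.distributions import Weibull

d = Weibull(scale=torch.tensor(1.0), concentration=torch.tensor(2.0))
x = d.sample((5,))   # samples drawn via the transformed-exponential construction
d.log_prob(x)        # log density of the two-parameter Weibull
```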
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9454

Differential Revision: D8863574

Pulled By: SsnL

fbshipit-source-id: 1ad3e175b469eee2b6af98e7b379ea170d3d9787
2018-07-24 20:40:15 -07:00
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta.
- Added corresponding Python unittests
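
A hedged sketch of invoking the new CPU operator directly from Python (the blob names and arguments below are assumptions based on this summary, not verified against the final op schema):

```python
from caffe2.python import core

# Illustrative only: input/output names and arguments are assumptions.
adadelta = core.CreateOperator(
    "Adadelta",
    ["param", "moment", "moment_delta", "grad", "lr"],
    ["param", "moment", "moment_delta"],
    epsilon=1e-6,
    decay=0.95,
)
```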

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
620952117e remove unnecessary -Wno= flags
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9608

Differential Revision: D8946664

Pulled By: anderspapitto

fbshipit-source-id: b05f10af58da25b2a2588f7153f393bb3637f29a
2018-07-24 18:40:42 -07:00
9cf76cfb4c Changing conda build script to use current Python version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9780

Reviewed By: ml7

Differential Revision: D8983501

Pulled By: pjh5

fbshipit-source-id: 79208796247433cbe271a2d06f66254587d96f80
2018-07-24 18:40:40 -07:00
f62bc01dfe Remove TORCH_ASSERT (#9575)
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debuggability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575

Differential Revision: D8924566

Pulled By: goldsborough

fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
2018-07-24 18:10:06 -07:00
d2610fb379 Constexpr Type Ids -> 6.5% caffe2 perf improvement (#9603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9603

Using constexpr for some heavily queried type ids gives us a 6.5% perf improvement for caffe2.

Benchmark results: P59829647

Also added canaries (but they don't show a significant difference):
- adfinder:
  - https://our.intern.facebook.com/intern/ads/canary/411346509423301481
  - https://our.intern.facebook.com/intern/ads/canary/411346563021753557
- adindexer:
  - https://our.intern.facebook.com/intern/ads/canary/411346517006038367
  - https://our.intern.facebook.com/intern/ads/canary/411346571387258927
- multifeed_predictor:
  - https://our.intern.facebook.com/intern/ads/canary/411346526631282941
  - https://our.intern.facebook.com/intern/ads/canary/411346583141009531

Reviewed By: dzhulgakov

Differential Revision: D8841577

fbshipit-source-id: 1a0ce7f2bee1ae54b723caefe5bc7f85a20935b4
2018-07-24 17:24:55 -07:00
6c6a353a66 Fix speedbenchmark bug (#9770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9770

Add zero ops to operators that do not have a valid schema

Reviewed By: hlu1

Differential Revision: D8957472

fbshipit-source-id: d8d0a351183e88ace2e050a87c1e1c363af67e33
2018-07-24 17:10:37 -07:00
d7d673b68d Update onnx to latest master (#9782)
Summary:
52d40befa7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9782

Reviewed By: yinghai, houseroad

Differential Revision: D8978668

Pulled By: bddppq

fbshipit-source-id: 238f76a36784c12cc5655a2ee059f7e0169c0bb6
2018-07-24 14:42:01 -07:00
e5fe66d7ea Add support for specifying device_option in Functional (#9619)
Summary:
e.g.
```
Functional.Add(x, y, device_option=DeviceOption(HIP, 0))

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9619

Differential Revision: D8966599

Pulled By: bddppq

fbshipit-source-id: 22235e42f19278e79802642798bf0ee70a1202f6
2018-07-24 14:41:59 -07:00
37fc58f1d3 Use torch::empty before random_ on seed gen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9769

Reviewed By: goldsborough

Differential Revision: D8977636

Pulled By: SsnL

fbshipit-source-id: c2437d5ef53dc74e1b17eb16e728e1d67ae314c7
2018-07-24 14:41:58 -07:00
f393df774b Test case for c10d DDP (#9670)
Summary:
Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some.

I refactored the c10d tests to derive some tests cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!).

I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9670

Differential Revision: D8977724

Pulled By: goldsborough

fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
2018-07-24 14:10:24 -07:00
e26d584445 Remove isScalar() from TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9765

Differential Revision: D8969474

Pulled By: gchanan

fbshipit-source-id: 42002b129488179affc919dba877de5a4e8f9fb5
2018-07-24 12:55:06 -07:00
7050d83dd7 Make logsumexp_out inplace (#9755)
Summary:
Fixes: #9754

Maybe this could also make its way into 0.4.1; it is a severe debugging headache if you hit this...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9755

Reviewed By: ezyang

Differential Revision: D8967178

Pulled By: zou3519

fbshipit-source-id: 151ed24e3a15a0c67014e411ac808fb893929a42
2018-07-24 12:40:48 -07:00
360c1bbd5b Add multivariate log-gamma (mvlgamma) (#9451)
Summary:
1. Add tests in test_cuda, test_torch
2. Add doc strings

Closes https://github.com/pytorch/pytorch/issues/9378 .
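
A quick sketch using the standard identity log Gamma_p(a) = (p(p-1)/4) log pi + sum_j lgamma(a + (1-j)/2):

```python
import torch

a = torch.tensor([2.0, 3.0])
torch.mvlgamma(a, 1)  # with p=1 this reduces to torch.lgamma(a)

# For p=2: mvlgamma(a, 2) == log(pi)/2 + lgamma(a) + lgamma(a - 0.5)
```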

Differential Revision: D8859746

Pulled By: ezyang

fbshipit-source-id: 939c309d90940a7aa08f53004c9e7b3b1c9cf54e
2018-07-24 12:10:10 -07:00
6885b3fd62 Delete dead IsVariable enum. (#9768)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9768

Differential Revision: D8975802

Pulled By: ezyang

fbshipit-source-id: f85844872a1eb13e782aba0c168a3a1c1ac0313d
2018-07-24 11:58:11 -07:00
f9a99d5504 Specify default initialization schemes for modules in docs (#9038)
Summary: This closes #6906 .

Reviewed By: ezyang

Differential Revision: D8698632

Pulled By: weiyangfb

fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
2018-07-24 11:58:08 -07:00
2b134c72e6 Add interface to provide blob types to shape&type inference (#9643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643

Current map interface assumes float data type, which is not always correct.

Reviewed By: kennyhorror

Differential Revision: D8455784

fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
2018-07-24 11:58:05 -07:00
7af5883860 Enable Python tests on ROCm (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
6ab5e697b9 Small fixups for enabling zero size dims. (#9724)
Summary:
1) Properly test cpu for alpha/beta addmm cases.
2) Unsqueeze on empty no longer throws an exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9724

Reviewed By: ezyang

Differential Revision: D8958513

Pulled By: gchanan

fbshipit-source-id: 6ce2ec4a47201f9b225b8c52354144ace43e9e09
2018-07-24 11:11:39 -07:00
675d80841a Small fixups for n-dimensional empty tensors in CUDA non-reduction di… (#9722)
Summary:
Small fixups for n-dimensional empty tensors in CUDA non-reduction dim ops.

Continuation of https://github.com/pytorch/pytorch/pull/9658.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9722

Differential Revision: D8956321

Pulled By: gchanan

fbshipit-source-id: 116fcaa1be5b1373f03217911556a28125cc860d
2018-07-24 11:11:37 -07:00
f6496229a5 Fixes xcode 10 beta 4 compile error (#9748)
Summary:
When building iOS apps with a caffe2 dependency, we were seeing the `caffe2/caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm:33:17: error: method 'copyWithZone:' in protocol 'NSCopying' not implemented [-Werror,-Wprotocol]`. This fixes it by implementing a shallow copy with that method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9748

Reviewed By: jerryzh168

Differential Revision: D8954332

Pulled By: williamtwilson

fbshipit-source-id: 0cd44408257c0bd3f4ffb80312ea9d13d13e5ff3
2018-07-24 11:11:35 -07:00
1283834600 Devirtualize TensorImpl::toString (#9758)
Summary:
This can hardly be called an improvement (we now print
CPUFloatType instead of CPUFloatTensor) but it was the
simplest way I could think of devirtualizing this function in
the short term.  Probably need some sort of native function
that gives string information about a tensor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Approved in #9710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9758

Differential Revision: D8966935

Pulled By: ezyang

fbshipit-source-id: a4641affe0a6153f90cdd9f4f2a1100e46d1a2db
2018-07-24 11:11:33 -07:00
679d397f28 Fix scalar_tensor_test for squeeze/unsqueeze with zero sized dimensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9766

Differential Revision: D8971173

Pulled By: gchanan

fbshipit-source-id: 50bf7778eee7c60f51e1660ad834e161fa40f563
2018-07-24 10:42:39 -07:00
a7afba7308 Remove duplicated functions (#9601)
Summary:
Found by the linter; the duplication was likely introduced in a previous code sync.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9601

Differential Revision: D8922379

Pulled By: bddppq

fbshipit-source-id: 1f61bd7f539d823e62920615674a532ec0149623
2018-07-24 10:23:46 -07:00
adda789770 Skip maxpool_with_indices onnx tests (#9751)
Summary:
The outputs are not in the same format; skip these tests for the moment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9751

Reviewed By: yinghai

Differential Revision: D8965636

Pulled By: houseroad

fbshipit-source-id: 81d39c2f5625c14c0e1ee11408b5f7267b53798f
2018-07-24 10:23:43 -07:00
ba634c11df Move strides to base class. (#9749)
Summary:
Approved in #9644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9749

Differential Revision: D8965336

Pulled By: ezyang

fbshipit-source-id: d1b0763e592f298395621cfd684715dc0a550cd6
2018-07-23 22:27:48 -07:00
9bf72b2087 Add missing windows exports
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9738

Reviewed By: apaszke

Differential Revision: D8961728

Pulled By: zdevito

fbshipit-source-id: aacba8c03d0d8dfe1e87585d1c2b26703d2ed103
2018-07-23 19:55:19 -07:00
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
a387331e54 Re-enable test_segfault after recent dataloder changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9700

Differential Revision: D8953615

Pulled By: SsnL

fbshipit-source-id: c6aa3c07dd2857dd54889d47e537a6b1e9198c60
2018-07-23 18:38:42 -07:00
099b5ba9d1 Tensor merge PRs from July 20 (#9713)
Summary:
Constituent PRs:

- [x] #9553 Remove unnecessary functions from StorageDerived.h (by cpuhrsch, reviewed by ezyang)
- [x] #9588 Use THTensor/Storage for THVoidTensor/Storage (by cpuhrsch , reviewed by gchanan)
- [x] #9627 Delete context from tensor (by ezyang, reviewed by gchanan)
- [x] #9641 Tensor reorganization (by ezyang, reviewed by gchanan )
- [x] #9647 Remove dim_ from THTensor (by cpuhrsch, reviewed by ezyang)
- [x] #9650 Remove context (by cpuhrsch, reviewed by gchanan and ezyang)
- [x] #9715 Fix Windows build in tensor merge PR (by ezyang, reviewed by gchanan and SsnL)

Upcoming PRs which didn't make this cut:

- [x] #9644 Stride move to TensorImpl, and nits (by ezyang, reviewed by gchanan)
- [ ] #9652 Native localScalar  (by ezyang, **UNREVIEWED AND FAILING TESTS**)
- [x] #9710 Devirtualize TensorImpl::toString (by ezyang, reviewed by gchanan)
- [ ] #9654 Use int64_t instead of ptrdiff_t for size / Rename flag to resizable_  (by cpuhrsch, **CHANGES REQUESTED AND FAILING TESTS**)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9713

Reviewed By: gchanan

Differential Revision: D8960882

Pulled By: ezyang

fbshipit-source-id: 99747b2c5462c7ff6809b67aacb4197626408204
2018-07-23 18:00:41 -07:00
e3fb9088d5 Allow multiple ops.def and clean up code gen in general
Summary: Basic cleanup, refactoring out some ops to closed source fb

Reviewed By: yinghai

Differential Revision: D8720722

fbshipit-source-id: 6fdf915c057a5749656d9f34a57fc142de6b076b
2018-07-23 15:44:04 -07:00
5849354aa1 Add operator<< overloads for TensorOptions (#9606)
Summary:
Added `operator<<` overloads for `at::TensorOptions` on request of ebetica

Example output:

```
TensorOptions(dtype=Double, device=cpu, layout=Strided, requires_grad=false)
```

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9606

Differential Revision: D8925191

Pulled By: goldsborough

fbshipit-source-id: 0503bc2851268276e9561d918290bc723e437c9c
2018-07-23 15:11:33 -07:00
d05a8145c5 Change behavior of clone to clone to a device (#9609)
Summary:
ebetica made me aware that `nn::Module::clone()` always clones to the current device (usually CPU) instead of preserving the device of each parameter. This PR changes the signature of `clone` from

`shared_ptr<Module> clone()`

to

`shared_ptr<Module> clone(optional<Device> device = nullopt)`

with semantics of:

1. If a `device` is given, all parameters/buffers are moved to that device,
2. If no `device` is supplied (default), parameters/buffers retain their device.

ezyang apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9609

Differential Revision: D8957367

Pulled By: goldsborough

fbshipit-source-id: 0d409ae645ed2b8d97d6fc060240de2f3d4bc6c8
2018-07-23 14:55:25 -07:00
31ba2f15e1 Rename embedding variable to weight (#9720)
Summary:
I renamed the variable in the `Embedding` module from `weight` to `table` a few months ago, because it seemed like a more meaningful name. Turns out it's not such a good idea because it deviates from PyTorch, which unnecessarily breaks C++->Python translated code.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9720

Differential Revision: D8955647

Pulled By: goldsborough

fbshipit-source-id: 77228b07d2b733866e8cdecaa6d0686eef4cc3ea
2018-07-23 14:55:24 -07:00
431415adc4 quick patch for PackPadded removal to propagate the correct size. (#9657)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9657

Differential Revision: D8940824

Pulled By: anderspapitto

fbshipit-source-id: ea827a24c85447fe4ae470336a746329598eee84
2018-07-23 14:25:39 -07:00
a949245a86 Switch interpreter to use IValue's primitive int/floats (#9718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718

This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since it was the first time we use non-tensors in a lot of places in the code base.

Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) are now functional. Some tests have additional conversions to integers because
we no longer allow implicit tensor -> integer conversions, following the same convention as in Python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integer lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValues's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed, and they made debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* register_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code needed slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:

```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584

Reviewed By: apaszke

Differential Revision: D8910546

Pulled By: zdevito

fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
2018-07-23 14:11:11 -07:00
a9742e1a27 Add fallback to TensorCPU if there are unsupported types for IDEEP Tensor (#9667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9667

MKL-DNN doesn't support 64-bit integers (cfee61bf81/include/mkldnn_types.h (L62-L75)), so force-converting from `TensorCPU<long>` to an `s32` ideep tensor will cause memory issues. This diff gives an alternative solution, where we just fall through to TensorCPU. The reasoning is that since MKL-DNN doesn't support 64-bit integer tensors, downstream ops have to be in CPUContext, so there is no reason to force-convert to an ideep tensor and back.

Reviewed By: pjh5

Differential Revision: D8943544

fbshipit-source-id: f514903cda27e34b8887271c9df56c8220895116
2018-07-23 13:54:57 -07:00
ee2cc68259 Add ctc_beam_search_decoder op for caffe2 (#9622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9622

Implement a ctc_beam_search_decoder operator based on ctc_greedy_decoder.
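
A hedged sketch of constructing the operator from Python (the input/output names and the `beam_width` argument are assumptions modeled on the greedy decoder's interface):

```python
from caffe2.python import core

# Illustrative only: names mirror ctc_greedy_decoder and are assumptions.
op = core.CreateOperator(
    "CTCBeamSearchDecoder",
    ["INPUTS", "SEQ_LEN"],
    ["OUTPUT_LEN", "VALUES"],
    beam_width=10,
)
```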

Differential Revision: D8903100

fbshipit-source-id: 38973632cb437e5cfcb9ed3a48ed6b901c10efa3
2018-07-23 13:40:24 -07:00
aa8a9fa5fc Extend DispatchStub to support CUDA dispatch (#9664)
Summary:
This is a modification of the strategy from https://github.com/pytorch/pytorch/pull/8919 and https://github.com/pytorch/pytorch/pull/9579.

```
Previously, the CPU architecture-specific kernels self-registered with
the DispatchStub. When linking as part of a static library, this requires
the flag --whole-archive to be passed to the linker to ensure that the
object files for the kernels are included. Caffe2 and TensorFlow use that
strategy.

We ran into some issues with --whole-archive blowing up the binary size
of some downstream projects in Facebook. This PR avoids --whole-archive
for CPU kernels. The downside is that the generic code needs to be aware
of whether kernels are compiled with AVX and with AVX2 (via
HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION).

The CUDA kernels still self-register with DispatchStub because the CPU
library is not aware of whether the CUDA library will be available at
runtime.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (Previous ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
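
For example, the new environment-variable override can be exercised from Python (a sketch; the variable is set before importing torch so the capability check sees it):

```python
import os
os.environ["ATEN_CPU_CAPABILITY"] = "default"  # force the non-AVX CPU kernels

import torch  # dispatch now ignores the AVX/AVX2 support detected on the machine
```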
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
2018-07-23 13:40:23 -07:00
3e9e3ef383 Improving diagnose RF NE with Cali (#9550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9550

as titled

Differential Revision: D8899226

fbshipit-source-id: 3c7cf026e8cbc0e95770e5a35b213a97bebba385
2018-07-23 13:40:21 -07:00
88d6b6e6cd Fix D8722560 (#9717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9717

D8722560 was landed with some build errors; unfortunately, the c10 code isn't part of contbuild yet.
This fixes them.

Differential Revision: D8954141

fbshipit-source-id: 2a082fb8041626e45ccd609f37a8ef807f6dad8a
2018-07-23 12:55:20 -07:00
5094684238 Create torch::from_blob for variables (#9605)
Summary:
Need an overload of `at::from_blob` for Variables.

ezyang colesbury ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9605

Differential Revision: D8926226

Pulled By: goldsborough

fbshipit-source-id: e377c0d019d4377f3fc124614c7dcc562aa69990
2018-07-23 12:40:12 -07:00
14d4bdb406 Reformat output data format to make it more general for other binaries (#9555)
Summary:
This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries.

This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555

Reviewed By: pjh5

Differential Revision: D8903024

Pulled By: sf-wind

fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a
2018-07-23 11:11:26 -07:00
029cf1d78a Improve error messages of wrong dimensions (#9694)
Summary:
Updated the error message terms _matrices_ and _vectors_ to _2D tensors_ and _1D tensors_ respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9694

Differential Revision: D8949589

Pulled By: ezyang

fbshipit-source-id: 2cdcd72e0e9a4459f3691c133bb16ef218b5cf3f
2018-07-23 10:10:55 -07:00
9525925119 Low rank multivariate normal (#8635)
Summary:
This pull request implements the low-rank multivariate normal distribution, where the covariance matrix has the form `W @ W.T + D`. Here D is a diagonal matrix and W has shape n x m with m << n. It uses the "matrix determinant lemma" and the "Woodbury matrix identity" to save computational cost.

Along the way, I also revised the MultivariateNormal distribution a bit. Here are the other changes:
+ `torch.trtrs` works with CUDA tensors, so I tried to use it instead of `torch.inverse`.
+ Use `torch.matmul` instead of `torch.bmm` in `_batch_mv`. The former is faster and simpler.
+ Use `torch.diagonal` for `_batch_diag`
+ Reimplement `_batch_mahalanobis` based on `_batch_trtrs_lower`.
+ Use trtrs to compute term2 of KL.
+ `variance` relies on `scale_tril` instead of `covariance_matrix`

TODO:
- [x] Resolve the fail at `_gradcheck_log_prob`
- [x] Add test for KL

cc fritzo stepelu apaszke
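
A minimal usage sketch (assuming the `LowRankMultivariateNormal(loc, cov_factor, cov_diag)` signature):

```python
import torch
from torch.distributions import LowRankMultivariateNormal

loc = torch.zeros(3)
cov_factor = torch.randn(3, 2)  # W: n x m with m << n
cov_diag = torch.ones(3)        # D: the diagonal part of the covariance
d = LowRankMultivariateNormal(loc, cov_factor, cov_diag)
x = d.sample()
d.log_prob(x)  # evaluated via the Woodbury identity, avoiding an O(n^3) solve
```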
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8635

Differential Revision: D8951893

Pulled By: ezyang

fbshipit-source-id: 488ee3db6071150c33a1fb6624f3cfd9b52760c3
2018-07-23 10:10:53 -07:00
9d6521c3a0 Support n-dimensional empty tensors in CUDA non-reduction dimension f… (#9658)
Summary:
Support n-dimensional empty tensors in CUDA non-reduction dimension functions.

This also unifies the error checking between scatter/scatterAdd on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9658

Differential Revision: D8941527

Pulled By: gchanan

fbshipit-source-id: 750bbac568f607985088211887c4167b67be11ea
2018-07-23 08:40:12 -07:00
53083b8353 Remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and fix CUDA 8 build on Windows (#9491) (#9491)
Summary:
Fixes #9092.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9491
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9693

Differential Revision: D8946850

Pulled By: ezyang

fbshipit-source-id: bd816f459ab70f6b4a0983305a1ce341bb633707
2018-07-23 06:40:39 -07:00
9ee5133651 Fix dataloader hang when it is not completely iterated (#9655)
Summary:
second trial of https://github.com/pytorch/pytorch/pull/7140

cc csarofeen Let's see if this works. It passes everything locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655

Differential Revision: D8940177

Pulled By: SsnL

fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158
2018-07-22 20:38:27 -07:00
1afdc57ed8 Hide all other fields in THTensor (#9683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9683

This pops off `refcount_`, `storage_`, `storage_offset_`; there are now no more direct accesses to these fields and we can make them private (with appropriate friending).

Stacked on #9561
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9591

Reviewed By: SsnL

Differential Revision: D8922246

Pulled By: ezyang

fbshipit-source-id: dfae023d790e29ce652e2eab9a1628bbe97b318d
2018-07-22 09:09:34 -07:00
f3d72b2101 Modify barrier net to allow better control over its initialization and execution in DPM (#9665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9665

In data_parallel_model, we isolate the synchronizing barrier init net into its own net, separate from the param_init_net, so that we can have finer-grained control over the barrier net.

Reviewed By: andrewwdye

Differential Revision: D8375389

fbshipit-source-id: ce0c8c1c8e4bd82b7078a1b07abaced3f149d578
2018-07-22 00:23:47 -07:00
769cb5a640 Add new ways of matching nodes with schemas in the JIT (#9567)
Summary:
**REVIEW LAST COMMIT ONLY**

As discussed in our yesterday's meeting. Nodes can be now matched to particular overloads using the `matches(...)` function:
```cpp
n->matches("aten::type_as(Tensor self, Tensor other) -> Tensor")
```

This also changes the shape prop and peephole passes to use those functions for matching. This fixes a few bugs, makes them much more robust, and prepares us for removal of attributes.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9567

Reviewed By: zdevito

Differential Revision: D8938482

Pulled By: apaszke

fbshipit-source-id: eb2382eeeae99692aada2d78d5d0c87c8ef1545e
2018-07-21 21:39:07 -07:00
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
3bb8c5eab1 Allow MKLDNN on macOS, and any other OS where CMake is able to detect it.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9638

Reviewed By: soumith

Differential Revision: D8946130

Pulled By: resistor

fbshipit-source-id: 87bd9cb12608467b05bd4998fdb00bfdbd038ca2
2018-07-20 22:27:02 -07:00
b5c8d59451 Add a CUDAContext header include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9662

Differential Revision: D8945581

Pulled By: ezyang

fbshipit-source-id: 2fe0adc96456788579f7d6f1c4513fe45360c030
2018-07-20 20:39:09 -07:00
23ed26a0c3 Guard include of cuda-only header comm.h (#9656)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9656

Reviewed By: colesbury

Differential Revision: D8941361

Pulled By: ezyang

fbshipit-source-id: c18cb0e606ae0608e5892040192b8792ae542b74
2018-07-20 19:46:36 -07:00
5e84403d5f Fix for half conversion for ROCm 1.8.2 (#9663)
Summary:
This PR contains the change for explicit conversion between ushort and __half required for ROCm 1.8.2 support
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9663

Differential Revision: D8943937

Pulled By: bddppq

fbshipit-source-id: 16102f9dbc68ed4ece2e8fc244825c3992c24901
2018-07-20 17:11:30 -07:00
3efdece9da Support n-dimensional empty tensors in take/put.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9635

Differential Revision: D8935119

Pulled By: gchanan

fbshipit-source-id: 5035583e7322b1a1720d961945dd0eefb4cb28ef
2018-07-20 15:40:49 -07:00
45e5c17ecf ONNXIFI transform (#9569)
Summary:
Cut off the runnable subgraph and offload it to the ONNXIFI backend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9569

Reviewed By: Maratyszcza

Differential Revision: D8930408

Pulled By: yinghai

fbshipit-source-id: 2b494f7f8dc10c00e58cf0fed5c4a9434be6155b
2018-07-20 15:09:59 -07:00
01581037dc Add workspace.RunPlanInBackground (#9637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637

Adds a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.
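
A hedged sketch of the intended usage (the exact return value and synchronization details are assumptions):

```python
from caffe2.python import core, workspace

plan = core.Plan("background_io")
# ... add the data-reading / preprocessing nets and execution steps to the plan ...

workspace.RunPlanInBackground(plan)  # returns immediately; plan runs on a background thread
# ... run GPU training in the main thread while the plan executes ...
```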

Reviewed By: MisterTea

Differential Revision: D8906439

fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
2018-07-20 14:56:12 -07:00
1003ccfa15 Creates CUDAContext (#9435)
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
2018-07-20 12:56:15 -07:00
8a0fe0a588 set_input_record() should always add external input (#9636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636

Make sure that the blobs are registered to the net

Reviewed By: pjh5

Differential Revision: D8924883

fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6
2018-07-20 11:55:37 -07:00
bae156a481 Support (some) CUDA Lapack on n-dimensional empty tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9631

Reviewed By: ezyang

Differential Revision: D8933202

Pulled By: gchanan

fbshipit-source-id: 1ade4ca439bf26aa921df1da83a827d860f8f48f
2018-07-20 11:40:25 -07:00
d3688861ec Fixed a missing '=' in LPPoolNd repr function (#9629)
Summary:
In the repr function of the LPPoolNd(..) class, there was a missing '=' (`kernel_size{kernel_size}`).

Link to line in the code: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/pooling.py#L694

Original:

       return 'norm_type={norm_type}, kernel_size{kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)

Fixed:

       return 'norm_type={norm_type}, kernel_size={kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9629

Differential Revision: D8932913

Pulled By: soumith

fbshipit-source-id: 9030dff6b14659b5c7b6992d87ef53ec8891f674
2018-07-20 11:24:42 -07:00
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
1d4d9fc7da Prepare to stop using attributes in the JIT (#9505)
Summary:
This PR adds machinery to cache the schema in an IR node, and allows lookups of (possibly) constant inputs by their names (instead of position). The new methods are:

- `at::optional<T> get<T>(Symbol name)` - if the argument called `name` is a constant, casts it to type `T` and returns it. If it's not constant, returns `nullopt`. Raises an error if there's no argument with that name.
- `at::optional<IValue> get(Symbol name)` - like above, but packs the result in an IValue
- `Value* getValue(Symbol name)` - retrieves a `Value*` for an argument (no need to know its position).

All above functions currently inspect the attributes as well, but that's only so that I could start using them in other places in the JIT without disrupting our current functionality. I wanted this diff to be a preparation that doesn't change the semantics too much, and so both the tracer and script create nodes with attributes. The next PR will put that to a stop, and hopefully the changes we need to make to other components will be simpler thanks to what I did here.

One more thing I'd like to do before actually stopping creating the non-attributed nodes is to have a convenient way of creating a schema programmatically, matching nodes against it, and creating them without having to pack inputs into flat argument lists (which is quite error prone).
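A hedged sketch of how those lookups compose (the `Symbol` spelling and header path are assumptions, not taken from this diff):

```cpp
#include <torch/csrc/jit/ir.h>  // header path assumed for this revision

// Read a node's "dim" argument by name, without knowing its position.
int64_t dim_or_default(torch::jit::Node* node, int64_t fallback) {
  // get<T> yields nullopt when the argument exists but is not constant.
  if (auto dim = node->get<int64_t>(torch::jit::Symbol::attr("dim"))) {
    return *dim;
  }
  // Non-constant: a Value* is still retrievable by name.
  torch::jit::Value* v = node->getValue(torch::jit::Symbol::attr("dim"));
  (void)v;  // e.g. hand it to further graph rewriting
  return fallback;
}
```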

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9505

Reviewed By: ezyang

Differential Revision: D8915496

Pulled By: apaszke

fbshipit-source-id: 39d14fc9a9d73d8494f128367bf70357dbba83f5
2018-07-20 10:56:00 -07:00
b9e89cf9fd Revert "Extend DispatchStub to support CUDA dispatch (#9579)" (#9614)
Summary:
This reverts commit bcf0bf42a1727c8ee788f733c28579d0e36a387c.

The commit was causing issues for some internal FB projects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9614

Reviewed By: Yangqing

Differential Revision: D8929552

Pulled By: colesbury

fbshipit-source-id: ae9026ad8762a4c5de401273694b4c878fc241a6
2018-07-20 10:25:11 -07:00
bbb30ad4ab Use THTensor/Storage for THVoidTensor/Storage (#9588)
Summary:
Change akin to change for THVoidStorage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9588

Reviewed By: gchanan

Differential Revision: D8915559

Pulled By: cpuhrsch

fbshipit-source-id: 6cc69df0e29942c62750f990903dfd8e4d344581
2018-07-20 09:54:44 -07:00
f84fdc7866 Remove unnecessary functions from StorageDerived.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9553

Reviewed By: ezyang

Differential Revision: D8915526

Pulled By: cpuhrsch

fbshipit-source-id: 32013d3aa58a1a68637f99ee619d06e27fadaad6
2018-07-20 09:41:36 -07:00
7b9d8916e5 Fix integral type dispatch error message (#9625)
Summary:
This fix will prevent errors like (found in `bincount`)
```
RuntimeError: %s not implemented for '%s'bincounttorch.FloatTensor
```
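The garbled output is the classic symptom of handing printf-style format arguments to an error helper that concatenates instead of substituting. A minimal illustration of the failure mode (not the actual ATen macro):

```cpp
#include <sstream>
#include <stdexcept>

// A concatenating error helper: "%s" placeholders pass through verbatim and
// the remaining arguments are simply appended (C++17 fold expression).
template <typename... Args>
[[noreturn]] void raise_error(Args&&... args) {
  std::ostringstream ss;
  (ss << ... << args);
  throw std::runtime_error(ss.str());
}

// raise_error("%s not implemented for '%s'", "bincount", "torch.FloatTensor")
// reproduces exactly the message quoted above.
```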
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9625

Differential Revision: D8932945

Pulled By: soumith

fbshipit-source-id: 794e3b58d662779402ab318e274661826a5db8b2
2018-07-20 09:24:27 -07:00
2a0018f2a8 Add scatter_add_ doc (#9630)
Summary:
fixes #4176 cc vishwakftw

I didn't do `:math:` and `\neg` because I am using double ticks so they render more similarly with `:attr:`.
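For reference, the semantics being documented, for a 3-D tensor with `dim=0` (the standard `scatter_add_` behavior, not quoted from this patch):

```latex
\text{self}[\,\text{index}[i][j][k]\,][j][k] \mathrel{+}= \text{src}[i][j][k]
```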
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9630

Differential Revision: D8933022

Pulled By: SsnL

fbshipit-source-id: 31d8551f415b624c2ff66b25d886f20789846508
2018-07-20 08:41:05 -07:00
bfe2aa093e docs fixes (#9607)
Summary:
fixes #9589 #9507 #9502 #9390
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9607

Reviewed By: ezyang, soumith

Differential Revision: D8923575

Pulled By: SsnL

fbshipit-source-id: cb61d990333b700d813ce781040c3d0325999b8c
2018-07-20 07:55:25 -07:00
4028ff6c3a Revert "quick patch for PackPadded removal to propagate the correct s… (#9613)
Summary:
…ize. (#9593)"

This reverts commit 85b28163584380bf4953f2ac2fa21df9715f12d5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9613

Reviewed By: bddppq

Differential Revision: D8929322

Pulled By: anderspapitto

fbshipit-source-id: 3ae4d320e5407acc1fb63a26b7d1f2ff4059eba9
2018-07-20 00:39:29 -07:00
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
5651b27458 Add CAFFE_STATIC_EVENT to Stats (#9501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9501

Added a new stat value to log static state such as CPU and memory usage.

Reviewed By: pjh5

Differential Revision: D8872254

fbshipit-source-id: 469e94cab99029a3da55f8986dddeadac076e2a8
2018-07-19 16:25:59 -07:00
b770156a7a Functional DataParallel (#9234)
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.

For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.

`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).

`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).

Added lots of tests for these things.
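A hedged usage sketch of the resulting functional API (namespace and defaults are assumptions based on the description above):

```cpp
#include <torch/torch.h>

// Scatter `input` across the visible CUDA devices, replicate `model`,
// apply the replicas in parallel, and gather the outputs on one device.
torch::Tensor run_replicated(torch::nn::Linear& model, torch::Tensor input) {
  return torch::nn::parallel::data_parallel(model, input);
}
```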

apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234

Differential Revision: D8865182

Pulled By: goldsborough

fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
2018-07-19 16:12:04 -07:00
7e78e80d94 Make error message for empty module friendlier (#9565)
Summary:
In our pimpl system, default constructing a module holder default constructs the contained module. This means `Linear linear;` is ill-formed, since `Linear` doesn't have a default constructor. Instead we require `Linear linear = nullptr;` to get the empty state of the `Linear`. This PR makes the error message for the ill-formed case nicer.

I had to change the forwarding constructors of most of our modules for this, but that's a minor adjustment.

E.g.

```
Linear linear;

In file included from /home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:5:0,
                 from /home/psag/pytorch/pytorch/test/cpp/api/module.cpp:3:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h: In instantiation of ‘torch::nn::ModuleHolder<Contained>::ModuleHolder() [with Contained = torch::nn::LinearImpl]’:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/dropout.h:45:1:   required from here
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h:46:5: error: static assertion failed: You are trying to default construct a module which has no default constructor. Use = nullptr to give it the empty state (like an empt
y std::shared_ptr).
     static_assert(
```
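For contrast, a sketch of the supported pattern (the `is_empty()` check is an assumption about the holder's interface):

```cpp
#include <torch/torch.h>

torch::nn::Linear linear = nullptr;  // empty holder; no LinearImpl constructed

void build_once(int64_t in, int64_t out) {
  if (linear.is_empty()) {
    linear = torch::nn::Linear(in, out);  // construct once sizes are known
  }
}
```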

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9565

Differential Revision: D8903666

Pulled By: goldsborough

fbshipit-source-id: 5e6b788921a27a44359db89afdc2b057facc5cec
2018-07-19 15:56:54 -07:00
bcf0bf42a1 Extend DispatchStub to support CUDA dispatch (#9579)
Summary:
This is a few files taken from https://github.com/pytorch/pytorch/pull/8919. They're unchanged from the latest versions of that PR.

```
This is part of https://github.com/pytorch/pytorch/pull/8919. It's
separated to make it easier to merge the PR in pieces.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (Previous ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
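A hedged sketch of the stub pattern this enables (macro names follow the ATen convention; exact spellings and the header path in this revision are assumptions):

```cpp
#include <ATen/native/DispatchStub.h>  // path assumed

// In generic native code, compiled once without arch-specific flags:
using unary_fn = void (*)(at::Tensor&, const at::Tensor&);
DECLARE_DISPATCH(unary_fn, my_op_stub);

// In each CPU_CAPABILITY kernel file (and, with this change, in CUDA code):
// REGISTER_DISPATCH(my_op_stub, &my_op_kernel);
```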
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9579

Differential Revision: D8909000

Pulled By: colesbury

fbshipit-source-id: fdeb606270b06acdab3c01dba97ec9d81584ecc0
2018-07-19 14:25:40 -07:00
a08119afc2 Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561)
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)
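A small sketch of the accessor discipline from the list above (function spellings are taken from this summary; exact signatures are assumptions):

```cpp
// Before: t->size[0] was read and written directly.
// After: go through the accessors so sizes_/strides_ stay encapsulated.
void shrink_first_dim(THTensor* t) {
  int64_t n = t->size(0);             // read via accessor
  if (n > 0) {
    THTensor_setSizeAt(t, 0, n - 1);  // write via setter
  }
}
```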

Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.

Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561

Reviewed By: cpuhrsch

Differential Revision: D8901926

Pulled By: ezyang

fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
2018-07-19 14:10:06 -07:00
f521823b7b Do not always set broadcast argument when exporting new onnx add and sub to caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9597

Reviewed By: colesbury

Differential Revision: D8920575

Pulled By: bddppq

fbshipit-source-id: 97423e1bf6a20559d466d2ac56c9e74e10bfc129
2018-07-19 14:10:05 -07:00
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code will give Nan in backward. We fix this.
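For context, a sketch of where the NaN comes from (not taken from the diff): with f(x) = x / ||x||,

```latex
\frac{\partial f_i}{\partial x_j}
  = \frac{\delta_{ij}}{\lVert x \rVert_2}
  - \frac{x_i \, x_j}{\lVert x \rVert_2^{3}}
```

Every term divides by ||x||, which is 0 for the zero vector; the fix presumably guards that division (for instance by defining the gradient to be 0 there).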

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
85b2816358 quick patch for PackPadded removal to propagate the correct size. (#9593)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9593

Differential Revision: D8919125

Pulled By: anderspapitto

fbshipit-source-id: a88ca979c3b9d439863e223717d3697180c26121
2018-07-19 14:10:02 -07:00
f33cd36c9b Use int64_t for im2col and col2im (#9590)
Summary:
Fixes #9404
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9590

Differential Revision: D8916020

Pulled By: SsnL

fbshipit-source-id: ac6758326bbb09b48642b149f4eb8f466ef7044e
2018-07-19 11:29:24 -07:00
f180373d68 Support n-dimensional empty tensors in CUDA BLAS and fix a btrifact bug. (#9573)
Summary:
This is mainly straightforward, with two exceptions:
1) cublasSgemv, cublasDgemv appear to have a bug where (x,0).mv(0) does not handle beta, whereas cublasSgemm, cublasDgemm do for the case where (x,0).mm(0,y).  This is handled by manually calling zero / mul (see the identity after this list).

2) I fixed a bug in btrifact that was broken even when dealing with non-empty tensors.  Basically, if out.stride(0) was 1, because the underlying BLAS call expects column-major matrices, to get a column-major tensor, out.transpose_(0, 1) would be called.  But this is just wrong, as if the batch dimension (0) doesn't match the size of the columns (1), you don't even have a tensor of the correct shape.
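The BLAS identity behind point 1: gemv computes

```latex
y \leftarrow \alpha A x + \beta y
```

so with A of shape (m, 0) and x of length 0, the product Ax is the length-m zero vector and the correct result is exactly beta*y. The manual zero / mul reproduces that when the cublas path mishandles beta.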
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9573

Reviewed By: ezyang

Differential Revision: D8906144

Pulled By: gchanan

fbshipit-source-id: de44d239a58afdd74d874db02f2022850dea9a56
2018-07-19 09:50:27 -07:00
aee9e90abd Fix TestAutograd.test_as_strided (#9538)
Summary:
0. Fixes #9479
1. rewrites `as_strided` as a native function. This is fine because `set_` does the scalar check.
2. allows using `self` in `python_default_init`. Previously `python_variable_methods.cpp` had `self` as an input `PyObject *` and used `self_` as the unpacked tensor, but `python_torch_functions.cpp` just used `self` as the unpacked tensor, making it impossible to use `self` in `python_default_init`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9538

Differential Revision: D8894556

Pulled By: SsnL

fbshipit-source-id: ca7877b488e12557b7fb94e781346dcb55d3b299
2018-07-19 09:11:13 -07:00
e0446fcfa9 Pass dtype to tensor contructor in test_neg (#9558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9554.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9558

Differential Revision: D8901085

Pulled By: yf225

fbshipit-source-id: 0edb176fcb18e0c0bcfc6f209343b9097767c9b8
2018-07-19 08:54:39 -07:00
54db14e390 HIP Operators Generator--> HipOpG (#9322)
Summary:
The goal of this PR is to add infrastructure to convert (hipify) CUDA ops into [HIP](https://github.com/ROCm-Developer-Tools/HIP) ops at **compile** time.

Note that HIP ops, which are portable C++ code, can run on both AMD and NVIDIA platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9322

Differential Revision: D8884707

Pulled By: bddppq

fbshipit-source-id: dabc6319546002c308c10528238e6684f7aef0f8
2018-07-19 00:26:06 -07:00
45f0d05202 Adapt OnnxifiOp to removed suffix handling in ONNXIFI loader (#9571)
Summary:
Adapt to changes in onnx/onnx#1203
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9571

Reviewed By: yinghai

Differential Revision: D8907892

Pulled By: bddppq

fbshipit-source-id: 9f88471639dbe9050194e84340f335bece834d5d
2018-07-18 19:26:23 -07:00
604f7e98c3 Expose CAFFE2_USE_OPENCV preprocessor flag (#9509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9509

generate_proposals_op_util_nms.h conditionally requires OpenCV in some cases, and earlier this was checking just the CV_MAJOR_VERSION macro, which is undefined unless opencv.hpp is included. This adds `-DCAFFE2_USE_OPENCV` to TARGETS when opencv is included in external_deps so that the check works correctly.
Thanks jinghuang for flagging this issue!

Differential Revision: D8880401

fbshipit-source-id: 65abbcf4ffe3feffc0ee2560882cb8eb0b7476f9
2018-07-18 18:56:49 -07:00
b3e141e84c Add predictor config into Predictor (#9434)
Summary:
This is the first step of refactoring the Predictor. In this diff the config struct
is introduced and the internal data structure of Predictor has been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9434

Differential Revision: D8843262

Pulled By: fishbone

fbshipit-source-id: 23f5e4751614e3fedc9a04060d69331bfdecf864
2018-07-18 16:39:56 -07:00
04b33b7231 Add byte_weight_dequant_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9541

Reviewed By: hlu1

Differential Revision: D8882964

fbshipit-source-id: 06d2e0d227ea6a4a8dc5ef1ea9dd1d449c149b47
2018-07-18 16:27:21 -07:00
c1ee8835b6 Constructors and member functions for THStorage (#9357)
Summary:
Added on top of ezyang's https://github.com/pytorch/pytorch/pull/9278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9357

Reviewed By: ezyang

Differential Revision: D8863934

Pulled By: cpuhrsch

fbshipit-source-id: a45c955c0b1e9e0866749b3a7e8a36de931bdff1
2018-07-18 15:56:26 -07:00
4c615b1796 Introduce libtorch to setup.py build (#8792)
Summary:
Prior to this diff, there were two ways of compiling the bulk of the torch codebase, with no interaction between them - you had to pick one or the other.

1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so

2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so

This diff combines the two.

1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so

2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so

In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792

Reviewed By: ezyang

Differential Revision: D8764181

Pulled By: anderspapitto

fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
2018-07-18 14:59:33 -07:00
3b886500a0 Add CUDAGuard to ATen (#9277)
Summary:
THCStream was recently moved to ATen by mruberry: https://github.com/pytorch/pytorch/pull/8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.
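A hedged sketch of the resulting RAII pattern (header path and constructor overloads are assumptions based on the description):

```cpp
#include <ATen/cuda/CUDAGuard.h>  // path assumed for this revision

void run_on_device(int device) {
  at::cuda::CUDAGuard guard(device);  // switch the current device (and stream)
  // ... enqueue work ...
}  // destructor restores the previous device/stream via the unchecked
   // CUDAStream calls mentioned above, so it never throws
```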

colesbury apaszke ezyang

Fixes https://github.com/pytorch/pytorch/issues/7800
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9277

Differential Revision: D8865183

Pulled By: goldsborough

fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
2018-07-18 14:40:31 -07:00
8769fec03f Move clamp into ATen (#9506)
Summary:
Glue component of https://github.com/pytorch/pytorch/pull/9319

Important to unblock wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9506

Reviewed By: wanchaol

Differential Revision: D8879437

Pulled By: cpuhrsch

fbshipit-source-id: 16ea8a93f3f5df2695180b3a30a583834b7004f1
2018-07-18 13:40:11 -07:00
c506ff97c8 Disable py2-clang3.8-rocmnightly-ubuntu16.04-test in disabled-configs… (#9543)
Summary:
….txt setting

In the ROCm branches we will experiment with turning this on.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9543

Differential Revision: D8897990

Pulled By: ezyang

fbshipit-source-id: ae9d25d1b79ee421d49436593edf8c7e49b3a4e5
2018-07-18 12:58:56 -07:00
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
8c741b7c4f Add transformation from caffe2::resizeop to onnx::upsample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9511

Reviewed By: hlu1

Differential Revision: D8876692

fbshipit-source-id: 9ba346e225cfbc686d370134fe41a28333b933cc
2018-07-18 11:59:52 -07:00
b6b6e1b39f Fix core.Plan.create_from_proto (#9438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438

The current implementation of create_from_proto doesn't work as expected: it duplicates networks and execution steps by copying the original PlanDef first and then adding each step one-by-one.

Reviewed By: pjh5

Differential Revision: D8850316

fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b
2018-07-18 10:55:55 -07:00
27455e9c78 Use _six for inf and nan (#9500)
Summary:
Things like `float('inf')` are actually quite expensive.
```py
In [1]: import math

In [2]: %timeit -n 200 math.inf
49.3 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)

In [3]: %timeit -n 200 float('inf')
194 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9500

Reviewed By: soumith

Differential Revision: D8876229

Pulled By: SsnL

fbshipit-source-id: 78602b76bb53d5588910b58270930c0bd413d2d7
2018-07-18 10:40:29 -07:00
4413 changed files with 267491 additions and 164449 deletions

File diff suppressed because it is too large


@@ -1,43 +1,31 @@
---
# NOTE: there must be no spaces before the '-', so put the comma first.
# NOTE there must be no spaces before the '-', so put the comma first.
Checks: '
*
,modernize-*
,-cert-err58-cpp
,-cert-err60-cpp
,-clang-diagnostic-*
,-cppcoreguidelines-owning-memory
-*
,bugprone-*
,-bugprone-macro-parentheses
,-bugprone-forward-declaration-namespace
,cppcoreguidelines-*
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-fuchsia-*
,-google-build-using-namespace
,-google-explicit-constructor
,-google-readability-braces-around-statements
,-google-readability-namespace-comments
,-google-readability-todo
,-google-runtime-references
,-google-runtime-references
,-hicpp-braces-around-statements
,-hicpp-explicit-conversions
,-hicpp-no-array-decay
,-hicpp-special-member-functions
,-hicpp-vararg
,-llvm-header-guard
,-llvm-namespace-comment
,-misc-unused-parameters
,-modernize-make-unique
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,hicpp-signed-bitwise
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-use-default-member-init
,-performance-unnecessary-value-param
,-readability-braces-around-statements
,-readability-else-after-return
,-readability-named-parameter
,clang-analyzer-*
,-modernize-return-braced-init-list
,-modernize-use-auto
'
WarningsAsErrors: ''
HeaderFilterRegex: 'torch/csrc/'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:
...

.gitattributes (new file, 1 line)

@@ -0,0 +1 @@
*.bat text eol=crlf

.github/ISSUE_TEMPLATE/bug-report.md (new file, 49 lines)

@@ -0,0 +1,49 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch
---
## 🐛 Bug
<!-- A clear and concise description of what the bug is. -->
## To Reproduce
Steps to reproduce the behavior:
1.
1.
1.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->


@@ -0,0 +1,9 @@
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->


@@ -0,0 +1,24 @@
---
name: "\U0001F680Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->


@@ -0,0 +1,13 @@
---
name: "❓Questions/Help/Support"
about: Do you need support? We have resources.
---
## ❓ Questions and Help
### Please note that this issue tracker is not a help form and this issue will be closed.
We have a set of [listed resources available on the website](https://pytorch.org/resources). Our primary means of support is our discussion forum:
- [Discussion Forum](https://discuss.pytorch.org/)

.gitignore (35 lines changed)

@@ -25,13 +25,17 @@ aten/src/ATen/cuda/CUDAConfig.h
build/
dist/
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
test/.coverage
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
test/htmlcov
test/cpp_extensions/install/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
@@ -40,6 +44,7 @@ torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp
torch/csrc/jit/generated/*
torch/csrc/jit/fuser/config.h
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THNN_generic.cpp
@@ -49,6 +54,7 @@ torch/csrc/nn/THNN.cpp
torch/csrc/nn/THNN.cwrap
torch/lib/*.a*
torch/lib/*.dll*
torch/lib/*.exe*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/*.lib
@@ -60,6 +66,8 @@ torch/lib/pkgconfig
torch/lib/protoc
torch/lib/tmp_install
torch/lib/torch_shm_manager
torch/lib/python*
torch/share/
torch/version.py
# IPython notebook checkpoints
@@ -140,10 +148,6 @@ docs/source/scripts/activation_images/
# PyCharm files
.idea
# Visual Studio Code files
.vscode
.vs
# OSX dir files
.DS_Store
@@ -194,3 +198,26 @@ caffe2.egg-info
# Atom/Watchman required file
.watchmanconfig
# Files generated by CLion
cmake-build-debug
# Files generated by ctags
CTAGS
tags
TAGS
# BEGIN NOT-CLEAN-FILES (setup.py handles this marker. Do not change.)
#
# Below files are not deleted by "setup.py clean".
# Visual Studio Code files
.vscode
.vs
# YouCompleteMe config file
.ycm_extra_conf.py
# Files generated when a patch is rejected
*.orig
*.rej

.gitmodules (27 lines changed)

@@ -1,9 +1,3 @@
[submodule "third_party/catch"]
path = third_party/catch
url = https://github.com/catchorg/Catch2.git
[submodule "third_party/nanopb"]
path = third_party/nanopb
url = https://github.com/nanopb/nanopb.git
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
@@ -16,9 +10,6 @@
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
[submodule "third_party/nervanagpu"]
path = third_party/nervanagpu
url = https://github.com/NervanaSystems/nervanagpu.git
[submodule "third_party/benchmark"]
path = third_party/benchmark
url = https://github.com/google/benchmark.git
@@ -67,9 +58,6 @@
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/cereal"]
path = third_party/cereal
url = https://github.com/USCiLab/cereal
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
@@ -79,3 +67,18 @@
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
[submodule "third_party/nccl/nccl"]
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
[submodule "third_party/gemmlowp/gemmlowp"]
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
[submodule "third_party/QNNPACK"]
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
[submodule "third_party/neon2sse"]
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
[submodule "third_party/fbgemm"]
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm


@@ -2,46 +2,57 @@
set -ex
pip install --user --no-cache-dir hypothesis==3.59.0
# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
CMAKE_ARGS=()
SCCACHE="$(which sccache)"
if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
# Setup wrapper scripts
wrapped="cc c++ gcc g++ x86_64-linux-gnu-gcc"
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
wrapped="$wrapped nvcc"
fi
for compiler in $wrapped; do
(
echo "#!/bin/sh"
# TODO: if/when sccache gains native support for an
# SCCACHE_DISABLE flag analogous to ccache's CCACHE_DISABLE,
# this can be removed. Alternatively, this can be removed when
# https://github.com/pytorch/pytorch/issues/13362 is fixed.
#
# NOTE: carefully quoted - we want `which compiler` to be
# resolved as we execute the script, but SCCACHE_DISABLE and
# $@ to be evaluated when we execute the script
echo 'test $SCCACHE_DISABLE && exec '"$(which $compiler)"' "$@"'
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which nvcc) \"\$@\""
) > "./sccache/nvcc"
chmod +x "./sccache/nvcc"
fi
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# Setup ccache if configured to use it (and not sccache)
@@ -59,6 +70,15 @@ if [ -z "${SCCACHE}" ] && which ccache > /dev/null; then
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# sccache will fail for CUDA builds if all cores are used for compiling
if [ -z "$MAX_JOBS" ]; then
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then
MAX_JOBS=`expr $(nproc) - 1`
else
MAX_JOBS=$(nproc)
fi
fi
report_compile_cache_stats() {
if [[ -n "${SCCACHE}" ]]; then
"$SCCACHE" --show-stats
@@ -79,7 +99,7 @@ fi
###############################################################################
# Use special scripts for Android, conda, and setup builds
# Use special scripts for Android and setup builds
###############################################################################
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
@ -89,23 +109,6 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" ${CMAKE_ARGS[*]} "$@"
exit 0
elif [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
"${ROOT_DIR}/scripts/build_anaconda.sh" --skip-tests --install-locally "$@"
report_compile_cache_stats
# This build will be tested against onnx tests, which needs onnx installed.
# At this point the visible protbuf installation will be in conda, since one
# of Caffe2's dependencies uses conda, so the correct protobuf include
# headers are those in conda as well
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PROTOBUF_INCDIR=/opt/conda/include pip install -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
exit 0
elif [[ $BUILD_ENVIRONMENT == *setup* ]]; then
rm -rf $INSTALL_PREFIX && mkdir $INSTALL_PREFIX
PYTHONPATH=$INSTALL_PREFIX $PYTHON setup_caffe2.py develop --install-dir $INSTALL_PREFIX
exit 0
fi
@@ -119,11 +122,6 @@ CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
if [[ $BUILD_ENVIRONMENT == *-aten-* ]]; then
if [[ CMAKE_ARGS != *USE_ATEN* ]] && [[ CMAKE_ARGS != *BUILD_ATEN* ]]; then
CMAKE_ARGS+=("-DBUILD_ATEN=ON")
fi
fi
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
CMAKE_ARGS+=("-DBLAS=MKL")
fi
@@ -144,17 +142,19 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# TODO: This is patching the official FindHip to properly handly
# cmake generator expression. A PR is opened in the upstream repo here:
# https://github.com/ROCm-Developer-Tools/HIP/pull/516
# remove this hack once it's merged.
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
sudo sed -i 's/\ -I${dir}/\ $<$<BOOL:${dir}>:-I${dir}>/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
# This is needed to enable ImageInput operator in resnet50_trainer
CMAKE_ARGS+=("-USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
CMAKE_ARGS+=("-USE_LMDB=ON")
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export HCC_AMDGPU_TARGET=gfx900
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_amd.py"
fi
# building bundled nccl in this config triggers a bug in nvlink. For
# more, see https://github.com/pytorch/pytorch/issues/14486
if [[ "${BUILD_ENVIRONMENT}" == *-cuda8*-cudnn7* ]]; then
CMAKE_ARGS+=("-DUSE_SYSTEM_NCCL=ON")
fi
# Try to include Redis support for Linux builds
@@ -172,14 +172,6 @@ fi
# Use a speciallized onnx namespace in CI to catch hardcoded onnx namespace
CMAKE_ARGS+=("-DONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
if [[ -n "$INTEGRATED" ]]; then
# TODO: This is a temporary hack to work around the issue that both
# caffe2 and pytorch have libcaffe2.so and crossfire at runtime.
CMAKE_ARGS+=("-DBUILD_SHARED_LIBS=OFF")
CMAKE_ARGS+=("-DBUILD_CUSTOM_PROTOBUF=OFF")
CMAKE_ARGS+=("-DCAFFE2_LINK_LOCAL_PROTOBUF=OFF")
fi
# We test the presence of cmake3 (for platforms like Centos and Ubuntu 14.04)
# and use that if so.
if [[ -x "$(command -v cmake3)" ]]; then
@@ -187,38 +179,49 @@ if [[ -x "$(command -v cmake3)" ]]; then
else
CMAKE_BINARY=cmake
fi
# sccache will fail for CUDA builds if all cores are used for compiling
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then
MAX_JOBS=`expr $(nproc) - 1`
else
MAX_JOBS=$(nproc)
fi
###############################################################################
# Configure and make
###############################################################################
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
# be separate.
rm -rf build_caffe2
mkdir build_caffe2
cd ./build_caffe2
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
if [[ -z "$INTEGRATED" ]]; then
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
# be separate.
rm -rf build_caffe2
mkdir build_caffe2
cd ./build_caffe2
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j${MAX_JOBS}" install
else
echo "Don't know how to build on $(uname)"
exit 1
fi
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j${MAX_JOBS}" install
else
echo "Don't know how to build on $(uname)"
exit 1
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
USE_LEVELDB=1 USE_LMDB=1 USE_OPENCV=1 BUILD_BINARY=1 python setup.py install --user
# This is to save test binaries for testing
cp -r torch/lib/tmp_install $INSTALL_PREFIX
ls $INSTALL_PREFIX
report_compile_cache_stats
fi
report_compile_cache_stats
###############################################################################
# Install ONNX
###############################################################################
@@ -228,17 +231,6 @@ pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx
report_compile_cache_stats
if [[ -n "$INTEGRATED" ]]; then
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
pip install --user -v -b /tmp/pip_install_torch "file://${ROOT_DIR}#egg=torch"
fi
report_compile_cache_stats
# Symlink the caffe2 base python path into the system python path,
# so that we can import caffe2 without having to change $PYTHONPATH.
# Run in a subshell to contain environment set by /etc/os-release.
@@ -246,28 +238,30 @@ report_compile_cache_stats
# This is only done when running on Jenkins! We don't want to pollute
# the user environment with Python symlinks and ld.so.conf.d hacks.
#
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
if [[ -z "$INTEGRATED" ]]; then
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
fi
fi


@@ -15,14 +15,6 @@ fi
# The prefix must mirror the setting from build.sh
INSTALL_PREFIX="/usr/local/caffe2"
# Anaconda builds have a special install prefix and python
if [[ "$BUILD_ENVIRONMENT" == conda* ]]; then
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PYTHON="/opt/conda/bin/python"
INSTALL_PREFIX="/opt/conda/"
fi
# Add the site-packages in the caffe2 install prefix to the PYTHONPATH
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
@@ -34,11 +26,9 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
fi
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2. This shouldn't be done on Anaconda, as Anaconda should handle this.
if [[ "$BUILD_ENVIRONMENT" != conda* ]]; then
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
fi
# Caffe2.
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
cd "$ROOT_DIR"
@@ -49,7 +39,7 @@ fi
mkdir -p $TEST_DIR/{cpp,python}
cd ${INSTALL_PREFIX}
cd "${WORKSPACE}"
# C++ tests
echo "Running C++ tests.."
@@ -62,12 +52,26 @@ for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
*/aten/*)
# ATen uses test framework Catch2
"$test" -r=xml -o "${junit_reports_dir}/$(basename $test).xml"
;;
*)
"$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
"$test"
fi
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
# currently do NOT use the xml test reporter, because Catch doesn't
# support multiple reporters
# c.f. https://github.com/catchorg/Catch2/blob/master/docs/release-notes.md#223
# which means that enabling XML output means you lose useful stdout
# output for Jenkins. It's more important to have useful console
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
# "$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
"$test"
;;
esac
done
@@ -83,33 +87,39 @@ if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
EXTRA_TESTS+=("$CAFFE2_PYPATH/contrib/nccl")
fi
conda_ignore_test=()
if [[ $BUILD_ENVIRONMENT == conda* ]]; then
# These tests both assume Caffe2 was built with leveldb, which is not the case
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/dataio_test.py")
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/checkpoint_test.py")
rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Currently these tests are failing on ROCM platform:
# Unknown reasons, need to debug
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/arg_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/softmax_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/unique_ops_test.py")
fi
# Python tests
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
pip install --user pytest-sugar
"$PYTHON" \
-m pytest \
-x \
-v \
--disable-warnings \
--junit-xml="$TEST_DIR/python/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
${rocm_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"${EXTRA_TESTS[@]}"
# TODO: re-enable this for rocm CI jobs once we have more rocm workers
if [[ $BUILD_ENVIRONMENT != *rocm* ]]; then
# Python tests
echo "Running Python tests.."
"$PYTHON" \
-m pytest \
-x \
-v \
--junit-xml="$TEST_DIR/python/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
${conda_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"${EXTRA_TESTS[@]}"
fi
cd ${INSTALL_PREFIX}
if [[ -n "$INTEGRATED" ]]; then
pip install --user pytest-xdist torchvision
"$ROOT_DIR/scripts/onnx/test.sh" -p
pip install --user torchvision
"$ROOT_DIR/scripts/onnx/test.sh"
fi


@@ -14,8 +14,18 @@ clang --version
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# FIXME: Remove the hardcoded "-pthread" option.
# With asan build, the cmake thread CMAKE_HAVE_LIBC_CREATE[1] checking will
# succeed because "pthread_create" is in libasan.so. However, libasan doesn't
# have the full pthread implementation. Other advanced pthread functions doesn't
# exist in libasan.so[2]. If we need some pthread advanced functions, we still
# need to link the pthread library.
# [1] https://github.com/Kitware/CMake/blob/8cabaaf054a16ea9c8332ce8e9291bd026b38c62/Modules/FindThreads.cmake#L135
# [2] https://wiki.gentoo.org/wiki/AddressSanitizer/Problems
#
# TODO: Make the ASAN flags a more unified env var
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 DEBUG=1 \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
NO_CUDA=1 USE_MKLDNN=0 \
python setup.py install


@@ -1,15 +1,5 @@
#!/bin/bash
if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
fi
# TODO: move this to Docker
# TODO: add both NCCL and MPI in CI test by fixing these test first
# sudo apt-get update
# sudo apt-get install libnccl-dev libnccl2
# sudo apt-get install openmpi-bin libopenmpi-dev
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.
@@ -17,6 +7,33 @@ fi
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# For distributed, four environmental configs:
# (1) build with only NCCL
# (2) build with NCCL and MPI
# (3) build with only MPI
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
sudo apt-get -qq install openmpi-bin libopenmpi-dev
else
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
fi
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
fi
echo "Python version:"
python --version
@@ -27,66 +44,105 @@ echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -r requirements.txt || true
pip install -q -r requirements.txt || true
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export HCC_AMDGPU_TARGET=gfx900
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# When hcc runs out of memory, it silently exits without stopping
# the build process, leaving undefined symbols in the shared lib
# which will cause undefined symbol errors when later running
# tests. Setting MAX_JOBS to smaller number to make CI less flaky.
export MAX_JOBS=4
sudo chown -R jenkins:jenkins /usr/local
rm -rf "$(dirname "${BASH_SOURCE[0]}")/../../../pytorch_amd/" || true
python "$(dirname "${BASH_SOURCE[0]}")/../../tools/amd_build/build_pytorch_amd.py"
USE_ROCM=1 python setup.py install
exit
# ROCm CI is using Caffe2 docker images, which needs these wrapper
# scripts to correctly use sccache.
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
python tools/amd_build/build_amd.py
# OPENCV is needed to enable ImageInput operator in caffe2 resnet5_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
exit 0
fi
# TODO: Don't install this here
if ! which conda; then
pip install mkl mkl-devel
pip install -q mkl mkl-devel
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc7.2* ]] || [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc4.8* ]]; then
export USE_MKLDNN=1
else
export USE_MKLDNN=0
fi
fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issue if all cores are used
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=`expr $(nproc) - 1`
if [ -z "$MAX_JOBS" ]; then
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
fi
# Target only our CI GPU machine's CUDA arch to speed up the build
export TORCH_CUDA_ARCH_LIST=5.2
export TORCH_CUDA_ARCH_LIST="5.2"
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
WERROR=1 python setup.py install
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
WERROR=1 python setup.py install
elif [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
python setup.py install
fi
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
# Testing ATen install
if [[ "$BUILD_ENVIRONMENT" != *cuda* ]]; then
echo "Testing ATen install"
time tools/test_aten_install.sh
fi
# Test C FFI plugins
# cffi install doesn't work for Python 3.7
if [[ "$BUILD_ENVIRONMENT" != *pynightly* ]]; then
# TODO: Don't run this here
pip install cffi
git clone https://github.com/pytorch/extension-ffi.git
pushd extension-ffi/script
python build.py
popd
fi
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -r requirements.txt || true
make html
pip install -q -r requirements.txt || true
LC_ALL=C make html
popd
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
fi
@@ -95,5 +151,19 @@ if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Building libtorch"
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
WERROR=1 VERBOSE=1 tools/cpp_build/build_all.sh "$PWD/../cpp-build"
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
mkdir -p ../cpp-build/caffe2
pushd ../cpp-build/caffe2
WERROR=1 VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake "$CUSTOM_OP_TEST"
make VERBOSE=1
popd
fi


@@ -14,6 +14,8 @@ pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-trusty-py2.7.9-build
@@ -40,4 +42,10 @@ pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang3.8-rocmnightly-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04-test
py2-devtoolset7-rocmrpm-centos7.5-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
pytorch-ppc64le-cuda9.1-cudnn7-py3-test


@@ -29,11 +29,15 @@ if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
fi
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
if [ -z "${IN_CIRCLECI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
@@ -62,5 +66,7 @@ export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
python setup.py install
# Upload torch binaries when the build job is finished
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
fi


@@ -15,19 +15,24 @@ if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja six
pip install hypothesis
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Test PyTorch
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
if [ -z "${IN_CIRCLECI}" ]; then
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
@@ -38,9 +43,11 @@ export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
# Download torch binaries in the test jobs
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
fi
test_python_all() {
echo "Ninja version: $(ninja --version)"
@@ -56,8 +63,12 @@ test_cpp_api() {
#
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD
WERROR=1 VERBOSE=1 tools/cpp_build/build_all.sh "$CPP_BUILD"
mkdir -p $CPP_BUILD/caffe2
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd $CPP_BUILD/caffe2
VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
@@ -65,16 +76,38 @@
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
"$CPP_BUILD"/libtorch/bin/test_api
"$CPP_BUILD"/caffe2/bin/test_api
}
test_custom_script_ops() {
echo "Testing custom script operators"
pushd test/custom_operator
# Build the custom operator library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_python_all
test_cpp_api
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_python_all
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_cpp_api
test_custom_script_ops
fi
fi

View File

@@ -8,4 +8,21 @@ COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-multigpu-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
time python test/run_test.py --verbose -i distributed

View File

@@ -1,5 +1,6 @@
import sys
import json
import math
import numpy
import argparse
@@ -35,14 +36,25 @@ else:
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if baseline number is NaN (which happened in
# the past when we didn't have logic for catching NaN numbers)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = sample_stats_data['mean']
sample_sigma = sample_stats_data['sigma']
sample_mean = float(sample_stats_data['mean'])
sample_sigma = float(sample_stats_data['sigma'])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception('''Error: sample mean is NaN''')
elif math.isnan(sample_sigma):
raise Exception('''Error: sample sigma is NaN''')
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)

View File

@@ -20,6 +20,9 @@ test_gpu_speed_mnist () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
# Needs warm up to get accurate number
python main.py --epochs 1 --no-log
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
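The warm-up run above matters: the first invocation pays one-time costs (CUDA context creation, disk cache population) that would skew the sample. A rough Python equivalent of this measure-N-times pattern, with a hypothetical command and run count, might look like:

```python
import subprocess
import time

def time_command(cmd):
    """Return the wall-clock runtime of a shell command, in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, shell=True, check=True)
    return time.perf_counter() - start

def benchmark(cmd, num_runs):
    time_command(cmd)  # warm-up run, excluded from the sample
    return [time_command(cmd) for _ in range(num_runs)]

# Hypothetical usage mirroring the script above:
# runtimes = benchmark("python main.py --epochs 1 --no-log", num_runs=20)
```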

View File

@@ -1,32 +1,57 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
echo "Skipping ROCm tests for now"
exit 0
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
# JIT C++ extensions require ninja.
git clone https://github.com/ninja-build/ninja --quiet
pushd ninja
python ./configure.py --bootstrap
export PATH="$PWD:$PATH"
popd
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
pip install -q ninja --user
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip install -q hypothesis --user
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
# if you're not careful. Check this if you made some changes and the
# ASAN test is not working
if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
export UBSAN_OPTIONS=print_stacktrace=1
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
# We suppress the vptr violation, since we have separate copies of
# libprotobuf in both libtorch.so and libcaffe2.so, and it causes
# the following problem:
# test_cse (__main__.TestJit) ... torch/csrc/jit/export.cpp:622:38:
# runtime error: member call on address ... which does not point
# to an object of type 'google::protobuf::MessageLite'
# ...: note: object is of type 'onnx_torch::ModelProto'
#
# This problem should be solved when libtorch.so and libcaffe2.so are
# merged.
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PWD/ubsan.supp
export PYTORCH_TEST_WITH_ASAN=1
export PYTORCH_TEST_WITH_UBSAN=1
# TODO: Figure out how to avoid hard-coding these paths
@@ -49,13 +74,16 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
export ATEN_DISABLE_AVX=
export ATEN_DISABLE_AVX2=
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
export ATEN_DISABLE_AVX=1
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
fi
if [[ "${JOB_BASE_NAME}" == *-NO_AVX2-* ]]; then
export ATEN_DISABLE_AVX2=1
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${JOB_BASE_NAME}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
test_python_nn() {
@@ -68,14 +96,23 @@ test_python_all_except_nn() {
test_aten() {
# Test ATen
if [[ "$BUILD_ENVIRONMENT" != *asan* ]]; then
# The following test(s) of ATen have already been skipped by caffe2 in rocm environment:
# scalar_tensor_test, basic, native_test
if ([[ "$BUILD_ENVIRONMENT" != *asan* ]] && [[ "$BUILD_ENVIRONMENT" != *rocm* ]]); then
echo "Running ATen tests with pytorch lib"
TORCH_LIB_PATH=$(python -c "import site; print(site.getsitepackages()[0])")/torch/lib
# NB: the ATen test binaries don't have RPATH set, so it's necessary to
# put the dynamic libraries somewhere where the dynamic linker can find them.
# This is a bit of a hack.
ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
SUDO=sudo
fi
${SUDO} ln -s "$TORCH_LIB_PATH"/libc10* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libmkldnn* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
ls build/bin
aten/tools/run_tests.sh build/bin
fi
@@ -95,7 +132,7 @@ test_torchvision() {
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install .
pip install -q --user .
popd
}
@@ -104,28 +141,46 @@ test_libtorch() {
echo "Testing libtorch"
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/libtorch/bin/test_jit
"$CPP_BUILD"/caffe2/bin/test_jit
else
"$CPP_BUILD"/libtorch/bin/test_jit "[cpu]"
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/libtorch/bin/test_api
python tools/download_mnist.py --quiet -d mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/caffe2/bin/test_api
fi
}
test_custom_script_ops() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing custom script operators"
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
pushd test/custom_operator
cp -r "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
fi
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_torchvision
test_python_nn
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_torchvision
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
fi
fi

View File

@@ -38,7 +38,7 @@ EOL
cat >ci_scripts/build_pytorch.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\curl-7.57.0-win64-mingw\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install MKL
if "%REBUILD%"=="" (
@@ -55,11 +55,11 @@ set LIB=%cd%\\mkl\\lib;%LIB
:: Install MAGMA
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z --output magma_cuda90_release_mkl_2018.2.185.7z
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.4.0_cuda90_release.7z --output magma_2.4.0_cuda90_release.7z
) else (
aws s3 cp s3://ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z magma_cuda90_release_mkl_2018.2.185.7z --quiet
aws s3 cp s3://ossci-windows/magma_2.4.0_cuda90_release.7z magma_2.4.0_cuda90_release.7z --quiet
)
7z x -aoa magma_cuda90_release_mkl_2018.2.185.7z -omagma
7z x -aoa magma_2.4.0_cuda90_release.7z -omagma
)
set MAGMA_HOME=%cd%\\magma
@@ -80,18 +80,29 @@ if "%REBUILD%"=="" (
)
:: Install Miniconda3
if "%REBUILD%"=="" (
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\\Jenkins
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\\Miniconda3
)
call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3
if "%REBUILD%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy cffi pyyaml boto3
)
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
if "%REBUILD%"=="" ( call conda install -y -q numpy cffi pyyaml boto3 )
:: Install ninja
if "%REBUILD%"=="" ( pip install ninja )
set WORKING_DIR=%CD%
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
cd %WORKING_DIR%
git submodule update --init --recursive
@@ -129,7 +140,7 @@ if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch
rd /s /q %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)
@@ -139,9 +150,10 @@ if not "%USE_CUDA%"=="0" (
python setup.py install && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo "NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3\` in Command Prompt before running Git Bash."
echo NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3\` in Command Prompt before running Git Bash.
) else (
7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
mv %CD%\\build\\bin\\test_api.exe %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch\\lib
7z a %IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
)
)
)

View File

@@ -36,18 +36,29 @@ EOL
cat >ci_scripts/setup_pytorch_env.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\curl-7.57.0-win64-mingw\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install Miniconda3
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
call conda install -y -q numpy mkl cffi pyyaml boto3
pip install ninja
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\\Jenkins
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
IF EXIST %CONDA_PARENT_DIR%\\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\\Miniconda3
)
call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3
if NOT "%BUILD_ENVIRONMENT%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3
)
pip install ninja future hypothesis
set WORKING_DIR=%CD%
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
cd %WORKING_DIR%
set PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
@@ -58,13 +69,14 @@ set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set PYTHONPATH=%CD%\\test;%PYTHONPATH%
cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
cd ..
if NOT "%BUILD_ENVIRONMENT%"=="" (
cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
cd ..
) else (
xcopy /s %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch .\\test\\torch\\
)
EOL
@@ -78,14 +90,47 @@ call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --exclude nn --verbose && cd ..
EOL
cat >ci_scripts/test_custom_script_ops.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/custom_operator
:: Build the custom operator library.
mkdir build
cd build
:: Note: Caffe2 does not support MSVC + CUDA + Debug mode (has to be Release mode)
cmake -DCMAKE_PREFIX_PATH=%CD%\\..\\..\\torch -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja -v
cd ..
:: Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module="build/model.pt"
:: Run tests C++-side and load the exported script module.
cd build
set PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt/bin/x64;%CD%\\..\\..\\torch\\lib;%PATH%
test_custom_ops.exe model.pt
EOL
cat >ci_scripts/test_libtorch.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
dir
dir %CD%\\test
dir %CD%\\test\\torch
dir %CD%\\test\\torch\\lib
cd %CD%\\test\\torch\\lib
set PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt/bin/x64;%CD%\\..\\..\\torch\\lib;%PATH%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*"
EOL
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat && ci_scripts/test_custom_script_ops.bat && ci_scripts/test_libtorch.bat
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
ci_scripts/test_python_nn.bat
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
ci_scripts/test_python_all_except_nn.bat
ci_scripts/test_python_all_except_nn.bat && ci_scripts/test_custom_script_ops.bat && ci_scripts/test_libtorch.bat
fi
fi
}

View File

@@ -16,7 +16,28 @@ matrix:
python: "2.7"
install: pip install flake8
script: flake8
- env: LINT_CHECK
python: "3.7"
dist: xenial # required for Python 3.7 (travis-ci/travis-ci#9069)
sudo: required # required for Python 3.7 (travis-ci/travis-ci#9069)
install: pip install flake8
script: flake8
- env: MYPY_TYPE_CHECK
python: "3.6"
install: pip install mypy mypy-extensions
script: mypy @mypy-files.txt
- env: CPP_DOC_CHECK
python: "3.6"
install:
- sudo apt-get install -y doxygen
- pip install -r requirements.txt
script: cd docs/cpp/source && ./check-doxygen.sh
- env: CLANG_TIDY
python: "3.6"
addons:
apt:
sources:
- ubuntu-toolchain-r-test
- llvm-toolchain-trusty
packages: clang-tidy
script: tools/run-clang-tidy-in-ci.sh

View File

@@ -5,11 +5,14 @@ cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
set(CAFFE2_VERSION_MAJOR 0)
set(CAFFE2_VERSION_MINOR 8)
set(CAFFE2_VERSION_PATCH 2)
set(CAFFE2_VERSION
"${CAFFE2_VERSION_MAJOR}.${CAFFE2_VERSION_MINOR}.${CAFFE2_VERSION_PATCH}")
set(CMAKE_INSTALL_MESSAGE NEVER)
set(CMAKE_CXX_STANDARD 11)
if (NOT MSVC)
set(CMAKE_C_STANDARD 11)
endif()
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
# One variable that determines whether the current cmake process is being run
# with the main Caffe2 library. This is useful for building modules - if
@@ -53,12 +56,16 @@ endif()
# Note to developers: if you add an option below, make sure you also add it to
# cmake/Summary.cmake so that the summary prints out the option values.
include(CMakeDependentOption)
option(BUILD_CAFFE2 "Build Caffe2" ON)
option(BUILD_ATEN "Build ATen" OFF)
option(BUILD_BINARY "Build C++ binaries" ON)
option(BUILD_TORCH "Build Torch" OFF)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
option(BUILD_ATEN_MOBILE "Build ATen for Android and iOS" OFF)
option(BUILD_ATEN_ONLY "Build only a subset focused on ATen only" OFF)
option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_C10_EXPERIMENTAL_OPS "Build c10 experimental operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
@@ -66,95 +73,107 @@ cmake_dependent_option(
cmake_dependent_option(
CAFFE2_USE_MSVC_STATIC_RUNTIME "Using MSVC static runtime libraries" ON
"NOT BUILD_SHARED_LIBS" OFF)
cmake_dependent_option(
BUILD_TEST "Build Caffe2 C++ test binaries (need gtest and gbenchmark)" OFF
"BUILD_CAFFE2" OFF)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" OFF
"BUILD_TEST" OFF)
option(USE_ACL "Use ARM Compute Library" OFF)
option(USE_ASAN "Use Address Sanitizer" OFF)
option(USE_ATEN "Use ATen" OFF)
option(USE_CUDA "Use CUDA" ON)
option(USE_ROCM "Use ROCm" OFF)
option(USE_ROCM "Use ROCm" ON)
option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" OFF)
option(USE_FFMPEG "Use ffmpeg" OFF)
cmake_dependent_option(
USE_GFLAGS "Use GFLAGS" ON
"BUILD_CAFFE2" OFF)
cmake_dependent_option(
USE_GLOG "Use GLOG" ON
"BUILD_CAFFE2" OFF)
cmake_dependent_option(
USE_GLOO "Use Gloo" ON
"BUILD_CAFFE2" OFF)
option(USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed support" OFF)
cmake_dependent_option(
USE_LEVELDB "Use LEVELDB" ON
"BUILD_CAFFE2" OFF)
option(USE_GFLAGS "Use GFLAGS" ON)
option(USE_GLOG "Use GLOG" ON)
option(USE_LEVELDB "Use LEVELDB" ON)
option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
cmake_dependent_option(
USE_LMDB "Use LMDB" ON
"BUILD_CAFFE2" OFF)
cmake_dependent_option(
USE_METAL "Use Metal for iOS build" ON
"BUILD_CAFFE2" OFF)
cmake_dependent_option(
USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON
"BUILD_CAFFE2" OFF)
cmake_dependent_option(
USE_MPI "Use MPI" ON
"BUILD_CAFFE2" OFF)
option(USE_LMDB "Use LMDB" ON)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
option(USE_NERVANA_GPU "Use Nervana GPU backend" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
cmake_dependent_option(
USE_NVRTC "Use NVRTC. Only available if USE_CUDA is on." OFF
"USE_CUDA" OFF)
option(USE_NUMPY "Use NumPy" ON)
option(USE_OBSERVERS "Use observers module." OFF)
option(USE_OPENCL "Use OpenCL" OFF)
cmake_dependent_option(
USE_OPENCV "Use OpenCV" ON
"BUILD_CAFFE2" OFF)
option(USE_OPENCV "Use OpenCV" ON)
option(USE_OPENMP "Use OpenMP for parallel code" OFF)
option(USE_PROF "Use profiling" OFF)
option(USE_QNNPACK "Use QNNPACK (quantized 8-bit operators)" ON)
option(USE_REDIS "Use Redis" OFF)
option(USE_ROCKSDB "Use RocksDB" OFF)
option(USE_SNPE "Use Qualcomm's SNPE library" OFF)
option(USE_SYSTEM_EIGEN_INSTALL
"Use system Eigen instead of the one under third_party" OFF)
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
option(USE_MKLDNN "Use MKLDNN" OFF)
option(USE_DISTRIBUTED "Use distributed" ON)
cmake_dependent_option(
USE_IDEEP "Use IDEEP interface in MKL BLAS" ON
"BUILD_CAFFE2" OFF)
USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_MKLML "Use MKLML interface in MKL BLAS" ON
"BUILD_CAFFE2" OFF)
option(USE_DISTRIBUTED "Use THD (distributed)" OFF)
option(USE_DISTRIBUTED_MW "Use THD (distributed) master worker" OFF)
USE_GLOO "Use Gloo. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed. Only available if USE_GLOO is on." OFF
"USE_GLOO" OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" OFF)
if (USE_ATEN)
set(BUILD_ATEN ${USE_ATEN})
SET(ONNX_NAMESPACE "onnx_c2" CACHE STRING "onnx namespace")
if (ANDROID OR IOS)
set(BUILD_ATEN_MOBILE ON)
endif()
if (BUILD_ATEN_ONLY)
set(BUILD_CAFFE2_OPS OFF)
set(BUILD_PYTHON OFF)
set(USE_NUMA OFF)
set(USE_LEVELDB OFF)
set(USE_GFLAGS OFF)
set(USE_GLOG OFF)
set(USE_NCCL OFF)
set(USE_NNPACK OFF)
set(USE_NUMPY OFF)
set(USE_OPENCV OFF)
set(USE_MKLDNN OFF)
set(USE_DISTRIBUTED OFF)
set(USE_LMDB OFF)
endif()
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Version numbers for generated libraries
set(TORCH_DEFAULT_VERSION "1.0.0")
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}" CACHE STRING "Torch build version")
if (NOT TORCH_BUILD_VERSION)
# An empty string was specified so force version to the default
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}"
CACHE STRING "Torch build version" FORCE)
endif()
caffe2_parse_version_str(TORCH ${TORCH_BUILD_VERSION})
caffe2_parse_version_str(CAFFE2 ${TORCH_BUILD_VERSION})
# ---[ CMake scripts + modules
list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake/Modules)
if (MSVC AND ${BUILD_SHARED_LIBS})
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
endif()
# ---[ CMake build directories
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
@@ -178,11 +197,6 @@ include(cmake/MiscCheck.cmake)
# External projects
include(ExternalProject)
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Dependencies
include(cmake/Dependencies.cmake)
@@ -214,11 +228,18 @@ if(NOT MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-variable")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-function")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-result")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-strict-overflow")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-strict-aliasing")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=deprecated-declarations")
if (CMAKE_COMPILER_IS_GNUCXX AND NOT (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.0.0))
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=pedantic")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=redundant-decls")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=old-style-cast")
# These flags are not available in GCC-4.8.5. Set only when using clang.
# Compared against https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Option-Summary.html
if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
if ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-invalid-partial-specialization")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-typedef-redefinition")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-warning-option")
@@ -228,15 +249,20 @@ if(NOT MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c++14-extensions")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Qunused-arguments")
endif()
if ((APPLE AND (NOT ("${CLANG_VERSION_STRING}" VERSION_LESS "9.0")))
OR (CMAKE_COMPILER_IS_GNUCXX
OR (CMAKE_COMPILER_IS_GNUCXX
AND (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0 AND NOT APPLE)))
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -faligned-new")
endif()
if ($ENV{WERROR})
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror")
endif($ENV{WERROR})
if (NOT APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-but-set-variable")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-maybe-uninitialized")
endif()
else()
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
@@ -263,6 +289,21 @@ if (USE_ASAN)
set (CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fsanitize=address")
endif()
if (APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-private-field")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c++14-extensions")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
endif()
if (EMSCRIPTEN)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0")
endif()
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0.0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
if(ANDROID)
if(CMAKE_COMPILER_IS_GNUCXX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -s")
@@ -286,12 +327,10 @@ include_directories(BEFORE ${PROJECT_SOURCE_DIR})
# in PROJECT_SOURCE_DIR.
include_directories(BEFORE ${PROJECT_BINARY_DIR})
# ---[ Old caffe protobuf
if(BUILD_CAFFE2)
add_subdirectory(caffe/proto)
endif()
include_directories(BEFORE ${PROJECT_SOURCE_DIR}/aten/src/)
# ---[ Main build
add_subdirectory(c10)
add_subdirectory(caffe2)
# --[ Documentation
@@ -308,7 +347,7 @@ if(BUILD_DOCS)
if(EXISTS ${CMAKE_CURRENT_BINARY_DIR}/docs)
file(REMOVE_RECURSE ${CMAKE_CURRENT_BINARY_DIR}/docs)
endif (EXISTS ${CMAKE_CURRENT_BINARY_DIR}/docs)
endif()
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/docs)
configure_file(${DOXYGEN_C_IN} ${DOXYGEN_C_OUT} @ONLY)
@@ -325,10 +364,10 @@ if(BUILD_DOCS)
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
COMMENT "Generating Python API documentation with Doxygen"
VERBATIM)
else (DOXYGEN_FOUND)
else()
message(FATAL_ERROR "Doxygen needs to be installed to generate the documentation")
endif (DOXYGEN_FOUND)
endif (BUILD_DOCS)
endif()
endif()
# ---[ CMake related files
# Uninstall option.
@@ -383,6 +422,8 @@ if (BUILD_SHARED_LIBS)
${PROJECT_SOURCE_DIR}/cmake/public/cuda.cmake
${PROJECT_SOURCE_DIR}/cmake/public/glog.cmake
${PROJECT_SOURCE_DIR}/cmake/public/gflags.cmake
${PROJECT_SOURCE_DIR}/cmake/public/mkl.cmake
${PROJECT_SOURCE_DIR}/cmake/public/mkldnn.cmake
${PROJECT_SOURCE_DIR}/cmake/public/protobuf.cmake
${PROJECT_SOURCE_DIR}/cmake/public/threads.cmake
${PROJECT_SOURCE_DIR}/cmake/public/utils.cmake
@@ -402,19 +443,15 @@ else()
endif()
# ---[ Modules
if (BUILD_CAFFE2)
add_subdirectory(modules)
endif()
add_subdirectory(modules)
# ---[ Binaries
# Binaries will be built after the Caffe2 main libraries and the modules
# are built. For the binaries, they will be linked to the Caffe2 main
# libraries, as well as all the modules that are built with Caffe2 (the ones
# built in the previous Modules section above).
if (BUILD_CAFFE2)
if (BUILD_BINARY)
add_subdirectory(binaries)
endif()
if (BUILD_BINARY)
add_subdirectory(binaries)
endif()
include(cmake/Summary.cmake)

View File

@@ -1,21 +1,9 @@
# This is a comment.
# Each line is a file pattern followed by one or more owners.
/aten/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/torch/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/docs/source @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ssnl @zou3519
/test @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/tools @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/README.md @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/setup.py @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/requirements.txt @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/torch/csrc/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/test/cpp/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/torch/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/jit/passes/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/test/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/scripts/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/docs/cpp @goldsborough @ebetica
/torch/csrc/api/ @ebetica @goldsborough
/test/cpp/api/ @ebetica @goldsborough
/torch/lib/c10d/ @apaszke @pietern @teng-li
/torch/csrc/distributed/ @apaszke @pietern @teng-li
/torch/distributed/ @apaszke @pietern @teng-li

View File

@@ -19,18 +19,18 @@ If you are not familiar with creating a Pull Request, here are some guides:
- https://help.github.com/articles/creating-a-pull-request/
## Developing locally with PyTorch
## Developing PyTorch
To locally develop with PyTorch, here are some tips:
To develop PyTorch on your machine, here are some tips:
1. Uninstall all existing pytorch installs
1. Uninstall all existing PyTorch installs:
```
conda uninstall pytorch
pip uninstall torch
pip uninstall torch # run this command twice
```
2. Locally clone a copy of PyTorch from source:
2. Clone a copy of PyTorch from source:
```
git clone https://github.com/pytorch/pytorch
@@ -75,6 +75,84 @@ You do not need to repeatedly install after modifying python files.
In case you want to reinstall, make sure that you uninstall pytorch first by running `pip uninstall torch`
and `python setup.py clean`. Then you can install in `build develop` mode again.
## Codebase structure
* [c10](c10) - Core library files that work everywhere, both server
and mobile. We are slowly moving pieces from ATen/core here.
This library is intended only to contain essential functionality,
and appropriate to use in settings where binary size matters. (But
you'll have a lot of missing functionality if you try to use it
directly.)
* [aten](aten) - C++ tensor library for PyTorch (no autograd support)
* src
* [TH](aten/src/TH)
[THC](aten/src/THC)
[THNN](aten/src/THNN)
[THCUNN](aten/src/THCUNN) - Legacy library code from the original
Torch. Try not to add things here; we're slowly porting these to
native.
* generic - Contains actual implementations of operators,
parametrized over `scalar_t`. Files here get compiled N times
per supported scalar type in PyTorch.
* ATen
* [core](aten/src/ATen/core) - Core functionality of ATen. This
is migrating to top-level c10 folder.
* [native](aten/src/ATen/native) - Modern implementations of
operators. If you want to write a new operator, here is where
it should go. Most CPU operators go in the top level directory,
except for operators which need to be compiled specially; see
cpu below.
* [cpu](aten/src/ATen/native/cpu) - Not actually CPU
implementations of operators, but specifically implementations
which are compiled with processor-specific instructions, like
AVX. See the README for more details.
* [cuda](aten/src/ATen/native/cuda) - CUDA implementations of
operators.
* [sparse](aten/src/ATen/native/sparse) - CPU and CUDA
implementations of COO sparse tensor operations
* [mkl](aten/src/ATen/native/mkl) [mkldnn](aten/src/ATen/native/mkldnn)
[miopen](aten/src/ATen/native/miopen) [cudnn](aten/src/ATen/native/cudnn)
- implementations of operators which simply bind to some
backend library.
* [torch](torch) - The actual PyTorch library. Everything that is not
in csrc is Python modules, following the PyTorch Python frontend
module structure.
* [csrc](torch/csrc) - C++ files composing the PyTorch library. Files
in this directory tree are a mix of Python binding code, and C++
heavy lifting. Consult `setup.py` for the canonical list of Python
binding files; conventionally, they are often prefixed with
`python_`.
* [jit](torch/csrc/jit) - Compiler and frontend for TorchScript JIT
frontend.
* [autograd](torch/csrc/autograd) - Implementation of reverse-mode automatic
differentiation
* [api](torch/csrc/api) - The PyTorch C++ frontend.
* [distributed](torch/csrc/distributed) - Distributed training
support for PyTorch.
* [tools](tools) - Code generation scripts for the PyTorch library.
See README of this directory for more details.
* [test](test) - Python unit tests for PyTorch Python frontend
* [test_torch.py](test/test_torch.py) - Basic tests for PyTorch
functionality
* [test_autograd.py](test/test_autograd.py) - Tests for non-NN
automatic differentiation support
* [test_nn.py](test/test_nn.py) - Tests for NN operators and
their automatic differentiation
* [test_jit.py](test/test_jit.py) - Tests for the JIT compiler
and TorchScript
* ...
* [cpp](test/cpp) - C++ unit tests for PyTorch C++ frontend
* [expect](test/expect) - Automatically generated "expect" files
which are used to compare against expected output.
* [onnx](test/onnx) - Tests for ONNX export functionality,
using both PyTorch and Caffe2.
* [caffe2](caffe2) - The Caffe2 library.
* [core](caffe2/core) - Core files of Caffe2, e.g., tensor, workspace,
blobs, etc.
* [operators](caffe2/operators) - Operators of Caffe2
* [python](caffe2/python) - Python bindings to Caffe2
* ...
## Unit testing
PyTorch's testing is located under `test/`. Run the entire test suite with
@@ -104,6 +182,18 @@ PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/late
for formatting docstrings. Lines inside a docstring block must be limited to 80 characters so they
fit into Jupyter documentation popups.
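For illustration, a minimal Google-style docstring of the kind this convention describes (the function itself is a made-up example):

```python
def scale(tensor, factor=1.0):
    """Scales a tensor by a constant factor.

    Args:
        tensor (Tensor): the input tensor.
        factor (float, optional): multiplier applied elementwise. Default: 1.0

    Returns:
        Tensor: a new tensor equal to ``tensor * factor``.
    """
    return tensor * factor
```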
For C++ documentation (https://pytorch.org/cppdocs), we use
[Doxygen](http://www.doxygen.nl/) and then convert it to
[Sphinx](http://www.sphinx-doc.org/) via
[Breathe](https://github.com/michaeljones/breathe) and
[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen
reference](http://www.stack.nl/~dimitri/doxygen/manual/index.html) for more
information on the documentation syntax. To build the documentation locally,
`cd` into `docs/cpp` and then `make html`.
We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp`.
## Managing multiple build trees
@@ -139,19 +229,20 @@ not very optimized for incremental rebuilds, this will actually be very slow.
Far better is to only request rebuilds of the parts of the project you are
working on:
- Working on `torch/csrc`? Run `python setup.py develop` to rebuild
- Working on the Python bindings? Run `python setup.py develop` to rebuild
(NB: no `build` here!)
- Working on `torch/lib/TH`, did not make any cmake changes, and just want to
see if it compiles? Run `(cd torch/lib/build/TH && make install -j$(getconf _NPROCESSORS_ONLN))`. This
applies for any other subdirectory of `torch/lib`. **Warning: Changes you
make here will not be visible from Python.** See below.
- Working on `torch/csrc` or `aten`? Run `python setup.py rebuild_libtorch` to
rebuild and avoid having to rebuild other dependent libraries we
depend on.
- Working on `torch/lib` and want to run your changes / rerun cmake? Run
`python setup.py build_deps`. Note that this will rerun cmake for
every subdirectory in TH; if you are only working on one project,
consider editing `torch/lib/build_all.sh` and commenting out the
`build` lines of libraries you are not working on.
- Working on one of the other dependent libraries? The other valid
targets are listed in `dep_libs` in `setup.py`. prepend `build_` to
get a target, and run as e.g. `python setup.py build_gloo`.
- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
`make` if you don't have ninja installed).
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
@@ -181,6 +272,8 @@ information for the code in `torch/csrc`. More information at:
Python `setuptools` is pretty dumb, and always rebuilds every C file in a
project. If you install the ninja build system with `pip install ninja`,
then PyTorch will use it to track dependencies correctly.
If pytorch was already built, you will need to run `python setup.py clean` once
after installing ninja for builds to succeed.
#### Use CCache
@@ -247,9 +340,9 @@ than Linux, which are worth keeping in mind when fixing these problems.
1. Symbols are NOT exported by default on Windows; instead, you have to explicitly
mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
`__declspec(dllimport)`. We have codified this pattern into a set of macros
which follow the convention `*_API`, e.g., `AT_API` inside ATen. (Every separate
shared library needs a unique macro name, because symbol visibility is on a per
shared library basis.)
which follow the convention `*_API`, e.g., `CAFFE2_API` inside Caffe2 and ATen.
(Every separate shared library needs a unique macro name, because symbol visibility
is on a per shared library basis. See c10/macros/Macros.h for more details.)
The upshot is if you see an "unresolved external" error in your Windows build, this
is probably because you forgot to mark a function with `*_API`. However, there is
@@ -266,8 +359,7 @@ than Linux, which are worth keeping in mind when fixing these problems.
3. If you have a Windows box (we have a few on EC2 which you can request access to) and
you want to run the build, the easiest way is to just run `.jenkins/pytorch/win-build.sh`.
If you need to rebuild, run `REBUILD=1 .jenkins/pytorch/win-build.sh` (this will avoid
blowing away your Conda environment.) I recommend opening `cmd.exe`, and then running
`bash` to work in a bash shell (which will make various Linux commands available.)
blowing away your Conda environment.)
Even if you don't know anything about MSVC, you can use cmake to build simple programs on
Windows; this can be helpful if you want to learn more about some peculiar linking behavior
@@ -293,6 +385,84 @@ cmake ..
cmake --build .
```
### Known MSVC (and MSVC with NVCC) bugs
The PyTorch codebase sometimes likes to use exciting C++ features, and
these exciting features lead to exciting bugs in Windows compilers.
To add insult to injury, the error messages will often not tell you
which line of code actually induced the erroring template instantiation.
I've found the most effective way to debug these problems is to
carefully read over diffs, keeping in mind known bugs in MSVC/NVCC.
Here are a few well known pitfalls and workarounds:
* This is not actually a bug per se, but in general, code generated by MSVC
is more sensitive to memory errors; you may have written some code
that does a use-after-free or stack overflows; on Linux the code
might work, but on Windows your program will crash. ASAN may not
catch all of these problems: stay vigilant to the possibility that
your crash is due to a real memory problem.
* (NVCC) `c10::optional` does not work when used from device code. Don't use
it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
and our local issue #10329.
* `constexpr` generally works less well on MSVC.
* The idiom `static_assert(f() == f())` to test if `f` is constexpr
does not work; you'll get "error C2131: expression did not evaluate
to a constant". Don't use these asserts on Windows.
(Example: `c10/util/intrusive_ptr.h`)
* (NVCC) Code you access inside a `static_assert` will eagerly be
evaluated as if it were device code, and so you might get an error
that the code is "not accessible".
```
class A {
static A singleton_;
static constexpr inline A* singleton() {
return &singleton_;
}
};
static_assert(std::is_same<A*, decltype(A::singleton())>::value, "hmm");
```
* The compiler will run out of heap if you attempt to compile files that
are too large. Splitting such files into separate files helps.
(Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)
### Running Clang-Tidy
[Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/index.html) is a C++
linter and static analysis tool based on the clang compiler. We run clang-tidy
in our CI to make sure that new C++ code is safe, sane and efficient. See our
[.travis.yml](https://github.com/pytorch/pytorch/blob/master/.travis.yml) file
for the simple commands we use for this.
To run clang-tidy locally, follow these steps:
1. Install clang-tidy. First, check if you already have clang-tidy by simply
writing `clang-tidy` in your terminal. If you don't yet have clang-tidy, you
should be able to install it easily with your package manager, e.g. by writing
`apt-get install clang-tidy` on Ubuntu. See https://apt.llvm.org for details on
how to install the latest version. Note that newer versions of clang-tidy will
have more checks than older versions. In our CI, we run clang-tidy-6.0.
2. Use our driver script to run clang-tidy over any changes relative to some
git revision (you may want to replace `HEAD~1` with `HEAD` to pick up
uncommitted changes). Changes are picked up based on a `git diff` with the
given revision:
```sh
$ python tools/clang_tidy.py -d build -p torch/csrc --diff 'HEAD~1'
```
Above, it is assumed you are in the PyTorch root folder. `path/to/build` should
be the path to where you built PyTorch from source, e.g. `build` in the PyTorch
root folder if you used `setup.py build`. You can use `-c <clang-tidy-binary>`
to change the clang-tidy this script uses. Make sure you have PyYaml installed,
which is in PyTorch's `requirements.txt`.
## Caffe2 notes
In 2018, we merged Caffe2 into the PyTorch source repository. While the

View File

@@ -15,17 +15,20 @@ We are in an early-release beta. Expect some adventures and rough edges.
- [Binaries](#binaries)
- [From Source](#from-source)
- [Docker Image](#docker-image)
- [Building the Documentation](#building-the-documentation)
- [Previous Versions](#previous-versions)
- [Getting Started](#getting-started)
- [Communication](#communication)
- [Releases and Contributing](#releases-and-contributing)
- [The Team](#the-team)
| System | 2.7 | 3.5 |
| --- | --- | --- |
| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Windows GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/)
| System | 2.7 | 3.5 | 3.6 |
| :---: | :---: | :---: | :--: |
| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | <center></center> |
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | <center></center> |
| Windows GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/) | <center></center> |
| Linux (ppc64le) CPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py2-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py2-linux-ppc64le/) | — | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/) |
| Linux (ppc64le) GPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda9-cudnn7-py2-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda9-cudnn7-py2-mpi-build-test-gpu/) | — | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/) |
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
@@ -56,8 +59,8 @@ If you use NumPy, then you have used Tensors (a.k.a ndarray).
![Tensor illustration](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/tensor_illustration.png)
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerate
compute by a huge amount.
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerates the
computation by a huge amount.
We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs
such as slicing, indexing, math operations, linear algebra, reductions.
@@ -76,7 +79,7 @@ change the way your network behaves arbitrarily with zero lag or overhead. Our i
from several research papers on this topic, as well as current and past work such as
[torch-autograd](https://github.com/twitter/torch-autograd),
[autograd](https://github.com/HIPS/autograd),
[Chainer](http://chainer.org), etc.
[Chainer](https://chainer.org), etc.
While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date.
You get the best of speed and flexibility for your crazy research.
@@ -87,7 +90,7 @@ You get the best of speed and flexibility for your crazy research.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use NumPy / SciPy / scikit-learn etc.
You can use it naturally like you would use [NumPy](http://www.numpy.org/) / [SciPy](https://www.scipy.org/) / [scikit-learn](http://scikit-learn.org) etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
@@ -103,10 +106,9 @@ We hope you never spend hours debugging your code because of bad stack traces or
### Fast and Lean
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
such as [Intel MKL](https://software.intel.com/mkl) and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
(TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API.
They are mature and have been tested for years.
(TH, THC, THNN, THCUNN) are mature and have been tested for years.
Hence, PyTorch is quite fast whether you run small or large neural networks.
@@ -121,10 +123,10 @@ Writing new neural network modules, or interfacing with PyTorch's Tensor API was
and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
[or your favorite NumPy-based libraries such as SciPy](https://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
There is no wrapper code that needs to be written. You can see [a tutorial here](https://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
## Installation
@@ -132,7 +134,7 @@ There is no wrapper code that needs to be written. You can see [a tutorial here]
### Binaries
Commands to install from binaries via Conda or pip wheels are on our website:
[http://pytorch.org](http://pytorch.org)
[https://pytorch.org](https://pytorch.org)
### From Source
@@ -148,7 +150,9 @@ If you want to compile with CUDA support, install
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
Other potentially useful environment variables may be found in `setup.py`.
If you want to build on Windows, Visual Studio 2017 and NVTX are also needed.
If you want to build on Windows, Visual Studio 2017 14.11 toolset and NVTX are also needed.
Especially, for CUDA 8 build on Windows, there will be an additional requirement for VS 2015 Update 3 and a patch for it.
The details of the patch can be found out [here](https://support.microsoft.com/en-gb/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch).
#### Install optional dependencies
@@ -161,7 +165,7 @@ conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn
# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9
conda install -c pytorch magma-cuda92 # or [magma-cuda80 | magma-cuda91] depending on your cuda version
```
On macOS
@@ -196,11 +200,11 @@ On Windows
set "VS150COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build"
set CMAKE_GENERATOR=Visual Studio 15 2017 Win64
set DISTUTILS_USE_SDK=1
REM The following line is needed for Python 2.7, but the support for it is very experimental.
REM The following two lines are needed for Python 2.7, but the support for it is very experimental.
set MSSdk=1
REM As for CUDA 8, VS2015 Update 2 or up is required to build PyTorch. Use the following two lines.
set "PREBUILD_COMMAND=%VS140COMNTOOLS%\..\..\VC\vcvarsall.bat"
set PREBUILD_COMMAND_ARGS=x64
set FORCE_PY27_BUILD=1
REM As for CUDA 8, VS2015 Update 3 is also required to build PyTorch. Use the following line.
set "CUDAHOSTCXX=%VS140COMNTOOLS%\..\..\VC\bin\amd64\cl.exe"
call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py install
@@ -208,7 +212,7 @@ python setup.py install
### Docker image
Dockerfile is supplied to build images with cuda support and cudnn v7. Build as usual
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass `-e PYTHON_VERSION=x.y` flag to specify which python version is to be used by Miniconda, or leave it unset to use the default. Build as usual
```
docker build -t pytorch -f docker/pytorch/Dockerfile .
```
@@ -222,24 +226,36 @@ Please note that PyTorch uses shared memory to share data between processes, so
for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you
should increase shared memory size either with `--ipc=host` or `--shm-size` command line options to `nvidia-docker run`.
### Building the Documentation
To build documentation in various formats, you will need [Sphinx](http://www.sphinx-doc.org) and the
readthedocs theme.
```
cd docs/
pip install -r requirements.txt
```
You can then build the documentation by running ``make <format>`` from the
``docs/`` folder. Run ``make`` to get a list of all available output formats.
### Previous Versions
Installation instructions and binaries for previous PyTorch versions may be found
on [our website](http://pytorch.org/previous-versions/).
on [our website](https://pytorch.org/previous-versions).
## Getting Started
Three pointers to get you started:
- [Tutorials: get you started with understanding and using PyTorch](http://pytorch.org/tutorials/)
- [Tutorials: get you started with understanding and using PyTorch](https://pytorch.org/tutorials/)
- [Examples: easy to understand pytorch code across all domains](https://github.com/pytorch/examples)
- [The API Reference](http://pytorch.org/docs/)
- [The API Reference](https://pytorch.org/docs/)
## Communication
* forums: discuss implementations, research, etc. http://discuss.pytorch.org
* forums: discuss implementations, research, etc. https://discuss.pytorch.org
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . Our slack channel is invite-only to promote a healthy balance between power-users and beginners. If you need a slack invite, ping us at slack@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: http://eepurl.com/cbG0rv
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: https://eepurl.com/cbG0rv
## Releases and Contributing
@ -259,3 +275,7 @@ PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [S
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
Note: this project is unrelated to [hughperkins/pytorch](https://github.com/hughperkins/pytorch) with the same name. Hugh is a valuable contributor in the Torch community and has helped with many things Torch and PyTorch.
## License
PyTorch is BSD-style licensed, as found in the LICENSE file.


@ -1,3 +0,0 @@
[flake8]
max-line-length = 120

aten/.gitignore

@ -1,3 +0,0 @@
__pycache__/
build/
*.pyc


@ -1,22 +1,5 @@
if (CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
if (NOT BUILD_ATEN)
return()
endif()
else()
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(ATen CXX C)
include(CMakeDependentOption)
option(USE_CUDA "Use CUDA" ON)
option(USE_ROCM "Use ROCm" OFF)
option(USE_CUDNN "Use cuDNN" ON)
option(USE_MKLDNN "Use MKLDNN" ON)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
# Flag for shared dependencies
set(BUILD_ATEN ON)
if (BUILD_ATEN_MOBILE)
return()
endif()
# Find modules
@ -45,32 +28,6 @@ SET(ATEN_INSTALL_BIN_SUBDIR "bin" CACHE PATH "ATen install binary subdirectory")
SET(ATEN_INSTALL_LIB_SUBDIR "lib" CACHE PATH "ATen install library subdirectory")
SET(ATEN_INSTALL_INCLUDE_SUBDIR "include" CACHE PATH "ATen install include subdirectory")
if (NOT CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
# ---[ Build variables set within the cmake tree
include(../cmake/BuildVariables.cmake)
set(CAFFE2_WHITELIST "" CACHE STRING "A whitelist file of files that one should build.")
# ---[ Misc checks to cope with various compiler modes
include(../cmake/MiscCheck.cmake)
# External projects
include(ExternalProject)
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(../cmake/Utils.cmake)
include(../cmake/public/utils.cmake)
# ---[ Dependencies
include(../cmake/Dependencies.cmake)
list(APPEND ATen_CPU_INCLUDE ${Caffe2_CPU_INCLUDE})
list(APPEND ATen_CUDA_INCLUDE ${Caffe2_GPU_INCLUDE})
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${Caffe2_DEPENDENCY_LIBS})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${Caffe2_CUDA_DEPENDENCY_LIBS})
list(APPEND ATen_PUBLIC_CUDA_DEPENDENCY_LIBS
${Caffe2_PUBLIC_CUDA_DEPENDENCY_LIBS})
endif()
if(USE_CUDA)
list(APPEND ATen_CUDA_INCLUDE ${CUDA_INCLUDE_DIRS})
endif()
@ -80,14 +37,20 @@ add_subdirectory(src/TH)
set(TH_CPU_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/TH
${CMAKE_CURRENT_SOURCE_DIR}/src/THC
${CMAKE_CURRENT_BINARY_DIR}/src/TH
${CMAKE_CURRENT_BINARY_DIR}/src/THC
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_BINARY_DIR}/src
${CMAKE_BINARY_DIR}/aten/src)
list(APPEND ATen_CPU_INCLUDE ${TH_CPU_INCLUDE})
if(USE_CUDA OR USE_ROCM)
set(TH_CUDA_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/THC
${CMAKE_CURRENT_BINARY_DIR}/src/THC)
list(APPEND ATen_CUDA_INCLUDE ${TH_CUDA_INCLUDE})
endif()
add_subdirectory(src/THNN)
# Find the HIP package, set the HIP paths, load the HIP CMake.
@ -129,15 +92,14 @@ list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_BINARY_DIR}/src/ATen)
add_subdirectory(src/ATen)
if (CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
# Pass source, includes, and libs to parent
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)
endif()
# Pass source, includes, and libs to parent
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CORE_TEST_SRCS ${ATen_CORE_TEST_SRCS} PARENT_SCOPE)


@ -1,258 +0,0 @@
# ATen: A TENsor library
ATen is a simple tensor library that exposes the Tensor operations in Torch
and PyTorch directly in C++11. The wrapper respects the semantics of operators
in PyTorch, except for minor details due to differences between C++ and Python in
the way default arguments are handled. See the [documentation for tensors](http://pytorch.org/docs/tensors.html) in PyTorch for what these operations do.
ATen's API is auto-generated from the same declarations PyTorch uses so the
two APIs will track each other over time.
Tensor types are resolved dynamically, such that the API is generic and
does not include templates. That is, there is one `Tensor` type. It can hold a
CPU or CUDA Tensor, and the tensor may hold Doubles, Floats, Ints, etc. This design
makes it easy to write generic code without templating everything.
See the _generated_ [`Tensor.h` file](doc/Tensor.h) and [`Functions.h` file](doc/Functions.h) for the provided API. Excerpt:
```c++
Tensor atan2(const Tensor & other) const;
Tensor & atan2_(const Tensor & other);
Tensor pow(Scalar exponent) const;
Tensor pow(const Tensor & exponent) const;
Tensor & pow_(Scalar exponent);
Tensor & pow_(const Tensor & exponent);
Tensor lerp(const Tensor & end, Scalar weight) const;
Tensor & lerp_(const Tensor & end, Scalar weight);
Tensor histc() const;
Tensor histc(int64_t bins) const;
Tensor histc(int64_t bins, Scalar min) const;
Tensor histc(int64_t bins, Scalar min, Scalar max) const;
```
Inplace operations are also provided, and always suffixed by `_` to indicate they will modify the Tensor.
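For example, a brief sketch using the `pow` variants from the excerpt above:
```c++
Tensor a = CPU(kFloat).ones({2, 2});
Tensor b = a.pow(2); // out-of-place: returns a new Tensor, a is unchanged
a.pow_(2);           // in-place: a now holds the squared values
```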
### Installation
TH/THC/THNN/THCUNN are provided (as git subtrees), so the repo is standalone. You will need a C++11 compiler, cmake, and the pyyaml python package.
```
# Install pyyaml used by python code generation to read API declarations
# macOS: if you don't have pip
sudo easy_install pip
# Ubuntu: if you don't have pip
apt-get -y install python-pip
# if you don't have pyyaml
sudo pip install pyyaml
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/where/you/want # specify your dest directory
# cmake .. -DUSE_NVRTC=ON -DUSE_TENSORRT=OFF -DCMAKE_INSTALL_PREFIX=../install -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DUSE_CUDA=ON # for CUDA
# cmake .. -DUSE_CUDA=OFF # for CPU only machines
make install
```
### Example usage
Here is a simple example; again, the syntax follows Torch semantics.
```c++
using namespace at; // assumed in the following
Tensor d = CPU(kFloat).ones({3, 4});
Tensor r = CPU(kFloat).zeros({3,4});
for(auto i = 0; i < 100000; i++) {
r = r.add(d);
// equivalently
r = r + d;
// or
r += d;
}
```
Want this running on the GPU?
```c++
using namespace at; // assumed in the following
Tensor d = CUDA(kFloat).ones({3, 4});
Tensor r = CUDA(kFloat).zeros({3,4});
for(auto i = 0; i < 100000; i++) {
r = r.add(d);
// equivalently
r = r + d;
// or
r += d;
}
```
Expressions like `CUDA(kFloat)` are first-class `at::Type` objects that represent
the type of a Tensor and are used to create Tensors when their type cannot be
inferred. See the _generated_ [Type header](doc/Type.h) for its API.
See more in [sample files](src/ATen/test).
### Creating your kernel
It is easy to create new kernels, thanks to the `dispatch<>()` templated function. Example:
```c++
// a simple sum kernel (for CPU only)
template<typename T>
struct sum_op {
// dispatch handles variable arguments for you
Tensor CPU(const Type & t, Tensor & x_)
{
Tensor x = x_.contiguous();
auto x_p = x.data<T>();
int64_t size = x.numel();
T sum = 0;
for(int64_t i = 0; i < size; i++) {
sum += x_p[i];
}
return t.scalarTensor(sum); // wrap the accumulated value in a zero-dim Tensor
};
Tensor CUDA(const Type & t, Tensor & x) {
throw std::invalid_argument("device not supported");
};
};
Tensor a = CPU(kFloat).rand({3, 7});
std::cout << a << std::endl;
std::cout << dispatch<sum_op>(a.type(),a) << " == " << a.sum() << std::endl;
```
### Efficient access to tensor elements
When using Tensor-wide operations, the relative cost of dynamic dispatch is very small.
However, there are cases, especially in your own kernels, where efficient element-wise access is needed,
and the cost of dynamic dispatch inside the element-wise loop is very high.
ATen provides _accessors_ that are created with a single dynamic check that a Tensor has the expected type and number of
dimensions. Accessors then expose an API for accessing the Tensor elements efficiently:
```c++
Tensor foo = CPU(kFloat).rand({12,12});
// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.accessor<float,2>();
float trace = 0;
for(int i = 0; i < foo_a.size(0); i++) {
// use the accessor foo_a to get tensor data.
trace += foo_a[i][i];
}
```
Accessors are temporary views of a Tensor. They are only valid for the lifetime of the tensor that they
view and hence should only be used locally in a function, like iterators.
### Using externally created data
If you already have your tensor data allocated in memory (CPU or CUDA),
you can view that memory as a Tensor in ATen:
```c++
float data[] = { 1, 2, 3,
4, 5, 6};
auto f = CPU(kFloat).tensorFromBlob(data, {2,3});
cout << f << endl;
```
These tensors cannot be resized because ATen does not own the memory, but otherwise
behave as normal tensors.
### Scalars and zero-dimensional tensors
In addition to the `Tensor` objects, ATen also includes `Scalar`s that represent a single number.
Like a Tensor, Scalars are dynamically typed and can hold any one of ATen's [number types](doc/Type.h).
Scalars can be implicitly constructed from C++ number types. Scalars are needed because some functions like `addmm` take numbers along with Tensors and expect these
numbers to be the same dynamic type as the tensor. They are also used in the API to indicate places where
a function will _always_ return a Scalar value, like `sum`.
```c++
Tensor addmm(Scalar beta, const Tensor & self,
Scalar alpha, const Tensor & mat1,
const Tensor & mat2);
Scalar sum(const Tensor & self);
//usage
Tensor a = ...
Tensor b = ...
Tensor c = ...
Tensor r = addmm(1.0, a, .5, b, c);
```
In addition to Scalars, ATen also allows Tensor objects to be zero-dimensional. These Tensors hold
a single value and they can be references to a single element in a larger Tensor. They can be used anywhere a Tensor is expected. They are normally created by operators like `select` which reduce the dimensions of
a Tensor.
```c++
Tensor two = CPU(kFloat).rand({10,20});
two[1][2] = 4;
//~~~~~~~ zero-dimensional Tensor
```
It is possible to convert between Scalar and zero-dim Tensors:
```c++
Tensor zero_dim = CPU(kFloat).scalarTensor(4);
Scalar from_tensor = Scalar(zero_dim); //only valid when zero_dim.dim() == 0;
```
### Avoiding unnecessary CUDA synchronization in your kernels when using Scalars
Moving a single number from the GPU to the CPU introduces a synchronization point
that can add latency to your program. In certain cases, the result of a GPU operator like `sum`, which
returns a Scalar, may be plugged into another GPU operator as an argument. If Scalars were always copied
to the CPU, this would result in two copies. To avoid these synchronizations, Scalar objects can be
optionally backed by a zero-dim Tensor, and are only copied to the CPU when requested.
```c++
auto a = CUDA(kFloat).rand({3,4});
Scalar on_gpu = Scalar(a[1][1]); //backed by zero-dim Tensor
assert(on_gpu.isBackedByTensor());
double value = on_gpu.toDouble(); // copied to CPU, if it was backed by GPU Tensor.
Scalar svalue = on_gpu.local(); // force the Scalar to become local to CPU.
// get the scalar as a zero-dim tensor. If it was already backed
// by a zero-dim Tensor then this op has no synchronization.
// if the Scalar was local on CPU, it performs the copy
Tensor same_tensor = CUDA(kFloat).scalarTensor(on_gpu);
```
Operators aware of the location of Scalars can arrange to do the minimal number of copies required.
### Developer notes
ATen relies heavily on code generation to automatically generate headers
and implementations for all of the tensor methods it supports. The main
entry point for the script which does all this work is
[`src/ATen/gen.py`](src/ATen/gen.py), which ingests
[`src/ATen/Declarations.cwrap`](src/ATen/Declarations.cwrap),
[`src/ATen/nn.yaml`](src/ATen/nn.yaml),
[`src/ATen/native/native_functions.yaml`](src/ATen/native/native_functions.yaml) and the THNN/THCUNN headers and
produces all of the headers and wrapping code necessary to generate
the ATen interface.
If you need to understand how ATen understands a declaration after all
of this processing occurs, it's helpful to look at the generated file
`Declarations.yaml` (NB: not cwrap) which contains information for all
ATen methods in a uniform manner. This file is utilized by PyTorch
which further extends the ATen interface with support for automatic
differentiation.
#### Note [ATen preprocessor philosophy]
ATen is designed to be simple to use, and one of the things this implies is
that it should not be necessary to use preprocessor macros when using ATen;
we would rather provide all symbols, even for functionality that is not
available on the system ATen is running on.
In practice this means that, whereas other libraries might simply omit
source files for, e.g., CuDNN when the CuDNN libraries are not installed,
ATen always builds those source files, compiling stub functions for
anything that is not available. ATen never uses
`AT_ENABLED_CUDA()` in header files, and all types in ATen's public API
are always available no matter your build configuration.
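A minimal sketch of this stub pattern (the function name here is hypothetical, for illustration only; the real stubs live in ATen's CuDNN wrapper sources):
```c++
// Hypothetical stub: compiled even when CuDNN is absent, so the symbol
// always exists; calling it reports the missing dependency at runtime.
Tensor cudnn_convolution_stub(const Tensor& input, const Tensor& weight) {
  AT_ERROR("ATen was not compiled with CuDNN support");
}
```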

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,23 +1,24 @@
#pragma once
#include "ATen/ATenGeneral.h"
#include "ATen/CPUGeneral.h"
#include "ATen/Allocator.h"
#include "ATen/Scalar.h"
#include "ATen/Type.h"
#include "ATen/Generator.h"
#include "ATen/CPUGeneral.h"
#include "ATen/Context.h"
#include "ATen/Storage.h"
#include "ATen/Tensor.h"
#include "ATen/Device.h"
#include "ATen/TensorGeometry.h"
#include "ATen/Functions.h"
#include "ATen/Formatting.h"
#include "ATen/TensorOperators.h"
#include "ATen/TensorMethods.h"
#include "ATen/Dispatch.h"
#include "ATen/DimVector.h"
#include "ATen/DeviceGuard.h"
#include "ATen/TensorOptions.h"
#include "ATen/Layout.h"
#include "ATen/OptionsGuard.h"
#include "ATen/DimVector.h"
#include "ATen/Dispatch.h"
#include "ATen/Formatting.h"
#include "ATen/Functions.h"
#include "ATen/ScalarOps.h"
#include "ATen/Tensor.h"
#include "ATen/TensorGeometry.h"
#include "ATen/TensorOperators.h"
#include "ATen/Type.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/core/Generator.h"
#include <c10/core/Layout.h>
#include "ATen/core/Scalar.h"
#include <c10/core/Storage.h>
#include "ATen/core/TensorMethods.h"
#include "ATen/core/TensorOptions.h"
#include <c10/util/Exception.h>


@ -1,11 +0,0 @@
#pragma once
#ifdef _WIN32
# if defined(ATen_cpu_EXPORTS) || defined(caffe2_EXPORTS)
# define AT_API __declspec(dllexport)
# else
# define AT_API __declspec(dllimport)
# endif
#else
# define AT_API
#endif


@ -1,14 +1,17 @@
#pragma once
#include "ATen/Config.h"
#include "ATen/Half.h"
#include "ATen/core/Half.h"
// Defines the accumulation type for a scalar type.
// Example:
// using accscalar_t = acc_type<scalar_t, true>;
#ifdef __CUDACC__
#if defined(__CUDACC__)
#include <cuda.h>
#include <cuda_fp16.h>
#elif defined(__HIPCC__)
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#endif
namespace at {
@ -16,7 +19,7 @@ namespace at {
template <typename T, bool is_cuda>
struct AccumulateType { };
#ifdef __CUDACC__
#if defined(__CUDACC__) || defined(__HIPCC__)
template <> struct AccumulateType<half, true> { using type = float; };
#endif
template <> struct AccumulateType<Half, true> { using type = float; };
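// Usage sketch (assuming the acc_type alias referenced in the comment above):
//   using accscalar_t = acc_type<scalar_t, /*is_cuda=*/true>;
// For half/Half on CUDA this selects float, so reductions accumulate in a
// wider type and avoid excessive rounding error.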


@ -1,14 +0,0 @@
#include <ATen/Allocator.h>
namespace at {
static void deleteInefficientStdFunctionContext(void* ptr) {
delete static_cast<InefficientStdFunctionContext*>(ptr);
}
at::DataPtr
InefficientStdFunctionContext::makeDataPtr(void* ptr, const std::function<void(void*)>& deleter, Device device) {
return {ptr, new InefficientStdFunctionContext({ptr, deleter}), &deleteInefficientStdFunctionContext, device};
}
} // namespace at


@ -1,101 +1,2 @@
#pragma once
#include <memory>
#include <stddef.h>
#include <ATen/Error.h>
#include <ATen/Retainable.h>
#include <ATen/Device.h>
#include <ATen/detail/UniqueVoidPtr.h>
namespace at {
// A DataPtr is a unique pointer (with an attached deleter and some
// context for the deleter) to some memory, which also records what
// device is for its data.
//
// nullptr DataPtrs can still have a nontrivial device; this allows
// us to treat zero-size allocations uniformly with non-zero allocations.
//
class DataPtr {
private:
detail::UniqueVoidPtr ptr_;
Device device_;
public:
// Choice of CPU here is arbitrary; if there's an "undefined" device
// we could use that too
DataPtr() : ptr_(), device_(kCPU) {}
DataPtr(void* data, Device device)
: ptr_(data), device_(device) {}
DataPtr(void* data, void* ctx, DeleterFnPtr ctx_deleter, Device device)
: ptr_(data, ctx, ctx_deleter), device_(device) {}
void* operator->() const { return ptr_.get(); }
void* get() const { return ptr_.get(); }
void* get_context() const { return ptr_.get_context(); }
void* release_context() { return ptr_.release_context(); }
operator bool() const { return static_cast<bool>(ptr_); }
template <typename T>
T* cast_context(DeleterFnPtr expected_deleter) const {
return ptr_.cast_context<T>(expected_deleter);
}
Device device() const { return device_; }
};
// NB: Device is NOT tested for here; a CUDA nullptr is as much a nullptr as a
// CPU nullptr
inline bool operator==(const at::DataPtr& dp, std::nullptr_t) noexcept { return !dp; }
inline bool operator==(std::nullptr_t, const at::DataPtr& dp) noexcept { return !dp; }
inline bool operator!=(const at::DataPtr& dp, std::nullptr_t) noexcept { return dp; }
inline bool operator!=(std::nullptr_t, const at::DataPtr& dp) noexcept { return dp; }
// Note [raw_allocate/raw_deallocate and Thrust]
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Thrust's support for custom allocators requires us to write something
// like this:
//
// class ThrustAllocator {
// char* allocate(size_t);
// void deallocate(char*, size_t);
// };
//
// This is not good for our unique_ptr based allocator interface, as
// there is no way to get to the context when we free.
//
// However, in some cases the context is exactly the same as
// the data pointer. In this case, we can support the "raw"
// allocate and deallocate interface. This is what
// raw_deleter signifies. By default, it returns a nullptr, which means that
// the raw interface is not implemented. Be sure to implement it whenever
possible, or the raw interface will be incorrectly reported as unsupported
when it is actually available.
struct Allocator {
virtual ~Allocator() {}
virtual at::DataPtr allocate(size_t n) const = 0;
// If this returns a non nullptr, it means that allocate()
// is guaranteed to return a unique_ptr with this deleter attached;
// it means the rawAllocate and rawDeallocate APIs are safe to use.
// This function MUST always return the same BoundDeleter.
virtual DeleterFnPtr raw_deleter() const { return nullptr; }
void* raw_allocate(size_t n) {
auto dptr = allocate(n);
AT_ASSERT(dptr.get() == dptr.get_context());
return dptr.release_context();
}
void raw_deallocate(void* ptr) {
auto d = raw_deleter();
AT_ASSERT(d);
d(ptr);
}
};
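// Usage sketch: the raw interface is only safe when raw_deleter() returns a
// non-null deleter, i.e. when the context and data pointers coincide:
//   void* p = allocator->raw_allocate(1024);
//   allocator->raw_deallocate(p);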
struct AT_API InefficientStdFunctionContext {
std::unique_ptr<void, std::function<void(void*)>> ptr_;
InefficientStdFunctionContext(std::unique_ptr<void, std::function<void(void*)>>&& ptr)
: ptr_(std::move(ptr)) {}
static at::DataPtr makeDataPtr(void* ptr, const std::function<void(void*)>& deleter, Device device);
};
} // namespace at
#include <c10/core/Allocator.h>


@ -1,192 +1,2 @@
//===--- ArrayRef.h - Array Reference Wrapper -------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// ATen: modified from llvm::ArrayRef.
// removed llvm-specific functionality
// removed some implicit const -> non-const conversions that rely on
// complicated std::enable_if meta-programming
// removed a bunch of slice variants for simplicity...
#pragma once
#include <ATen/Error.h>
#include <ATen/SmallVector.h>
#include <array>
#include <iterator>
#include <vector>
namespace at {
/// ArrayRef - Represent a constant reference to an array (0 or more elements
/// consecutively in memory), i.e. a start pointer and a length. It allows
/// various APIs to take consecutive elements easily and conveniently.
///
/// This class does not own the underlying data, it is expected to be used in
/// situations where the data resides in some other buffer, whose lifetime
/// extends past that of the ArrayRef. For this reason, it is not in general
/// safe to store an ArrayRef.
///
/// This is intended to be trivially copyable, so it should be passed by
/// value.
template<typename T>
class ArrayRef {
public:
typedef const T *iterator;
typedef const T *const_iterator;
typedef size_t size_type;
typedef std::reverse_iterator<iterator> reverse_iterator;
private:
/// The start of the array, in an external buffer.
const T *Data;
/// The number of elements.
size_type Length;
public:
/// @name Constructors
/// @{
/// Construct an empty ArrayRef.
/*implicit*/ ArrayRef() : Data(nullptr), Length(0) {}
/// Construct an ArrayRef from a single element.
/*implicit*/ ArrayRef(const T &OneElt)
: Data(&OneElt), Length(1) {}
/// Construct an ArrayRef from a pointer and length.
/*implicit*/ ArrayRef(const T *data, size_t length)
: Data(data), Length(length) {}
/// Construct an ArrayRef from a range.
ArrayRef(const T *begin, const T *end)
: Data(begin), Length(end - begin) {}
/// Construct an ArrayRef from a SmallVector. This is templated in order to
/// avoid instantiating SmallVectorTemplateCommon<T> whenever we
/// copy-construct an ArrayRef.
template<typename U>
/*implicit*/ ArrayRef(const SmallVectorTemplateCommon<T, U> &Vec)
: Data(Vec.data()), Length(Vec.size()) {
}
/// Construct an ArrayRef from a std::vector.
template<typename A>
/*implicit*/ ArrayRef(const std::vector<T, A> &Vec)
: Data(Vec.data()), Length(Vec.size()) {}
/// Construct an ArrayRef from a std::array
template <size_t N>
/*implicit*/ constexpr ArrayRef(const std::array<T, N> &Arr)
: Data(Arr.data()), Length(N) {}
/// Construct an ArrayRef from a C array.
template <size_t N>
/*implicit*/ constexpr ArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {}
/// Construct an ArrayRef from a std::initializer_list.
/*implicit*/ ArrayRef(const std::initializer_list<T> &Vec)
: Data(Vec.begin() == Vec.end() ? (T*)nullptr : Vec.begin()),
Length(Vec.size()) {}
/// @}
/// @name Simple Operations
/// @{
const_iterator begin() const { return Data; }
const_iterator end() const { return Data + Length; }
reverse_iterator rbegin() const { return reverse_iterator(end()); }
reverse_iterator rend() const { return reverse_iterator(begin()); }
/// empty - Check if the array is empty.
bool empty() const { return Length == 0; }
const T *data() const { return Data; }
/// size - Get the array size.
size_t size() const { return Length; }
/// front - Get the first element.
const T &front() const {
AT_CHECK(!empty(), "ArrayRef: attempted to access front() of empty list");
return Data[0];
}
/// back - Get the last element.
const T &back() const {
AT_CHECK(!empty(), "ArrayRef: attempted to access back() of empty list");
return Data[Length-1];
}
/// equals - Check for element-wise equality.
bool equals(ArrayRef RHS) const {
if (Length != RHS.Length)
return false;
return std::equal(begin(), end(), RHS.begin());
}
/// slice(n, m) - Chop off the first N elements of the array, and keep M
/// elements in the array.
ArrayRef<T> slice(size_t N, size_t M) const {
AT_CHECK(N+M <= size(), "ArrayRef: invalid slice, ", N, " + ", M, " is not <= ", size());
return ArrayRef<T>(data()+N, M);
}
/// slice(n) - Chop off the first N elements of the array.
ArrayRef<T> slice(size_t N) const { return slice(N, size() - N); }
/// @}
/// @name Operator Overloads
/// @{
const T &operator[](size_t Index) const {
return Data[Index];
}
/// Vector compatibility
const T &at(size_t Index) const {
AT_CHECK(Index < Length, "ArrayRef: invalid index ", Index, " for length ", Length);
return Data[Index];
}
/// Disallow accidental assignment from a temporary.
///
/// The declaration here is extra complicated so that "arrayRef = {}"
/// continues to select the move assignment operator.
template <typename U>
typename std::enable_if<std::is_same<U, T>::value, ArrayRef<T>>::type &
operator=(U &&Temporary) = delete;
/// Disallow accidental assignment from a temporary.
///
/// The declaration here is extra complicated so that "arrayRef = {}"
/// continues to select the move assignment operator.
template <typename U>
typename std::enable_if<std::is_same<U, T>::value, ArrayRef<T>>::type &
operator=(std::initializer_list<U>) = delete;
/// @}
/// @name Expensive Operations
/// @{
std::vector<T> vec() const {
return std::vector<T>(Data, Data+Length);
}
/// @}
/// @name Conversion operators
/// @{
operator std::vector<T>() const {
return std::vector<T>(Data, Data+Length);
}
/// @}
};
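// Usage sketch: a non-owning view over existing contiguous storage.
//   std::vector<int> v = {1, 2, 3, 4};
//   at::ArrayRef<int> ref(v);  // implicit conversion from std::vector
//   auto tail = ref.slice(1);  // view of {2, 3, 4}
//   int last = tail.back();    // 4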
} // end namespace at
#include <c10/util/ArrayRef.h>

aten/src/ATen/Backend.h

@ -0,0 +1,2 @@
#pragma once
#include <c10/core/Backend.h>


@ -1,26 +1,2 @@
#pragma once
#include <cstddef>
#include <string>
#include <typeinfo>
namespace at {
/// Utility to demangle a C++ symbol name.
std::string demangle(const char* name);
/// Returns the printable name of the type.
template <typename T>
inline const char* demangle_type() {
#ifdef __GXX_RTTI
static const std::string name = demangle(typeid(T).name());
return name.c_str();
#else // __GXX_RTTI
return "(RTTI disabled, cannot show name)";
#endif // __GXX_RTTI
}
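// Usage sketch: with RTTI enabled, mangled names become readable, e.g.
//   demangle_type<std::vector<int>>() // "std::vector<int, std::allocator<int> >"
// Without RTTI it returns the placeholder string above.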
std::string get_backtrace(
size_t frames_to_skip = 0,
size_t maximum_number_of_frames = 64,
bool skip_python_frames = true);
} // namespace at
#include <ATen/core/Backtrace.h>


@ -1,11 +1,6 @@
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
SET(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake ${CMAKE_MODULE_PATH})
if (NOT CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
# ---[ Generate and install header and cpp files
include(../../../cmake/Codegen.cmake)
endif()
IF(NOT MSVC)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-ignored-qualifiers")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-ignored-qualifiers")
@ -13,23 +8,6 @@ IF(NOT MSVC)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-absolute-value")
ENDIF(NOT MSVC)
################################################################################
# Helper functions
################################################################################
function(filter_list output input)
unset(result)
foreach(filename ${${input}})
foreach(pattern ${ARGN})
if("${filename}" MATCHES "${pattern}")
list(APPEND result "${filename}")
endif()
endforeach()
endforeach()
set(${output} ${result} PARENT_SCOPE)
endfunction()
# Can be compiled standalone
IF(NOT AT_INSTALL_BIN_DIR OR NOT AT_INSTALL_LIB_DIR OR NOT AT_INSTALL_INCLUDE_DIR OR NOT AT_INSTALL_SHARE_DIR)
SET(AT_INSTALL_BIN_DIR "bin" CACHE PATH "AT install binary subdirectory")
@ -42,13 +20,16 @@ CONFIGURE_FILE(Config.h.in "${CMAKE_CURRENT_SOURCE_DIR}/Config.h")
CONFIGURE_FILE(cuda/CUDAConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/cuda/CUDAConfig.h")
# NB: If you edit these globs, you'll have to update setup.py package_data as well
FILE(GLOB base_h "*.h" "detail/*.h")
FILE(GLOB base_cpp "*.cpp" "detail/*.cpp")
FILE(GLOB base_h "*.h" "detail/*.h" "cpu/*.h")
FILE(GLOB base_cpp "*.cpp" "detail/*.cpp" "cpu/*.cpp")
add_subdirectory(core)
FILE(GLOB cuda_h "cuda/*.h" "cuda/detail/*.h" "cuda/*.cuh" "cuda/detail/*.cuh")
FILE(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp")
FILE(GLOB cuda_cu "cuda/*.cu" "cuda/detail/*.cu")
FILE(GLOB cudnn_h "cudnn/*.h" "cudnn/*.cuh")
FILE(GLOB cudnn_cpp "cudnn/*.cpp")
FILE(GLOB miopen_h "miopen/*.h")
FILE(GLOB miopen_cpp "miopen/*.cpp")
FILE(GLOB mkl_cpp "mkl/*.cpp")
FILE(GLOB mkldnn_cpp "mkldnn/*.cpp")
@ -57,12 +38,13 @@ FILE(GLOB native_sparse_cpp "native/sparse/*.cpp")
FILE(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
FILE(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
FILE(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
FILE(GLOB native_miopen_cpp "native/miopen/*.cpp")
FILE(GLOB native_cuda_cu "native/cuda/*.cu")
FILE(GLOB native_cuda_cpp "native/cuda/*.cpp")
FILE(GLOB native_mkl_cpp "native/mkl/*.cpp")
FILE(GLOB native_mkldnn_cpp "native/mkldnn/*.cpp")
set(all_cpu_cpp ${base_cpp} ${native_cpp} ${native_sparse_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${cpu_kernel_cpp})
set(all_cpu_cpp ${base_cpp} ${ATen_CORE_SRCS} ${native_cpp} ${native_sparse_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${cpu_kernel_cpp})
if(AT_MKL_ENABLED)
set(all_cpu_cpp ${all_cpu_cpp} ${mkl_cpp})
endif()
@ -73,9 +55,14 @@ endif()
IF(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/cuda)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${cuda_cu} ${native_cuda_cu} ${native_sparse_cuda_cu})
set(all_cuda_cpp ${native_cudnn_cpp} ${native_sparse_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
set(all_cuda_cpp ${native_sparse_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
IF(USE_CUDA)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${all_cuda_cpp})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
ENDIF()
ELSEIF(USE_ROCM)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${miopen_cpp} ${all_cuda_cpp})
ENDIF()
endif()
@ -161,7 +148,7 @@ endif(MKLDNN_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS cpuinfo)
if(NOT MSVC)
if(NOT MSVC AND NOT EMSCRIPTEN)
# Preserve values for the main build
set(__aten_sleef_build_shared_libs ${BUILD_SHARED_LIBS})
set(__aten_sleef_build_tests ${BUILD_TESTS})
@ -171,6 +158,16 @@ if(NOT MSVC)
set(OLD_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
set(CMAKE_CXX_FLAGS)
# Bump up optimization level for sleef to -O1, since at -O0 the compiler
# excessively spills intermediate vector registers to the stack
# and makes things run impossibly slowly
set(OLD_CMAKE_C_FLAGS_DEBUG ${CMAKE_C_FLAGS_DEBUG})
IF(${CMAKE_C_FLAGS_DEBUG} MATCHES "-O0")
string(REGEX REPLACE "-O0" "-O1" CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
ELSE()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -O1")
ENDIF()
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef static" FORCE)
set(BUILD_DFT OFF CACHE BOOL "Don't build sleef DFT lib" FORCE)
set(BUILD_GNUABI_LIBS OFF CACHE BOOL "Don't build sleef gnuabi libs" FORCE)
@ -181,6 +178,7 @@ if(NOT MSVC)
link_directories(${CMAKE_BINARY_DIR}/sleef/lib)
list(APPEND ATen_CPU_DEPENDENCY_LIBS sleef)
set(CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
set(CMAKE_CXX_FLAGS ${OLD_CMAKE_CXX_FLAGS})
# Set these back. TODO: Use SLEEF_ to pass these instead
@ -208,6 +206,12 @@ IF(USE_CUDA AND NOT USE_ROCM)
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
elseif(${CUDA_VERSION_MAJOR} EQUAL "10")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
else()
MESSAGE(FATAL_ERROR "Unhandled major cuda version ${CUDA_VERSION_MAJOR}")
endif()
@ -250,15 +254,16 @@ IF(USE_CUDA AND NOT USE_ROCM)
ENDIF(USE_MAGMA)
IF ($ENV{ATEN_STATIC_CUDA})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos.a")
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart_static.a")
ENDIF($ENV{ATEN_STATIC_CUDA})
ENDIF()
IF(USE_ROCM)
### Link in the ROCm libraries BLAS / RNG.
FIND_LIBRARY(HIPBLAS_LIBRARY hipblas HINTS ${HIPBLAS_PATH}/lib)
FIND_LIBRARY(HIPRNG_LIBRARY hcrng HINTS ${HIPRNG_PATH}/lib)
FIND_LIBRARY(ROCBLAS_LIBRARY rocblas HINTS ${ROCBLAS_PATH}/lib)
FIND_LIBRARY(HIPRAND_LIBRARY hiprand HINTS ${HIPRAND_PATH}/lib)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${HIPBLAS_LIBRARY} ${HIPRNG_LIBRARY})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${ROCBLAS_LIBRARY} ${HIPRAND_LIBRARY})
ENDIF()
# Include CPU paths for CUDA as well
@ -286,7 +291,7 @@ else()
target_link_libraries(ATen_cpu PRIVATE ATEN_CPU_FILES_GEN_LIB)
caffe2_interface_library(ATen_cpu ATen_cpu_library)
# Set standard properties on the target
aten_set_target_props(ATen_cpu)
torch_set_target_props(ATen_cpu)
# Make sure these don't get built by parent
set(ATen_CPU_SRCS)
@ -327,7 +332,7 @@ if(USE_CUDA OR USE_ROCM)
ATen_cuda PUBLIC ATen_cpu ${ATen_PUBLIC_CUDA_DEPENDENCY_LIBS})
# Set standard properties on the target
aten_set_target_props(ATen_cuda)
torch_set_target_props(ATen_cuda)
caffe2_interface_library(ATen_cuda ATen_cuda_library)
@ -345,9 +350,9 @@ if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
endif()
if(NOT MSVC)
aten_compile_options(ATen_cpu)
torch_compile_options(ATen_cpu)
if(USE_CUDA OR USE_ROCM)
aten_compile_options(ATen_cuda)
torch_compile_options(ATen_cuda)
endif()
endif()
@ -359,41 +364,13 @@ if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
endif()
endif()
if (NOT CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
# Eventually replace this use of LOCATION with use of
# $<TARGET_FILE:ATen_cpu>, but generators only work in some cases
cmake_policy(SET CMP0026 OLD)
get_target_property(ATEN_CPU_OUTPUT_NAME ATen_cpu LOCATION)
get_filename_component(ATEN_CPU_OUTPUT_NAME ${ATEN_CPU_OUTPUT_NAME} NAME)
set(ATEN_LIBRARIES
"${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_LIB_DIR}/${ATEN_CPU_OUTPUT_NAME}")
if(USE_CUDA OR USE_ROCM)
get_target_property(ATEN_CUDA_OUTPUT_NAME ATen_cuda LOCATION)
get_filename_component(ATEN_CUDA_OUTPUT_NAME ${ATEN_CUDA_OUTPUT_NAME} NAME)
list(APPEND ATEN_LIBRARIES
"${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_LIB_DIR}/${ATEN_CUDA_OUTPUT_NAME}")
endif()
install(TARGETS ATen_cpu
RUNTIME DESTINATION "${AT_INSTALL_BIN_DIR}"
LIBRARY DESTINATION "${AT_INSTALL_LIB_DIR}"
ARCHIVE DESTINATION "${AT_INSTALL_LIB_DIR}")
if(USE_CUDA OR USE_ROCM)
install(TARGETS ATen_cuda
RUNTIME DESTINATION "${AT_INSTALL_BIN_DIR}"
LIBRARY DESTINATION "${AT_INSTALL_LIB_DIR}"
ARCHIVE DESTINATION "${AT_INSTALL_LIB_DIR}")
endif()
endif()
SET(ATEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")
CONFIGURE_FILE(ATenConfig.cmake.in "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake")
INSTALL(FILES "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake"
DESTINATION "${AT_INSTALL_SHARE_DIR}/cmake/ATen")
# https://stackoverflow.com/questions/11096471/how-can-i-install-a-hierarchy-of-files-using-cmake
FOREACH(HEADER ${base_h} ${cuda_h} ${cudnn_h})
FOREACH(HEADER ${base_h} ${ATen_CORE_HEADERS} ${cuda_h} ${cudnn_h})
string(REPLACE "${CMAKE_CURRENT_SOURCE_DIR}/" "" HEADER_SUB ${HEADER})
GET_FILENAME_COMPONENT(DIR ${HEADER_SUB} DIRECTORY)
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen/${DIR})
@ -411,39 +388,8 @@ else()
add_subdirectory(test)
endif()
if (NOT CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO)
foreach(test_src ${ATen_CPU_TEST_SRCS})
get_filename_component(test_name ${test_src} NAME_WE)
add_executable(${test_name} "${test_src}")
target_include_directories(
${test_name} PRIVATE $<INSTALL_INTERFACE:include>)
target_include_directories(${test_name} PRIVATE ${ATen_CPU_INCLUDE})
target_include_directories(${test_name} SYSTEM PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
target_link_libraries(${test_name} ATen_cpu)
add_test(NAME ${test_name} COMMAND $<TARGET_FILE:${test_name}>)
install(TARGETS ${test_name} DESTINATION test)
endforeach()
if(USE_CUDA OR USE_ROCM)
foreach(test_src ${ATen_CUDA_TEST_SRCS})
get_filename_component(test_name ${test_src} NAME_WE)
torch_cuda_based_add_executable(${test_name} "${test_src}")
target_include_directories(
${test_name} PRIVATE $<INSTALL_INTERFACE:include>)
target_include_directories(${test_name} PRIVATE ${ATen_CPU_INCLUDE})
target_include_directories(${test_name} SYSTEM PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
target_link_libraries(${test_name} -Wl,--no-as-needed ATen_cpu ATen_cuda)
add_test(NAME ${test_name} COMMAND $<TARGET_FILE:${test_name}>)
install(TARGETS ${test_name} DESTINATION test)
endforeach()
endif()
# Make sure these don't get built by parent
set(ATen_CPU_TEST_SRCS)
set(ATen_CUDA_TEST_SRCS)
endif()
# Pass source, includes, and libs to parent
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)


@ -3,9 +3,99 @@
#include "ATen/Parallel.h"
#include "ATen/TensorUtils.h"
#include <limits>
#include <utility>
#include <cstring>
namespace at {
/*
[collapse dims] Updates sizes and strides to reflect a "collapse" of
the info, possibly excluding the optional excludeDim. A "collapsed" version
of the info is the fewest dims that order the tensor's elements in the same
way as the original info. If excludeDim is specified, the collapse is the
fewest dims that order the tensor's elements as the original and preserve the
excluded dimension, unless the tensor collapses to a point.
This function returns a pair of values.
1) The (new) index of the preserved dimension if excludeDim is
specified. 0 if the tensor is collapsed to a point. -1
otherwise.
2) The new number of dimensions.
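Example (illustrative): for a contiguous 4x1x3 tensor, sizes [4, 1, 3] and
strides [3, 3, 1] collapse to sizes [12], strides [1], and the function
returns {-1, 1} (no excluded dim remapped, one remaining dim).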
*/
template <typename T>
inline std::pair<int64_t, int64_t> collapse_dims(
T* sizes,
T* strides,
int64_t dims,
const int excludeDim = -1) {
AT_CHECK(
excludeDim >= -1 && excludeDim < dims,
"expected excluded dim between -1 and dims - 1");
int64_t stopDim = (excludeDim == -1) ? dims : excludeDim;
int64_t newIndex = -1;
int64_t oldIndex = 0;
int64_t remappedExcludedDim = -1;
while (oldIndex < dims) {
// Finds a dimension to collapse into
for (; oldIndex < stopDim; ++oldIndex) {
if (sizes[oldIndex] == 1) {
continue;
}
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
++oldIndex;
break;
}
// Collapses dims
for (; oldIndex < stopDim; ++oldIndex) {
if (sizes[oldIndex] == 1) {
continue;
}
if (strides[newIndex] == sizes[oldIndex] * strides[oldIndex]) {
sizes[newIndex] *= sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
} else {
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
}
}
// Handles excludeDim being set (oldIndex == excludeDim)
if (oldIndex != dims) {
// Preserves excluded dimension
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
remappedExcludedDim = newIndex;
// Restarts iteration after excludeDim
++oldIndex;
stopDim = dims;
}
}
// Handles special case of all dims size 1
if (newIndex == -1 || (newIndex == 0 && sizes[0] == 1)) {
dims = 1;
sizes[0] = 1;
strides[0] = 1;
return std::pair<int64_t, int64_t>(0, 1);
}
dims = newIndex + 1;
return std::pair<int64_t, int64_t>(remappedExcludedDim, dims);
}
/*
* The basic strategy for apply is as follows:
*
@ -50,27 +140,6 @@ inline Tensor sort_strides(Tensor& tensor_) {
return tensor;
}
template <typename Arg>
inline void _setup_arrays(Tensor& tensor, Arg* iter) {
int64_t max_dim = tensor.ndimension();
iter->dim_ = 0;
for (int64_t i = 0; i < max_dim; i++) {
int64_t size = tensor.size(i);
int64_t stride = tensor.stride(i);
while (i + 1 < max_dim &&
(tensor.size(i + 1) == 1 ||
tensor.stride(i) == tensor.size(i + 1) * tensor.stride(i + 1))) {
size = size * tensor.size(i + 1);
if (tensor.size(i + 1) != 1)
stride = tensor.stride(i + 1);
i++;
}
iter->sizes_[iter->dim_] = size;
iter->strides_[iter->dim_] = stride;
iter->dim_++;
}
}
template <typename T, int N>
struct strided_tensor_iter_fixed {
public:
@ -86,8 +155,16 @@ struct strided_tensor_iter_fixed {
strided_tensor_iter_fixed(strided_tensor_iter_fixed&&) = default;
strided_tensor_iter_fixed(Tensor& tensor, bool sort_strides = false)
: data_(tensor.data<T>()) {
memset(counter_, 0, sizeof(int64_t) * N);
_setup_arrays(tensor, this);
std::memset(counter_, 0, sizeof(int64_t) * N);
if (tensor.dim() > 0) {
std::memcpy(
sizes_, tensor.sizes().data(), tensor.dim() * sizeof(int64_t));
std::memcpy(
strides_,
tensor.strides().data(),
tensor.dim() * sizeof(int64_t));
}
dim_ = std::get<1>(collapse_dims(sizes_, strides_, tensor.ndimension()));
}
};
@ -109,9 +186,9 @@ struct strided_tensor_iter {
: data_(tensor.data<T>()),
dim_(tensor.ndimension()),
counter_(dim_, 0),
sizes_(tensor.sizes()),
strides_(tensor.strides()) {
_setup_arrays(tensor, this);
sizes_(tensor.sizes().vec()),
strides_(tensor.strides().vec()) {
dim_ = std::get<1>(collapse_dims(sizes_.data(), strides_.data(), dim_));
}
};
@ -132,7 +209,7 @@ inline std::string _all_equal_numel_error(at::ArrayRef<Tensor> tensors) {
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].sizes() << ", ";
}
oss << "and " << tensors[tensors.size() - 1]
oss << "and " << tensors[tensors.size() - 1].sizes()
<< " to have the same number of elements, but got ";
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].numel() << ", ";
@ -145,7 +222,7 @@ inline std::string _all_equal_numel_error(at::ArrayRef<Tensor> tensors) {
inline bool _apply_preamble(ArrayRef<Tensor> tensors) {
checkBackend("CPU_tensor_apply", tensors, Backend::CPU);
if (!_all_equal_numel(tensors))
throw std::runtime_error(_all_equal_numel_error(tensors));
AT_ERROR(_all_equal_numel_error(tensors));
// An empty tensor has no elements
for (auto& t : tensors)
if (t.numel() == 0)


@ -1,7 +1,7 @@
#pragma once
#include "TH/TH.h"
#include "ATen/Error.h"
#include "c10/util/Exception.h"
// This file creates a fake allocator that just throws exceptions if
// it is actually used.


@ -1,12 +1,12 @@
#pragma once
// Using AT_API is crucial as otherwise you'll see
// Using CAFFE2_API is crucial as otherwise you'll see
// linking errors using MSVC
// See https://msdn.microsoft.com/en-us/library/a90k134d.aspx
// This header adds this if using AT_API
#include "ATen/ATenGeneral.h"
// This header adds this if using CAFFE2_API
#include "ATen/core/ATenGeneral.h"
namespace at {
AT_API void set_num_threads(int);
AT_API int get_num_threads();
CAFFE2_API void set_num_threads(int);
CAFFE2_API int get_num_threads();
}


@ -0,0 +1,20 @@
#include <ATen/CPUTypeDefault.h>
#include <ATen/Context.h>
#include <ATen/CPUGenerator.h>
namespace at {
Allocator* CPUTypeDefault::allocator() const {
return getCPUAllocator();
}
Device CPUTypeDefault::getDeviceFromPtr(void * data) const {
return DeviceType::CPU;
}
std::unique_ptr<Generator> CPUTypeDefault::generator() const {
return std::unique_ptr<Generator>(new CPUGenerator(&at::globalContext()));
}
} // namespace at


@ -0,0 +1,14 @@
#pragma once
#include <ATen/TypeDefault.h>
namespace at {
struct CAFFE2_API CPUTypeDefault : public TypeDefault {
CPUTypeDefault(TensorTypeId type_id, bool is_variable, bool is_undefined)
: TypeDefault(type_id, is_variable, is_undefined) {}
Allocator* allocator() const override;
Device getDeviceFromPtr(void * data) const override;
std::unique_ptr<Generator> generator() const override;
};
} // namespace at


@ -1,183 +0,0 @@
#include "ATen/CUDAStream.h"
#include "ATen/Error.h"
#include "ATen/detail/CUDAHooksInterface.h"
#include <mutex>
// Internal implementation is entirely hidden
struct CUDAStreamInternals {
bool is_destructible;
std::atomic<int> refcount;
int64_t device; // Note: cudaGetDevice works with int32_t, not int64_t
cudaStream_t stream;
};
namespace at {
namespace detail {
/*
* Stream state
*/
static constexpr cudaStream_t DEFAULT_STREAM = 0;
static std::once_flag init_flag;
static int64_t num_gpus;
static CUDAStreamInternals* default_streams;
static thread_local CUDAStreamInternals** current_streams = nullptr;
// Creates a(n indestructible) default stream for each device
// Note: the default stream on each device is signified by a zero
// value for the pointer, and so is not actually created as usual.
// In particular, we don't need to switch devices when creating the
// streams.
static void initDefaultCUDAStreams() {
num_gpus = getCUDAHooks().getNumGPUs();
default_streams = (CUDAStreamInternals*) malloc(num_gpus * sizeof(CUDAStreamInternals));
for (auto i = decltype(num_gpus){0}; i < num_gpus; ++i) {
default_streams[i].is_destructible = false;
default_streams[i].refcount = 0;
default_streams[i].device = i;
default_streams[i].stream = DEFAULT_STREAM;
}
}
// Init front-end to ensure initialization only occurs once
static void initCUDAStreamsOnce() {
// Inits default streams (once, globally)
std::call_once(init_flag, initDefaultCUDAStreams);
// Inits current streams (thread local) to default streams
if (current_streams) return;
current_streams = (CUDAStreamInternals**) malloc(num_gpus * sizeof(CUDAStreamInternals*));
for (auto i = decltype(num_gpus){0}; i < num_gpus; ++i) {
current_streams[i] = &default_streams[i];
}
}
/*
* Pointer-based stream API
*/
// Helper to return the current device
static inline int64_t current_device() {
int cur_device;
DynamicCUDAInterface::get_device(&cur_device);
return cur_device;
}
// Helper to verify the GPU index is valid
static inline void check_gpu(int64_t device) {
AT_CHECK(device >= 0 && device < num_gpus);
}
CUDAStreamInternals* CUDAStream_getDefaultStreamOnDevice(int64_t device) {
initCUDAStreamsOnce();
check_gpu(device);
return &default_streams[device];
}
CUDAStreamInternals* CUDAStream_getDefaultStream() {
return CUDAStream_getDefaultStreamOnDevice(current_device());
}
// Creates (and retains) a new CUDA stream
CUDAStreamInternals* CUDAStream_createAndRetainWithOptions(int32_t flags, int32_t priority) {
CUDAStreamInternals* internals = (CUDAStreamInternals*) malloc(sizeof(CUDAStreamInternals));
internals->is_destructible = true;
internals->refcount = 1;
internals->device = current_device();
DynamicCUDAInterface::cuda_stream_create_with_priority(&internals->stream, flags, priority);
return internals;
}
// Note: despite not being named "unsafe," if these methods are used in a multithreaded
// environment the caller must ensure that streams are valid
// when they're requested. These methods will throw an error if an
// invalid stream is requested.
CUDAStreamInternals* CUDAStream_getAndRetainCurrentStreamOnDevice(int64_t device) {
initCUDAStreamsOnce();
check_gpu(device);
auto cur = current_streams[device];
AT_CHECK(CUDAStream_retain(cur));
return cur;
}
CUDAStreamInternals* CUDAStream_getAndRetainCurrentStream() {
return CUDAStream_getAndRetainCurrentStreamOnDevice(current_device());
}
// Note: these unsafe methods do not retain the stream before returning it.
// This is unsafe behavior and these methods SHOULD NOT BE USED.
// They are here only for legacy compatibility.
CUDAStreamInternals* CUDAStream_getCurrentStreamOnDeviceUnsafe(int64_t device) {
initCUDAStreamsOnce();
check_gpu(device);
return current_streams[device];
}
CUDAStreamInternals* CUDAStream_getCurrentStreamUnsafe() {
return CUDAStream_getCurrentStreamOnDeviceUnsafe(current_device());
}
void CUDAStream_setStreamOnDevice(int64_t device, CUDAStreamInternals* ptr) {
initCUDAStreamsOnce();
check_gpu(device);
AT_CHECK(ptr);
AT_CHECK(ptr->device == device);
AT_CHECK(CUDAStream_retain(ptr));
CUDAStream_free(current_streams[device]);
current_streams[device] = ptr;
}
void CUDAStream_setStream(CUDAStreamInternals* ptr) {
CUDAStream_setStreamOnDevice(current_device(), ptr);
}
// Getters
cudaStream_t CUDAStream_stream(CUDAStreamInternals* ptr) {
AT_CHECK(ptr);
return ptr->stream;
}
int64_t CUDAStream_device(CUDAStreamInternals* ptr) {
AT_CHECK(ptr);
return ptr->device;
}
// Memory management
// Note: only destructible (non-default) streams are ref counted
bool CUDAStream_retain(CUDAStreamInternals* ptr) {
AT_CHECK(ptr);
if (ptr->is_destructible) return(++ptr->refcount > 1);
return true;
}
void CUDAStream_free(CUDAStreamInternals*& ptr) {
if (ptr && ptr->stream && ptr->is_destructible && --ptr->refcount <= 0) {
AT_CHECK(ptr->refcount == 0);
DynamicCUDAInterface::cuda_stream_destroy(ptr->stream);
free(ptr);
ptr = nullptr;
}
}
} // namespace detail
/*
* CUDAStream functions
*/
// Copy constructor
CUDAStream::CUDAStream(const CUDAStream& other) {
AT_CHECK(other.internals_);
AT_CHECK(detail::CUDAStream_retain(other.internals_));
internals_ = other.internals_;
}
// Move constructor
CUDAStream::CUDAStream(CUDAStream&& other) {
AT_CHECK(other.internals_);
std::swap(internals_, other.internals_);
}
} // namespace at


@ -1,95 +0,0 @@
#pragma once
#include <cstdint>
#include <utility>
/*
* A CUDA stream interface with no CUDA build dependency.
*
* Includes the CUDAStream RAII class and a pointer-based stream API.
*
* The ATen Context interface should be preferred when working with streams.
*/
// Forward-declares cudaStream_t to avoid depending on CUDA in CPU builds
// Note: this is the internal CUDA runtime typedef for cudaStream_t
struct CUstream_st;
typedef struct CUstream_st* cudaStream_t;
// Forward-declares internals
struct CUDAStreamInternals;
namespace at {
namespace detail {
// Pointer-based API (for internal use)
// Note: ATen/Context is preferred to work with streams safely
CUDAStreamInternals* CUDAStream_getDefaultStreamOnDevice(int64_t device);
CUDAStreamInternals* CUDAStream_getDefaultStream();
CUDAStreamInternals* CUDAStream_createAndRetainWithOptions(int32_t flags, int32_t priority);
CUDAStreamInternals* CUDAStream_getAndRetainCurrentStreamOnDevice(int64_t device);
CUDAStreamInternals* CUDAStream_getAndRetainCurrentStream();
// Note: these Unsafe gets should NEVER be used and are only here for legacy
// purposes. Once those uses are gone they should be removed.
CUDAStreamInternals* CUDAStream_getCurrentStreamOnDeviceUnsafe(int64_t device);
CUDAStreamInternals* CUDAStream_getCurrentStreamUnsafe();
void CUDAStream_setStreamOnDevice(int64_t device, CUDAStreamInternals* internals);
void CUDAStream_setStream(CUDAStreamInternals* internals);
cudaStream_t CUDAStream_stream(CUDAStreamInternals*);
int64_t CUDAStream_device(CUDAStreamInternals*);
bool CUDAStream_retain(CUDAStreamInternals*);
void CUDAStream_free(CUDAStreamInternals*&);
} // namespace detail
// RAII for a CUDA stream
// Allows use as a cudaStream_t, copying, moving, and metadata access.
struct CUDAStream {
// Constants
static constexpr int32_t DEFAULT_FLAGS = 1; // = cudaStreamNonBlocking;
static constexpr int32_t DEFAULT_PRIORITY = 0;
// Constructors
CUDAStream() = default;
CUDAStream(CUDAStreamInternals* internals) : internals_{internals} { }
// Destructor
~CUDAStream() { detail::CUDAStream_free(internals_); }
// Copy constructor
CUDAStream(const CUDAStream& other);
// Move constructor
CUDAStream(CUDAStream&& other);
// Assignment operator
CUDAStream& operator=(CUDAStream other) {
std::swap(internals_, other.internals_);
return *this;
}
// Implicit conversion to cudaStream_t
operator cudaStream_t() const { return detail::CUDAStream_stream(internals_); }
// Less than operator (to allow use in sets)
friend bool operator<(const CUDAStream& left, const CUDAStream& right) {
return left.internals_ < right.internals_;
}
// Getters
int64_t device() const { return detail::CUDAStream_device(internals_); }
cudaStream_t stream() const { return detail::CUDAStream_stream(internals_); }
CUDAStreamInternals* internals() const { return internals_; }
private:
CUDAStreamInternals* internals_ = nullptr;
};
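// Usage sketch: CUDAStream converts implicitly to cudaStream_t, so it can be
// passed directly to CUDA runtime calls:
//   at::CUDAStream s{at::detail::CUDAStream_createAndRetainWithOptions(
//       at::CUDAStream::DEFAULT_FLAGS, at::CUDAStream::DEFAULT_PRIORITY)};
//   cudaStream_t raw = s; // via the implicit conversion above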
} // namespace at


@ -1,8 +1,8 @@
#pragma once
#include "ATen/Error.h"
#include "ATen/Generator.h"
#include "ATen/Utils.h"
#include "ATen/core/Generator.h"
#include "c10/util/Exception.h"
namespace at {


@ -8,3 +8,4 @@
#define AT_MKLDNN_ENABLED() @AT_MKLDNN_ENABLED@
#define AT_MKL_ENABLED() @AT_MKL_ENABLED@
#define CAFFE2_STATIC_LINK_CUDA() @CAFFE2_STATIC_LINK_CUDA@


@ -2,6 +2,8 @@
#include "Context.h"
#include <ATen/core/TensorOptions.h>
#include <thread>
#include <mutex>
#include <sstream>
@ -9,10 +11,11 @@
#include <stdexcept>
#include "ATen/CPUGenerator.h"
#include "ATen/RegisterCPU.h"
#include "ATen/Tensor.h"
#include <ATen/cpu/FlushDenormal.h>
#ifdef USE_SSE3
#include <pmmintrin.h>
#endif
#include "TH/TH.h" // for USE_LAPACK
namespace at {
@ -27,16 +30,20 @@ static inline void argErrorHandler(int arg, const char * msg, void * data) {
Context::Context()
: next_id(static_cast<size_t>(TypeID::NumOptions))
, thc_state(nullptr, [](THCState* p){ /* no-op */ } ) {
, thc_state(nullptr, [](THCState* p){ /* no-op */ } )
, thh_state(nullptr, [](THHState* p){ /* no-op */ } )
{
THSetDefaultErrorHandler(errorHandler,nullptr);
THSetDefaultArgErrorHandler(argErrorHandler,nullptr);
generator_registry[static_cast<int>(Backend::CPU)]
generator_registry[static_cast<int>(DeviceType::CPU)]
.reset(new CPUGenerator(this));
Type::registerCPU(this);
register_cpu_types(this);
}
// TODO: This could be bad juju if someone calls globalContext() in the
// destructor of an object with static lifetime.
Context & globalContext() {
static Context globalContext_;
return globalContext_;
@ -77,19 +84,63 @@ bool Context::hasMKL() const {
#endif
}
bool Context::setFlushDenormal(bool on) {
#ifdef USE_SSE3
// Setting flush-to-zero (FTZ) flag
_MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON
: _MM_FLUSH_ZERO_OFF);
// Setting denormals-are-zero (DAZ) flag
_MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
: _MM_DENORMALS_ZERO_OFF);
bool Context::hasLAPACK() const {
#ifdef USE_LAPACK
return true;
#else
return false;
#endif
}
bool Context::setFlushDenormal(bool on) {
return at::cpu::set_flush_denormal(on);
}
TypeExtendedInterface& getType(TensorOptions options) {
return globalContext().getType(
options.backend(), typeMetaToScalarType(options.dtype()), options.is_variable());
}
TypeExtendedInterface& getType(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getType(
backend, typeMetaToScalarType(impl->dtype()), impl->is_variable());
}
TypeExtendedInterface& getType(const Tensor& t) {
return getType(t.unsafeGetTensorImpl());
}
LegacyTHDispatcher& getLegacyTHDispatcher(TensorOptions options) {
return globalContext().getLegacyTHDispatcher(
options.backend(), typeMetaToScalarType(options.dtype()));
}
LegacyTHDispatcher& getLegacyTHDispatcher(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getLegacyTHDispatcher(
backend, typeMetaToScalarType(impl->dtype()));
}
Allocator* getCPUAllocator() {
return getTHDefaultAllocator();
}
struct LegacyDeviceTypeInit : public LegacyDeviceTypeInitInterface {
LegacyDeviceTypeInit(LegacyDeviceTypeInitArgs) {}
void initCPU() const override {
globalContext();
}
void initCUDA() const override {
globalContext().lazyInitCUDA();
}
void initHIP() const override {
globalContext().lazyInitHIP();
}
void initComplex() const override {
globalContext().lazyInitComplex();
}
};
REGISTER_LEGACY_TYPE_INIT(LegacyDeviceTypeInit);
}


@ -1,13 +1,19 @@
#pragma once
#include "ATen/ATenGeneral.h"
#include <ATen/CPUGeneral.h>
#include "ATen/Generator.h"
#include "ATen/Type.h"
#include "ATen/TypeExtendedInterface.h"
#include "ATen/Utils.h"
#include "ATen/Error.h"
#include "ATen/LegacyTHDispatch.h"
#include "ATen/LegacyTHDispatcher.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/core/Generator.h"
#include "ATen/core/LegacyTypeDispatch.h"
#include "ATen/core/VariableHooksInterface.h"
#include "ATen/detail/CUDAHooksInterface.h"
#include "ATen/CUDAStream.h"
#include "ATen/detail/HIPHooksInterface.h"
#include "ATen/detail/ComplexHooksInterface.h"
#include "c10/util/Exception.h"
#include <memory>
#include <mutex>
@ -15,119 +21,94 @@
namespace at {
enum class IsVariable {
NotVariable,
Variable,
NumOptions
};
class Tensor;
class AT_API Context {
public:
class CAFFE2_API Context {
public:
Context();
Type* getTypeRaw(Backend p, ScalarType s) {
return type_registry[static_cast<int>(p)][static_cast<int>(s)].get();
TypeExtendedInterface* getNonVariableTypeRaw(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeRaw(p, s));
}
TypeExtendedInterface * getNonVariableTypeOpt(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeOpt(p, s));
}
TypeExtendedInterface & getNonVariableType(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getNonVariableType(p, s));
}
TypeExtendedInterface & getVariableType(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getVariableType(p, s));
}
TypeExtendedInterface & getType(Backend p, ScalarType s, bool is_variable) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getType(p, s, is_variable));
}
LegacyTHDispatcher& getLegacyTHDispatcher(Backend p, ScalarType s) {
return globalLegacyTHDispatch().getLegacyTHDispatcher(p, s);
}
// The passed in Type must be delete'able
// TODO: Just make it take a unique_ptr
void registerType(Backend b, ScalarType s, Type* t) {
globalLegacyTypeDispatch().registerType(b, s,
LegacyTypeDispatch::TypeUniquePtr{t, LegacyTypeDeleter([](Type* p) { delete p; }) });
}
Type * getTypeOpt(Backend p, ScalarType s) {
initCUDAIfNeeded(p);
auto type = getTypeRaw(p, s);
if(!type) {
// there is only a single Undefined Type.
if (p == Backend::Undefined || s == ScalarType::Undefined) {
return getTypeRaw(Backend::Undefined, ScalarType::Undefined);
}
}
void registerLegacyTHDispatcher(Backend b, ScalarType s, LegacyTHDispatcher* t) {
globalLegacyTHDispatch().registerDispatcher(b, s,
LegacyTHDispatch::LegacyTHDispatcherUniquePtr{t, LegacyTHDispatcherDeleter([](LegacyTHDispatcher* p) { delete p; }) });
}
return type;
}
Type & getType(Backend p, ScalarType s) {
auto* type = getTypeOpt(p, s);
if (!type) AT_ERROR(toString(p), toString(s), "Type is not enabled.");
return *type;
}
Generator & defaultGenerator(Backend p) {
initCUDAIfNeeded(p);
auto & generator = generator_registry[static_cast<int>(p)];
Generator & defaultGenerator(DeviceType device_type) {
initCUDAIfNeeded(device_type);
initHIPIfNeeded(device_type);
auto & generator = generator_registry[static_cast<int>(device_type)];
if(!generator)
AT_ERROR(toString(p), " backend type not enabled.");
AT_ERROR(DeviceTypeName(device_type), " backend type not enabled.");
return *generator;
}
bool hasMKL() const;
bool hasLAPACK() const;
bool hasMAGMA() const {
return detail::getCUDAHooks().hasMAGMA();
}
bool hasCUDA() const {
return detail::getCUDAHooks().hasCUDA();
}
bool hasCuDNN() const {
return detail::getCUDAHooks().hasCuDNN();
bool hasHIP() const {
return detail::getHIPHooks().hasHIP();
}
int64_t current_device() const {
return detail::getCUDAHooks().current_device();
}
// defined in header so that getType has ability to inline
// call_once check. getType is called fairly frequently
// defined in header so that getNonVariableType has ability to inline
// call_once check. getNonVariableType is called fairly frequently
THCState* lazyInitCUDA() {
std::call_once(thc_init,[&] {
thc_state = detail::getCUDAHooks().initCUDA();
generator_registry[static_cast<int>(Backend::CUDA)] =
generator_registry[static_cast<int>(DeviceType::CUDA)] =
detail::getCUDAHooks().initCUDAGenerator(this);
detail::getCUDAHooks().registerCUDATypes(this);
});
return thc_state.get();
}
THHState* lazyInitHIP() {
std::call_once(thh_init,[&] {
thh_state = detail::getHIPHooks().initHIP();
generator_registry[static_cast<int>(DeviceType::HIP)] =
detail::getHIPHooks().initHIPGenerator(this);
detail::getHIPHooks().registerHIPTypes(this);
});
return thh_state.get();
}
void lazyInitComplex() {
std::call_once(complex_init_, [&] {
detail::getComplexHooks().registerComplexTypes(this);
});
}
THCState* getTHCState() {
// AT_ASSERT(thc_state);
return thc_state.get();
}
CUDAStream createCUDAStream() const {
return detail::CUDAStream_createAndRetainWithOptions(
CUDAStream::DEFAULT_FLAGS
, CUDAStream::DEFAULT_PRIORITY
);
THHState* getTHHState() {
return thh_state.get();
}
CUDAStream createCUDAStreamWithOptions(int32_t flags, int32_t priority) const {
return detail::CUDAStream_createAndRetainWithOptions(flags, priority);
}
CUDAStream getDefaultCUDAStream() const {
return detail::CUDAStream_getDefaultStream();
}
CUDAStream getDefaultCUDAStreamOnDevice(int64_t device) const {
return detail::CUDAStream_getDefaultStreamOnDevice(device);
}
CUDAStream getCurrentCUDAStream() const {
return detail::CUDAStream_getAndRetainCurrentStream();
}
CUDAStream getCurrentCUDAStreamOnDevice(int64_t device) const {
return detail::CUDAStream_getAndRetainCurrentStreamOnDevice(device);
}
void setCurrentCUDAStream(CUDAStream stream) const {
return detail::CUDAStream_setStream(stream.internals());
}
void setCurrentCUDAStreamOnDevice(int64_t device, CUDAStream stream) const {
return detail::CUDAStream_setStreamOnDevice(device, stream.internals());
}
#ifndef __HIP_PLATFORM_HCC__
cusparseHandle_t getCurrentCUDASparseHandle() const {
return detail::getCUDAHooks().getCurrentCUDASparseHandle(thc_state.get());
}
#endif
cudaDeviceProp* getCurrentDeviceProperties() const {
return detail::getCUDAHooks().getCurrentDeviceProperties(thc_state.get());
}
cudaDeviceProp* getDeviceProperties(int device) const {
return detail::getCUDAHooks().getDeviceProperties(thc_state.get(), device);
}
int getNumGPUs() const {
return detail::getCUDAHooks().getNumGPUs();
}
size_t freshTypeID() {
return next_id++;
}
@@ -144,28 +125,36 @@ public:
bool deterministicCuDNN() const;
void setDeterministicCuDNN(bool);
std::unique_ptr<Generator>
generator_registry[static_cast<int>(Backend::NumOptions)];
generator_registry[static_cast<int>(DeviceType::COMPILE_TIME_MAX_DEVICE_TYPES)];
private:
// NB: type_registry has nullptr for all CUDA backends until
// CUDA initialization has occurred
std::unique_ptr<Type> type_registry
[static_cast<int>(Backend::NumOptions)]
[static_cast<int>(ScalarType::NumOptions)];
void initCUDAIfNeeded(Backend p) {
if(p == Backend::CUDA)
void initCUDAIfNeeded(DeviceType p) {
if (p == DeviceType::CUDA) {
lazyInitCUDA();
}
}
void initHIPIfNeeded(DeviceType p) {
if (p == DeviceType::HIP) {
lazyInitHIP();
}
}
void initComplexIfNeeded(ScalarType s) {
if (isComplexType(s)) {
lazyInitComplex();
}
}
std::once_flag thc_init;
std::once_flag thh_init;
std::once_flag complex_init_;
bool enabled_cudnn = true;
bool deterministic_cudnn = false;
bool benchmark_cudnn = false;
std::atomic<size_t> next_id;
std::unique_ptr<THCState, void(*)(THCState*)> thc_state;
std::unique_ptr<THHState, void(*)(THHState*)> thh_state;
friend struct Type;
friend void register_cuda_types(Context * context);
};
AT_API Context & globalContext();
CAFFE2_API Context& globalContext();
static inline void init() {
globalContext();
@@ -177,32 +166,62 @@ static inline void init() {
}
}
static inline Type& getType(Backend p, ScalarType s) {
return globalContext().getType(p, s);
static inline TypeExtendedInterface& getNonVariableType(Backend p, ScalarType s) {
return globalContext().getNonVariableType(p, s);
}
static inline Type& CPU(ScalarType s) {
return getType(Backend::CPU, s);
static inline TypeExtendedInterface& getNonVariableType(DeviceType p, ScalarType s) {
return globalContext().getNonVariableType(deviceTypeToBackend(p), s);
}
static inline Type& CUDA(ScalarType s) {
return getType(Backend::CUDA, s);
CAFFE2_API TypeExtendedInterface& getType(TensorOptions options);
CAFFE2_API TypeExtendedInterface& getType(const TensorImpl*);
CAFFE2_API TypeExtendedInterface& getType(const Tensor&);
CAFFE2_API Allocator* getCPUAllocator();
static inline TypeExtendedInterface& CPU(ScalarType s) {
return getNonVariableType(Backend::CPU, s);
}
static inline TypeExtendedInterface& CUDA(ScalarType s) {
return getNonVariableType(Backend::CUDA, s);
}
static inline TypeExtendedInterface& HIP(ScalarType s) {
return getNonVariableType(Backend::HIP, s);
}
CAFFE2_API LegacyTHDispatcher& getLegacyTHDispatcher(TensorOptions options);
CAFFE2_API LegacyTHDispatcher& getLegacyTHDispatcher(const Tensor&);
static inline bool hasCUDA() {
return globalContext().hasCUDA();
}
static inline bool hasCuDNN() {
return globalContext().hasCuDNN();
static inline bool hasHIP() {
return globalContext().hasHIP();
}
static inline bool hasMKL() {
return globalContext().hasMKL();
}
static inline int64_t current_device() {
return globalContext().current_device();
static inline bool hasLAPACK() {
return globalContext().hasLAPACK();
}
static inline bool hasMAGMA() {
return globalContext().hasMAGMA();
}
static inline void manual_seed(uint64_t seed) {
globalContext().defaultGenerator(DeviceType::CPU).manualSeed(seed);
// NB: Sometimes we build with CUDA, but we don't have any GPUs
// available. In that case, we must not seed CUDA; it will fail!
if (hasCUDA() && detail::getCUDAHooks().getNumGPUs() > 0) {
globalContext().defaultGenerator(DeviceType::CUDA).manualSeedAll(seed);
}
}
} // namespace at
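
Usage sketch (hypothetical snippet, assuming a full ATen build): manual_seed now keys generators by DeviceType rather than Backend, and deliberately skips CUDA seeding when no GPU is visible, so the same call is safe on CPU-only machines.

#include <ATen/ATen.h>

void seed_everything() {
  // Seeds the CPU generator unconditionally; seeds every CUDA device only when
  // CUDA is compiled in and at least one GPU is actually present.
  at::manual_seed(42);
  at::Generator& cpu_gen = at::globalContext().defaultGenerator(at::DeviceType::CPU);
  (void)cpu_gen;
}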


@@ -1,4 +1,5 @@
#include "ATen/DLConvertor.h"
#include "ATen/Functions.h"
#include <iostream>
#include <sstream>
@@ -36,6 +37,12 @@ static DLDataType getDLDataType(const Type& type) {
case ScalarType::Half:
dtype.code = DLDataTypeCode::kDLFloat;
break;
case ScalarType::ComplexHalf:
throw std::logic_error("ComplexHalf is not supported by dlpack");
case ScalarType::ComplexFloat:
throw std::logic_error("ComplexFloat is not supported by dlpack");
case ScalarType::ComplexDouble:
throw std::logic_error("ComplexDouble is not supported by dlpack");
case ScalarType::Undefined:
throw std::logic_error("Undefined is not a valid ScalarType");
case ScalarType::NumOptions:
@@ -57,19 +64,20 @@ static DLContext getDLContext(const Type& type, const int64_t& device_id) {
}
static Backend getATenBackend(const DLContext& ctx) {
Backend backend;
static DeviceType getATenDeviceType(const DLContext& ctx) {
switch (ctx.device_type) {
case DLDeviceType::kDLCPU:
backend = Backend::CPU;
break;
return DeviceType::CPU;
case DLDeviceType::kDLGPU:
backend = Backend::CUDA;
break;
return DeviceType::CUDA;
case DLDeviceType::kDLOpenCL:
return DeviceType::OPENCL;
case DLDeviceType::kDLROCM:
return DeviceType::HIP;
default:
throw std::logic_error("Unsupported device_type: " + std::to_string(ctx.device_type));
}
return backend;
return DeviceType::CPU; // impossible
}
@@ -144,7 +152,7 @@ DLManagedTensor* toDLPack(const Tensor& src) {
atDLMTensor->tensor.deleter = &deleter;
atDLMTensor->tensor.dl_tensor.data = src.data_ptr();
int64_t device_id = 0;
if (src.type().is_cuda()) {
if (src.is_cuda()) {
device_id = src.get_device();
}
atDLMTensor->tensor.dl_tensor.ctx = getDLContext(src.type(), device_id);
@@ -158,15 +166,15 @@ DLManagedTensor* toDLPack(const Tensor& src) {
Tensor fromDLPack(const DLManagedTensor* src) {
Backend backend = getATenBackend(src->dl_tensor.ctx);
DeviceType device_type = getATenDeviceType(src->dl_tensor.ctx);
ScalarType stype = toScalarType(src->dl_tensor.dtype);
auto deleter = [src](void * self) {
src->deleter(const_cast<DLManagedTensor*>(src));
};
return getType(backend, stype).tensorFromBlob(
src->dl_tensor.data,
return at::from_blob(src->dl_tensor.data,
IntList(src->dl_tensor.shape, src->dl_tensor.ndim),
IntList(src->dl_tensor.strides, src->dl_tensor.ndim),
deleter);
deleter,
at::device(device_type).dtype(stype));
}
} //namespace at
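
A round-trip sketch for the two entry points in this file (hypothetical caller; error handling omitted). After this change fromDLPack constructs the result with at::from_blob and a TensorOptions carrying the mapped DeviceType, instead of going through the removed getATenBackend path.

#include <ATen/ATen.h>
#include <ATen/DLConvertor.h>

void dlpack_roundtrip() {
  at::Tensor src = at::ones({2, 3}, at::device(at::kCPU).dtype(at::kFloat));
  DLManagedTensor* managed = at::toDLPack(src);  // shares src's storage
  at::Tensor dst = at::fromDLPack(managed);      // aliases the same memory
  // dst's deleter calls managed->deleter once dst's storage is released.
}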


@@ -10,8 +10,8 @@
namespace at {
AT_API ScalarType toScalarType(const DLDataType& dtype);
AT_API DLManagedTensor * toDLPack(const Tensor& src);
AT_API Tensor fromDLPack(const DLManagedTensor* src);
CAFFE2_API ScalarType toScalarType(const DLDataType& dtype);
CAFFE2_API DLManagedTensor* toDLPack(const Tensor& src);
CAFFE2_API Tensor fromDLPack(const DLManagedTensor* src);
} //namespace at

File diff suppressed because it is too large


@@ -1,128 +1,2 @@
#pragma once
#include <ATen/Error.h>
#include <ATen/ScalarType.h>
#include <cstddef>
#include <iosfwd>
#include <string>
#include <functional>
namespace at {
/// Represents a compute device on which a tensor is located. A device is
/// uniquely identified by a type, which specifies the type of machine it is
/// (e.g. CPU or CUDA GPU), and a device index or ordinal, which identifies the
/// specific compute device when there is more than one of a certain type. The
/// device index is optional, and in its defaulted state represents (abstractly)
/// "the current device". Further, there are two constraints on the value of the
/// device index, if one is explicitly stored:
/// 1. A negative index represents the current device, a non-negative index
/// represents a specific, concrete device,
/// 2. When the device type is CPU, the device index must be zero.
struct Device {
/// The possible values of the device *type*.
enum class Type { CPU, CUDA };
/// Converts a `Backend` to a `Device::Type` if possible.
static Type backend_to_type(Backend backend) {
switch (backend) {
case kCPU:
case kSparseCPU:
return Type::CPU;
case kCUDA:
case kSparseCUDA:
return Type::CUDA;
default:
AT_ERROR(
"Invalid backend ", toString(backend), " for Device construction");
}
}
/// Constructs a new `Device` from a `Type` and an optional device index.
/* implicit */ Device(Type type, int32_t index = -1)
: type_(type), index_(index) {
AT_CHECK(
index == -1 || index >= 0,
"Device index must be -1 or non-negative, got ",
index);
AT_CHECK(
!is_cpu() || index <= 0,
"CPU device index must be -1 or zero, got ",
index);
}
/// Constructs a `Device` from a string description, for convenience.
/// The string supplied must follow the following schema:
/// `(cpu|cuda):[<device-index>]`
/// where `cpu:` or `cuda:` specifies the device type, and
/// `<device-index>` optionally specifies a device index.
/* implicit */ Device(const std::string& device_string);
/// Constructs a new `Device` from a `Backend` (which is converted to a
/// `Type`, if possible) and an optional device index.
/* implicit */ Device(Backend backend, int32_t index = -1)
: Device(backend_to_type(backend), index) {}
/// Returns true if the type and index of this `Device` matches that of
/// `other`.
bool operator==(const Device& other) const noexcept {
return this->type_ == other.type_ && this->index_ == other.index_;
}
/// Returns true if the type or index of this `Device` differs from that of
/// `other`.
bool operator!=(const Device& other) const noexcept {
return !(*this == other);
}
/// Sets the device index.
void set_index(int32_t index) {
index_ = index;
}
/// Returns the type of device this is.
Type type() const noexcept {
return type_;
}
/// Returns the optional index.
const int32_t& index() const noexcept {
return index_;
}
/// Returns true if the device has a non-default index.
bool has_index() const noexcept {
return index_ != -1;
}
/// Return true if the device is of CUDA type.
bool is_cuda() const noexcept {
return type_ == Type::CUDA;
}
/// Return true if the device is of CPU type.
bool is_cpu() const noexcept {
return type_ == Type::CPU;
}
private:
Type type_;
int32_t index_ = -1;
};
} // namespace at
std::ostream& operator<<(std::ostream& stream, at::Device::Type type);
std::ostream& operator<<(std::ostream& stream, const at::Device& device);
namespace std {
template<> struct hash<at::Device>
{
size_t operator()(const at::Device& device) const noexcept {
size_t hash_val = static_cast<size_t>(device.index() + 1);
if (device.is_cuda()) {
hash_val += 2;
}
return hash_val;
}
};
} // namespace std
#include <c10/Device.h>
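
Construction sketch for the relocated type (hypothetical snippet; assumes the c10 header preserves the semantics documented in the removed one): a defaulted index of -1 means "the current device", and a CPU index must be -1 or 0.

#include <c10/Device.h>

void device_examples() {
  c10::Device cpu(c10::kCPU);      // index defaults to -1: "the current device"
  c10::Device cuda1("cuda:1");     // parsed from the "(cpu|cuda):[<index>]" schema
  bool distinct = (cpu != cuda1);  // true: both type and index participate in ==
  (void)distinct;
}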


@@ -1,102 +1,36 @@
#pragma once
#include <ATen/Device.h>
#include <ATen/Error.h>
#include <ATen/ScalarType.h>
#include <ATen/Tensor.h>
#include <ATen/detail/CUDAHooksInterface.h>
#include <cstddef>
#include <c10/DeviceGuard.h>
#include <ATen/core/Tensor.h>
#include <c10/core/ScalarType.h> // TensorList whyyyyy
namespace at {
/// RAII guard that sets a certain default GPU index in its constructor, and
/// changes it back to the device that was originally active upon destruction.
///
/// The index is always reset to the one that was active at the time of
/// construction of the guard. Even if you `set_index` after construction, the
/// destructor will still reset the index to the one that was active at
/// construction time.
struct DeviceGuard {
/// Default constructor, does nothing.
DeviceGuard() = default;
/// Uses the given device's `index()` if it is a CUDA device, else does
/// nothing.
explicit DeviceGuard(Device device) {
if (device.is_cuda()) {
set_index(device.index());
}
// Are you here because you're wondering why DeviceGuard(tensor) no
// longer works? For code organization reasons, we have temporarily(?)
// removed this constructor from DeviceGuard. The new way to
// spell it is:
//
// OptionalDeviceGuard guard(device_of(tensor));
/// Return the Device of a Tensor, if the Tensor is defined.
inline optional<Device> device_of(Tensor t) {
if (t.defined()) {
return make_optional(t.device());
} else {
return nullopt;
}
}
/// Calls `set_device` with the given index.
explicit DeviceGuard(int32_t index) {
set_index(index);
/// Return the Device of a TensorList, if the list is non-empty and
/// the first Tensor is defined. (This function implicitly assumes
/// that all tensors in the list have the same device.)
inline optional<Device> device_of(TensorList t) {
if (!t.empty()) {
return device_of(t.front());
} else {
return nullopt;
}
}
/// Sets the device to the index on which the given tensor is located.
explicit DeviceGuard(const Tensor& tensor) {
set_index_from(tensor);
}
/// Sets the device to the index on which the first tensor in the list is
/// located. If the list is empty, does nothing.
explicit DeviceGuard(const TensorList& tensors) {
if (!tensors.empty()) {
set_index_from(tensors.front());
}
}
/// Resets the device to the index that was active at construction of the
/// guard.
~DeviceGuard() {
// It should lack a value only if an index was never actually set.
if (original_index_ != -1) {
// Unchecked because we don't want to throw in the destructor.
detail::DynamicCUDAInterface::unchecked_set_device(original_index_);
}
}
/// Sets the device to the given one.
void set_index(int32_t index) {
if (index == -1) {
return;
}
AT_ASSERT(index >= 0);
if (original_index_ == -1) {
int32_t previous_index = -123;
detail::DynamicCUDAInterface::get_device(&previous_index);
original_index_ = previous_index;
if (index != original_index_) {
detail::DynamicCUDAInterface::set_device(index);
}
} else {
detail::DynamicCUDAInterface::set_device(index);
}
last_index_ = index;
}
/// Calls `set_index` with the `Tensor`'s current device, if it is a CUDA
/// tensor. Does nothing if the `tensor` is not defined.
void set_index_from(const Tensor& tensor) {
if (tensor.defined() && tensor.is_cuda()) {
set_index(tensor.get_device());
}
}
/// Returns the device that was set upon construction of the guard.
int32_t original_index() const noexcept {
return original_index_;
}
/// Returns the last index that was set via `set_index`, if any.
int32_t last_index() const noexcept {
return last_index_;
}
private:
/// The original device that was active at construction of this object.
int32_t original_index_ = -1;
/// The last index that was set via `set_device`.
int32_t last_index_ = -1;
};
} // namespace at
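
A migration sketch following the comment above (hypothetical caller; relies on the <c10/DeviceGuard.h> include this file now carries): guard on a tensor's device with device_of instead of the removed DeviceGuard(tensor) constructor.

#include <c10/DeviceGuard.h>
#include <ATen/core/Tensor.h>

void run_on_tensors_device(const at::Tensor& t) {
  // device_of returns nullopt for an undefined tensor, making the guard a no-op.
  c10::OptionalDeviceGuard guard(at::device_of(t));
  // ... launch device-specific work for t here ...
}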


@@ -1,11 +1,2 @@
#pragma once
#include "SmallVector.h"
#include <stdint.h>
namespace at {
/// A container for sizes or strides
using DimVector = SmallVector<int64_t, 5>;
}
#include <ATen/core/DimVector.h>


@@ -1,8 +1,8 @@
#pragma once
#include <ATen/Error.h>
#include <ATen/Half.h>
#include <ATen/Type.h>
#include <ATen/core/Half.h>
#include <c10/util/Exception.h>
#define AT_PRIVATE_CASE_TYPE(enum_type, type, ...) \
case enum_type: { \
@@ -10,72 +10,144 @@
return __VA_ARGS__(); \
}
#define AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
default: \
#define AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
} \
}()
#define AT_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, Half, __VA_ARGS__) \
default: \
#define AT_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
} \
}()
#define AT_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR("%s not implemented for '%s'", (NAME), the_type.toString()); \
} \
#define AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexHalf, std::complex<at::Half>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
#define AT_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, Half, __VA_ARGS__) \
default: \
#define AT_DISPATCH_ALL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
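
Usage sketch for these macros (hypothetical kernel, not part of this diff): the lambda body is instantiated once per listed dtype, with scalar_t bound to the matching C++ type by AT_PRIVATE_CASE_TYPE.

#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

at::Tensor add_one(const at::Tensor& self) {
  at::Tensor result = self.clone();
  AT_DISPATCH_FLOATING_TYPES(self.type(), "add_one", [&] {
    scalar_t* data = result.data<scalar_t>();  // scalar_t is double or float here
    for (int64_t i = 0; i < result.numel(); ++i) {
      data[i] += scalar_t(1);
    }
  });
  return result;
}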


@@ -1,32 +0,0 @@
#include <ATen/Error.h>
#include <ATen/Backtrace.h>
#include <iostream>
#include <string>
namespace at {
std::ostream& operator<<(std::ostream& out, const SourceLocation& loc) {
out << loc.function << " at " << loc.file << ":" << loc.line;
return out;
}
Error::Error(SourceLocation source_location, std::string err)
: what_without_backtrace_(err)
, what_(str(err, " (", source_location, ")\n", get_backtrace(/*frames_to_skip=*/2)))
{}
void Warning::warn(SourceLocation source_location, std::string msg) {
warning_handler_(source_location, msg.c_str());
}
void Warning::set_warning_handler(handler_t handler) {
warning_handler_ = handler;
}
void Warning::print_warning(const SourceLocation& source_location, const char* msg) {
std::cerr << "Warning: " << msg << " (" << source_location << ")\n";
}
Warning::handler_t Warning::warning_handler_ = &Warning::print_warning;
} // namespace at


@@ -1,131 +0,0 @@
#pragma once
#include <ATen/ATenGeneral.h> // for AT_API
#include <ATen/optional.h>
#include <cstddef>
#include <exception>
#include <ostream>
#include <sstream>
#include <string>
#if defined(_MSC_VER) && _MSC_VER <= 1900
#define __func__ __FUNCTION__
#endif
namespace at {
namespace detail {
inline std::ostream& _str(std::ostream& ss) { return ss; }
template <typename T>
inline std::ostream& _str(std::ostream& ss, const T& t) {
ss << t;
return ss;
}
template <typename T, typename... Args>
inline std::ostream&
_str(std::ostream& ss, const T& t, const Args&... args) {
return _str(_str(ss, t), args...);
}
} // namespace detail
// Convert a list of string-like arguments into a single string.
template <typename... Args>
inline std::string str(const Args&... args) {
std::ostringstream ss;
detail::_str(ss, args...);
return ss.str();
}
// Specializations for already-a-string types.
template <>
inline std::string str(const std::string& str) {
return str;
}
inline std::string str(const char* c_str) {
return c_str;
}
/// Represents a location in source code (for debugging).
struct SourceLocation {
const char* function;
const char* file;
uint32_t line;
};
std::ostream& operator<<(std::ostream& out, const SourceLocation& loc);
/// The primary ATen error class.
/// Provides a complete error message with source location information via
/// `what()`, and a more concise message via `what_without_backtrace()`. Should
/// primarily be used with the `AT_ERROR` macro.
///
/// NB: at::Error is handled specially by the default torch to suppress the
/// backtrace, see torch/csrc/Exceptions.h
class AT_API Error : public std::exception {
std::string what_without_backtrace_;
std::string what_;
public:
Error(SourceLocation source_location, std::string err);
/// Returns the complete error message, including the source location.
const char* what() const noexcept override {
return what_.c_str();
}
/// Returns only the error message string, without source location.
const char* what_without_backtrace() const noexcept {
return what_without_backtrace_.c_str();
}
};
class AT_API Warning {
using handler_t = void(*)(const SourceLocation& source_location, const char* msg);
public:
/// Issue a warning with a given message. Dispatched to the current
/// warning handler.
static void warn(SourceLocation source_location, std::string msg);
/// Sets the global warning handler. This is not thread-safe, so it should
/// generally be called once during initialization.
static void set_warning_handler(handler_t handler);
/// The default warning handler. Prints the message to stderr.
static void print_warning(const SourceLocation& source_location, const char* msg);
private:
static handler_t warning_handler_;
};
} // namespace at
// TODO: variants that print the expression tested and thus don't require strings
// TODO: CAFFE_ENFORCE_WITH_CALLER style macro
#define AT_ERROR(...) \
throw at::Error({__func__, __FILE__, __LINE__}, at::str(__VA_ARGS__))
#define AT_WARN(...) \
at::Warning::warn({__func__, __FILE__, __LINE__}, at::str(__VA_ARGS__))
#define AT_ASSERT(cond) \
if (!(cond)) { \
AT_ERROR(#cond " ASSERT FAILED at ", __FILE__, ":", __LINE__, ", please report a bug to PyTorch."); \
}
#define AT_ASSERTM(cond, ...) \
if (!(cond)) { \
AT_ERROR(at::str(#cond, " ASSERT FAILED at ", __FILE__, ":", __LINE__, ", please report a bug to PyTorch. ", __VA_ARGS__)); \
}
#define AT_CHECK(cond, ...) \
if (!(cond)) { \
AT_ERROR(at::str(__VA_ARGS__)); \
}
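
Usage sketch for the macros this removed header defined (they now live in c10/util/Exception.h, which the replacement includes pull in): AT_CHECK is for user-facing argument validation, AT_ASSERT for internal invariants.

#include <c10/util/Exception.h>
#include <cstdint>

void set_dim(int64_t dim) {
  // Throws with source location and a message assembled from the varargs.
  AT_CHECK(dim >= 0, "expected a non-negative dim, got ", dim);
  AT_ASSERT(dim < 64);  // invariant failure asks the user to file a PyTorch bug
}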


@@ -29,11 +29,13 @@ std::vector<int64_t> infer_size(IntList a, IntList b) {
}
std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
const Tensor& tensor,
IntList tensor_sizes,
IntList tensor_strides,
IntList sizes) {
int64_t ndim = sizes.size();
int64_t tensor_dim = tensor_sizes.size();
if (tensor.dim() == 0) {
if (tensor_dim == 0) {
std::vector<int64_t> expandedStrides(ndim, 0);
return std::tuple<std::vector<int64_t>, std::vector<int64_t>>(
sizes.vec(), expandedStrides);
@@ -44,9 +46,9 @@ std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
// create a new geometry for the tensors
for (int64_t i = ndim - 1; i >= 0; --i) {
int64_t offset = ndim - 1 - i;
int64_t dim = tensor.dim() - 1 - offset;
int64_t size = (dim >= 0) ? tensor.sizes()[dim] : 1;
int64_t stride = (dim >= 0) ? tensor.strides()[dim]
int64_t dim = tensor_dim - 1 - offset;
int64_t size = (dim >= 0) ? tensor_sizes[dim] : 1;
int64_t stride = (dim >= 0) ? tensor_strides[dim]
: expandedSizes[i + 1] * expandedStrides[i + 1];
int64_t targetSize = sizes[i];
if (targetSize == -1) {
@@ -66,7 +68,11 @@ std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
") must match the existing size (",
size,
") at non-singleton dimension ",
i);
i,
". Target sizes: ",
sizes,
". Tensor sizes: ",
tensor_sizes);
size = targetSize;
stride = 0;
}


@@ -1,7 +1,7 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/Error.h"
#include "c10/util/Exception.h"
#include <functional>
#include <sstream>
@@ -9,8 +9,12 @@
namespace at {
AT_API std::vector<int64_t> infer_size(IntList a, IntList b);
std::tuple<std::vector<int64_t>, std::vector<int64_t> > inferExpandGeometry(const Tensor &tensor, IntList sizes);
CAFFE2_API std::vector<int64_t> infer_size(IntList a, IntList b);
CAFFE2_API std::tuple<std::vector<int64_t>, std::vector<int64_t>>
inferExpandGeometry(
IntList tensor_sizes,
IntList tensor_strides,
IntList sizes);
// avoid copy-construction of Tensor by using a reference_wrapper.
inline void check_defined(std::initializer_list<std::reference_wrapper<const Tensor>> tensors, const char *api_name) {
@@ -110,7 +114,7 @@ inline std::vector<Tensor> expand_outplace(TensorList to_expand) {
if (!to_expand[i].defined()) {
continue;
} else if (first) {
sizes = to_expand[i].sizes();
sizes = to_expand[i].sizes().vec();
first = false;
} else {
sizes = infer_size(sizes, to_expand[i].sizes());
@@ -130,4 +134,44 @@ inline std::vector<Tensor> expand_outplace(TensorList to_expand) {
return result;
}
// Sums `tensor` repeatedly to produce a tensor of shape `shape`.
// Precondition: is_expandable_to(shape, tensor.sizes()) must be true
static inline Tensor sum_to(Tensor tensor, const IntList shape) {
if (shape.size() == 0) {
return tensor.sum();
}
c10::SmallVector<int64_t, 8> reduce_dims;
const at::IntList sizes = tensor.sizes();
const int64_t leading_dims = sizes.size() - shape.size();
for (int64_t i = 0; i < leading_dims; ++i) {
reduce_dims.push_back(i);
}
for (int64_t i = leading_dims; i < static_cast<int64_t>(sizes.size()); ++i) {
if (shape[i - leading_dims] == 1 && sizes[i] > 1) {
reduce_dims.push_back(i);
}
}
if (!reduce_dims.empty()) {
tensor = tensor.sum(reduce_dims, /*keepdim=*/true);
}
return leading_dims > 0 ? tensor.view(shape) : tensor;
}
// True if `shape` can be broadcasted to `desired`
static inline bool is_expandable_to(IntList shape, IntList desired) {
int ndim = shape.size();
int target_dim = desired.size();
if (ndim > target_dim) {
return false;
}
for (int i = 0; i < ndim; i++) {
int64_t size = shape[ndim - i - 1];
int64_t target = desired[target_dim - i - 1];
if (size != target && size != 1) {
return false;
}
}
return true;
}
}
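
A worked sketch of the two new helpers (hypothetical values): summing a {2, 3} tensor to shape {3} reduces the leading broadcast dimension, and is_expandable_to confirms the reverse expansion is legal, which is exactly sum_to's stated precondition.

#include <ATen/ATen.h>
#include <ATen/ExpandUtils.h>

void sum_to_demo() {
  at::Tensor t = at::ones({2, 3});
  at::Tensor s = at::sum_to(t, {3});            // shape {3}, every entry == 2
  bool ok = at::is_expandable_to({3}, {2, 3});  // true: {3} broadcasts to {2, 3}
  (void)s; (void)ok;
}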


@@ -1,26 +1 @@
#pragma once
#include <iostream>
#include "ATen/Type.h"
#include "ATen/Scalar.h"
namespace at {
AT_API std::ostream& operator<<(std::ostream & out, IntList list);
AT_API std::ostream& operator<<(std::ostream & out, Backend b);
AT_API std::ostream& operator<<(std::ostream & out, ScalarType t);
AT_API std::ostream& operator<<(std::ostream & out, const Type & t);
AT_API std::ostream& print(std::ostream& stream, const Tensor & tensor, int64_t linesize);
static inline std::ostream& operator<<(std::ostream & out, const Tensor & t) {
return print(out,t,80);
}
static inline void print(const Tensor & t, int64_t linesize=80) {
print(std::cout,t,linesize);
}
static inline std::ostream& operator<<(std::ostream & out, Scalar s) {
s = s.local();
return out << (s.isFloatingPoint() ? s.toDouble() : s.toLong());
}
}
#include <ATen/core/Formatting.h>


@@ -1,23 +1,2 @@
#pragma once
#include <stdint.h>
namespace at {
struct Generator {
Generator() {};
Generator(const Generator& other) = delete;
Generator(Generator&& other) = delete;
virtual ~Generator() {};
virtual Generator& copy(const Generator& other) = 0;
virtual Generator& free() = 0;
virtual uint64_t seed() = 0;
virtual uint64_t initialSeed() = 0;
virtual Generator& manualSeed(uint64_t seed) = 0;
virtual Generator& manualSeedAll(uint64_t seed) = 0;
virtual void * unsafeGetTH() = 0;
};
} // namespace at
#include <ATen/core/Generator.h>


@@ -1,168 +0,0 @@
#pragma once
#include "ATen/ATenGeneral.h"
#include <cstring>
#include <limits>
#ifdef __CUDACC__
#include <cuda_fp16.h>
#endif
namespace at {
/// Constructors
inline AT_HOSTDEVICE Half::Half(float value) {
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
x = __half_as_short(__float2half(value));
#else
x = detail::float2halfbits(value);
#endif
}
/// Implicit conversions
inline AT_HOSTDEVICE Half::operator float() const {
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
return __half2float(*reinterpret_cast<const __half*>(&x));
#else
return detail::halfbits2float(x);
#endif
}
#ifdef __CUDACC__
inline AT_HOSTDEVICE Half::Half(const __half& value) {
x = *reinterpret_cast<const unsigned short*>(&value);
}
inline AT_HOSTDEVICE Half::operator __half() const {
return *reinterpret_cast<const __half*>(&x);
}
#endif
/// Arithmetic
inline AT_HOSTDEVICE Half operator+(const Half& a, const Half& b) {
return (float)a + (float)b;
}
inline AT_HOSTDEVICE Half operator-(const Half& a, const Half& b) {
return (float)a - (float)b;
}
inline AT_HOSTDEVICE Half operator*(const Half& a, const Half& b) {
return (float)a * (float)b;
}
inline AT_HOSTDEVICE Half operator/(const Half& a, const Half& b) {
return (float)a / (float)b;
}
inline AT_HOSTDEVICE Half operator-(const Half& a) {
return -(float)a;
}
inline AT_HOSTDEVICE Half& operator+=(Half& a, const Half& b) {
a = a + b;
return a;
}
inline AT_HOSTDEVICE Half& operator-=(Half& a, const Half& b) {
a = a - b;
return a;
}
inline AT_HOSTDEVICE Half& operator*=(Half& a, const Half& b) {
a = a * b;
return a;
}
inline AT_HOSTDEVICE Half& operator/=(Half& a, const Half& b) {
a = a / b;
return a;
}
/// Arithmetic with floats
inline AT_HOSTDEVICE float operator+(Half a, float b) { return (float)a + b; }
inline AT_HOSTDEVICE float operator-(Half a, float b) { return (float)a - b; }
inline AT_HOSTDEVICE float operator*(Half a, float b) { return (float)a * b; }
inline AT_HOSTDEVICE float operator/(Half a, float b) { return (float)a / b; }
inline AT_HOSTDEVICE float operator+(float a, Half b) { return a + (float)b; }
inline AT_HOSTDEVICE float operator-(float a, Half b) { return a - (float)b; }
inline AT_HOSTDEVICE float operator*(float a, Half b) { return a * (float)b; }
inline AT_HOSTDEVICE float operator/(float a, Half b) { return a / (float)b; }
inline AT_HOSTDEVICE float& operator+=(float& a, const Half& b) { return a += (float)b; }
inline AT_HOSTDEVICE float& operator-=(float& a, const Half& b) { return a -= (float)b; }
inline AT_HOSTDEVICE float& operator*=(float& a, const Half& b) { return a *= (float)b; }
inline AT_HOSTDEVICE float& operator/=(float& a, const Half& b) { return a /= (float)b; }
/// Arithmetic with doubles
inline AT_HOSTDEVICE double operator+(Half a, double b) { return (double)a + b; }
inline AT_HOSTDEVICE double operator-(Half a, double b) { return (double)a - b; }
inline AT_HOSTDEVICE double operator*(Half a, double b) { return (double)a * b; }
inline AT_HOSTDEVICE double operator/(Half a, double b) { return (double)a / b; }
inline AT_HOSTDEVICE double operator+(double a, Half b) { return a + (double)b; }
inline AT_HOSTDEVICE double operator-(double a, Half b) { return a - (double)b; }
inline AT_HOSTDEVICE double operator*(double a, Half b) { return a * (double)b; }
inline AT_HOSTDEVICE double operator/(double a, Half b) { return a / (double)b; }
/// Arithmetic with ints
inline AT_HOSTDEVICE Half operator+(Half a, int b) { return a + (Half)b; }
inline AT_HOSTDEVICE Half operator-(Half a, int b) { return a - (Half)b; }
inline AT_HOSTDEVICE Half operator*(Half a, int b) { return a * (Half)b; }
inline AT_HOSTDEVICE Half operator/(Half a, int b) { return a / (Half)b; }
inline AT_HOSTDEVICE Half operator+(int a, Half b) { return (Half)a + b; }
inline AT_HOSTDEVICE Half operator-(int a, Half b) { return (Half)a - b; }
inline AT_HOSTDEVICE Half operator*(int a, Half b) { return (Half)a * b; }
inline AT_HOSTDEVICE Half operator/(int a, Half b) { return (Half)a / b; }
/// NOTE: we do not define comparisons directly and instead rely on the implicit
/// conversion from at::Half to float.
} // namespace at
namespace std {
template<> class numeric_limits<at::Half> {
public:
static constexpr bool is_specialized = true;
static constexpr bool is_signed = true;
static constexpr bool is_integer = false;
static constexpr bool is_exact = false;
static constexpr bool has_infinity = true;
static constexpr bool has_quiet_NaN = true;
static constexpr bool has_signaling_NaN = true;
static constexpr auto has_denorm = numeric_limits<float>::has_denorm;
static constexpr auto has_denorm_loss = numeric_limits<float>::has_denorm_loss;
static constexpr auto round_style = numeric_limits<float>::round_style;
static constexpr bool is_iec559 = true;
static constexpr bool is_bounded = true;
static constexpr bool is_modulo = false;
static constexpr int digits = 11;
static constexpr int digits10 = 3;
static constexpr int max_digits10 = 5;
static constexpr int radix = 2;
static constexpr int min_exponent = -13;
static constexpr int min_exponent10 = -4;
static constexpr int max_exponent = 16;
static constexpr int max_exponent10 = 4;
static constexpr auto traps = numeric_limits<float>::traps;
static constexpr auto tinyness_before = numeric_limits<float>::tinyness_before;
static constexpr at::Half min() { return at::Half(0x0400, at::Half::from_bits); }
static constexpr at::Half lowest() { return at::Half(0xFBFF, at::Half::from_bits); }
static constexpr at::Half max() { return at::Half(0x7BFF, at::Half::from_bits); }
static constexpr at::Half epsilon() { return at::Half(0x1400, at::Half::from_bits); }
static constexpr at::Half round_error() { return at::Half(0x3800, at::Half::from_bits); }
static constexpr at::Half infinity() { return at::Half(0x7C00, at::Half::from_bits); }
static constexpr at::Half quiet_NaN() { return at::Half(0x7E00, at::Half::from_bits); }
static constexpr at::Half signaling_NaN() { return at::Half(0x7D00, at::Half::from_bits); }
static constexpr at::Half denorm_min() { return at::Half(0x0001, at::Half::from_bits); }
};
} // namespace std


@@ -1,34 +0,0 @@
#include "ATen/Half.h"
#include "ATen/Tensor.h"
#include "ATen/Context.h"
#include <TH/TH.h>
#include <iostream>
namespace at {
static_assert(std::is_standard_layout<Half>::value, "at::Half must be standard layout.");
namespace detail {
float halfbits2float(unsigned short bits) {
float value;
TH_halfbits2float(&bits, &value);
return value;
}
unsigned short float2halfbits(float value) {
unsigned short bits;
TH_float2halfbits(&value, &bits);
return bits;
}
} // namespace detail
std::ostream& operator<<(std::ostream & out, const Half& value) {
out << (float)value;
return out;
}
} // namespace at


@@ -1,113 +1,2 @@
#pragma once
/// Defines the Half type (half-precision floating-point) including conversions
/// to standard C types and basic arithmetic operations. Note that arithmetic
/// operations are implemented by converting to floating point and
/// performing the operation in float32, instead of using CUDA half intrinsics.
/// Most uses of this type within ATen are memory bound, including the
/// element-wise kernels, and the half intrinsics aren't efficient on all GPUs.
/// If you are writing a compute bound kernel, you can use the CUDA half
/// intrinsics directly on the Half type from device code.
#include "ATen/ATenGeneral.h"
#include <limits>
#include <string>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <cmath>
#include <iosfwd>
#ifdef __CUDACC__
#include <cuda_fp16.h>
#endif
#ifndef AT_HOSTDEVICE
#ifdef __CUDACC__
#define AT_HOSTDEVICE __host__ __device__
#else
#define AT_HOSTDEVICE
#endif
#endif
namespace at {
namespace detail {
float halfbits2float(unsigned short bits);
unsigned short float2halfbits(float value);
}
struct alignas(2) Half {
unsigned short x;
struct from_bits_t {};
static constexpr from_bits_t from_bits = from_bits_t();
// HIP wants __host__ __device__ tag, CUDA does not
#ifdef __HIP_PLATFORM_HCC__
AT_HOSTDEVICE Half() = default;
#else
Half() = default;
#endif
constexpr AT_HOSTDEVICE Half(unsigned short bits, from_bits_t) : x(bits) {};
inline AT_HOSTDEVICE Half(float value);
inline AT_HOSTDEVICE operator float() const;
#ifdef __CUDACC__
inline AT_HOSTDEVICE Half(const __half& value);
inline AT_HOSTDEVICE operator __half() const;
#endif
};
template<typename To, typename From> To convert(From f) {
return static_cast<To>(f);
}
// skip isnan and isinf check for integral types
template<typename To, typename From>
typename std::enable_if<std::is_integral<From>::value, bool>::type overflows(From f) {
using limit = std::numeric_limits<To>;
return f < limit::lowest() || f > limit::max();
}
template<typename To, typename From>
typename std::enable_if<!std::is_integral<From>::value, bool>::type overflows(From f) {
using limit = std::numeric_limits<To>;
if (limit::has_infinity && std::isinf((double)f)) {
return false;
}
if (!limit::has_quiet_NaN && (f != f)) {
return true;
}
return f < limit::lowest() || f > limit::max();
}
template<typename To, typename From> To checked_convert(From f, const char* name) {
if (overflows<To, From>(f)) {
std::string msg = "value cannot be converted to type ";
msg += name;
msg += " without overflow: ";
msg += std::to_string(f);
throw std::domain_error(std::move(msg));
}
return convert<To, From>(f);
}
template<typename To, typename From>
To HalfFix(From h) {
To ret;
ret.x = h.x;
return ret;
}
AT_API std::ostream& operator<<(std::ostream & out, const Half& value);
} // namespace at
#include "Half-inl.h"
#undef AT_HOSTDEVICE
#include <ATen/core/Half.h>
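
Behavior sketch (hypothetical snippet): as the removed header's comment explains, Half arithmetic round-trips through float, so the addition below is computed in float32 and only narrowed back to half on assignment.

#include <ATen/core/Half.h>

void half_demo() {
  at::Half a(1.5f);
  at::Half b(2.25f);
  at::Half c = a + b;  // operands widen to float, add as 3.75f, narrow back
  float f = a;         // the implicit widening conversion is always exact
  (void)c; (void)f;
}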

aten/src/ATen/InferSize.h Normal file

@@ -0,0 +1,44 @@
#pragma once
#include <c10/core/ScalarType.h>
#include <c10/util/Optional.h>
#include <sstream>
#include <vector>
namespace at {
// Infers the size of a dim with size -1, if it exists. Also checks that new
// shape is compatible with the number of elements.
static std::vector<int64_t> infer_size(IntList shape, int64_t numel) {
auto res = shape.vec();
int64_t newsize = 1;
auto infer_dim = c10::optional<int64_t>();
for (int64_t dim = 0, ndim = shape.size(); dim != ndim; dim++) {
if (shape[dim] == -1) {
if (infer_dim) {
throw std::runtime_error("only one dimension can be inferred");
}
infer_dim = dim;
} else if (shape[dim] >= 0) {
newsize *= shape[dim];
} else {
AT_ERROR("invalid shape dimension ", shape[dim]);
}
}
if (numel == newsize || (infer_dim && newsize > 0 && numel % newsize == 0)) {
if (infer_dim) {
// we have a degree of freedom here to select the dimension size; follow NumPy semantics
// and just bail.
AT_CHECK(newsize != 0, "cannot reshape tensor of 0 elements into shape ", shape);
res[*infer_dim] = numel / newsize;
}
return res;
}
std::ostringstream ss;
ss << "shape '" << shape << "' is invalid for input of size " << numel;
throw std::runtime_error(ss.str());
}
}
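
A worked example (hypothetical values): reshaping 12 elements with shape {3, -1} infers the free dimension as 12 / 3 = 4; asking for -1 with zero elements still throws, matching the NumPy-style bail-out in the comment above.

#include <ATen/InferSize.h>
#include <cassert>

void infer_size_demo() {
  std::vector<int64_t> out = at::infer_size({3, -1}, /*numel=*/12);
  assert(out[0] == 3 && out[1] == 4);  // the -1 resolves to 12 / 3
}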


@@ -0,0 +1,15 @@
#pragma once
#include <ATen/core/TensorOptions.h>
namespace at {
// Represents the initial TensorOptions, before the "defaults" are ever changed.
// This is designed to be used in library code, where the explicit devices, dtypes, etc. are known.
// NOTE: this is not a stable API.
inline TensorOptions initialTensorOptions() {
return TensorOptions(kCPU).dtype(kFloat).layout(kStrided)
.requires_grad(false).is_variable(false);
}
}
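
Usage sketch (hypothetical helper): library code that must not inherit user-mutated defaults can start from the initial options and override only what it needs.

#include <ATen/ATen.h>
#include <ATen/InitialTensorOptions.h>

at::Tensor make_byte_mask(at::IntList sizes) {
  // Baseline is kCPU / kFloat / kStrided; only the dtype is overridden here.
  return at::zeros(sizes, at::initialTensorOptions().dtype(at::kByte));
}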


@@ -1,20 +1,2 @@
#pragma once
#include <ATen/ScalarType.h>
namespace at {
enum class Layout { Strided, Sparse };
constexpr auto kStrided = Layout::Strided;
constexpr auto kSparse = Layout::Sparse;
inline Layout layout_from_backend(Backend backend) {
switch (backend) {
case Backend::SparseCPU:
case Backend::SparseCUDA:
return Layout::Sparse;
default:
return Layout::Strided;
}
}
} // namespace at
#include <c10/core/Layout.h>


@@ -0,0 +1,12 @@
#include <ATen/LegacyTHDispatch.h>
namespace at {
// TODO: This could be bad juju if someone calls globalContext() in the
// destructor of an object with static lifetime.
LegacyTHDispatch & globalLegacyTHDispatch() {
static LegacyTHDispatch singleton;
return singleton;
}
}


@@ -0,0 +1,91 @@
#pragma once
// LegacyTHDispatcher is the legacy mechanism for dispatching directly
// to TH/THNN/THC/THCUNN functions in ATen, which is essentially a giant virtual
// dispatch table for every TH function we support dynamically dispatching over.
//
// NB: We do not actually dispatch to *operators* here, the usual pattern is for
// ATen operators to call this mechanism for their implementation, but the
// operator itself is declared separately (e.g. as a native function "wrapper").
//
// Q: Why don't we just use LegacyTypeDispatch here?
// A: Mainly separation of concerns:
// 1) Type is for implementation of operators, which requires codegen of
// Variables, JIT, etc. That is handled by the native function "wrappers";
// just calling into TH does not require that.
// 2) Type does not require scalar-specific dispatch, whereas calling into TH
// does. Thus, this separation allows us to evolve operator dispatch
// separately (i.e. to use the C10 dispatcher) from details of how to
// call TH functionality.
//
// The implementation here is very similar to the LegacyTypeDispatch design, with
// the following simplifications:
// 1) This is not required for a mobile build, so does not have to live in /core.
// 2) Because these only contain function implementations, we do not have to
// handle the Variable/Tensor split; that is handled at the native function
// "wrapper" level.
// 3) Because an operator must have been previously dispatched via the Type
// mechanism, we do not need to handle device initialization. This means it is
// WRONG to call directly into these functions without first going through
// Type dispatch (i.e. the usual operator -> Type -> LegacyTHDispatch pattern).
// 4) Because an operator must have been previously dispatched via the Type
// mechanism, we do not need to handle undefined Tensors.
//
// NB: We don't use Registry for this, because we don't want to
// pay for a hash table lookup every time we do an operation.
//
// NB: we can delete this when we don't call into any TH implementations.
#include <c10/core/Backend.h>
#include <c10/core/ScalarType.h>
#include <ATen/LegacyTHDispatcher.h>
namespace at {
struct Type;
struct CAFFE2_API LegacyTHDispatcherDeleter {
using LegacyTHDispatcherDeleterFun = void(LegacyTHDispatcher*);
LegacyTHDispatcherDeleterFun *fn_ = nullptr;
LegacyTHDispatcherDeleter() {}
/* implicit */ LegacyTHDispatcherDeleter(LegacyTHDispatcherDeleterFun *fn) : fn_(fn) {}
void operator()(LegacyTHDispatcher * ptr) {
if (fn_) {
(*fn_)(ptr);
}
}
};
class CAFFE2_API LegacyTHDispatch {
public:
using LegacyTHDispatcherUniquePtr = std::unique_ptr<LegacyTHDispatcher, LegacyTHDispatcherDeleter>;
// WARNING: This function has the precondition that you have
// initialized the type you want to call. This initialization
// step is generally done by Context, or assumed because you
// have a Tensor and thus the Type of that Tensor must already
// be initialized.
void registerDispatcher(Backend b, ScalarType s, LegacyTHDispatcherUniquePtr&& t) {
dispatcher_registry[static_cast<int>(b)][static_cast<int>(s)] = std::move(t);
}
LegacyTHDispatcher* getLegacyTHDispatcherRaw(Backend p, ScalarType s) {
return dispatcher_registry[static_cast<int>(p)][static_cast<int>(s)].get();
}
LegacyTHDispatcher & getLegacyTHDispatcher(Backend p, ScalarType s) {
auto* type = getLegacyTHDispatcherRaw(p, s);
if (!type) AT_ERROR(toString(p), toString(s), "THDispatcher is not enabled.");
return *type;
}
private:
// NB: dispatcher_registry has nullptr for all CUDA backends until
// CUDA initialization has occurred
LegacyTHDispatcherUniquePtr dispatcher_registry
[static_cast<int>(Backend::NumOptions)]
[static_cast<int>(ScalarType::NumOptions)];
};
CAFFE2_API LegacyTHDispatch& globalLegacyTHDispatch();
} // namespace at
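
A call-pattern sketch (hypothetical helper): per the notes above, this lookup is only valid after Type dispatch has already run, since that step is what initializes the backend.

#include <ATen/LegacyTHDispatch.h>

at::LegacyTHDispatcher& cpu_float_th_dispatcher() {
  // Throws if no dispatcher was registered for this (Backend, ScalarType) pair.
  return at::globalLegacyTHDispatch().getLegacyTHDispatcher(
      at::Backend::CPU, at::ScalarType::Float);
}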


@@ -1,6 +1,6 @@
#pragma once
#include <ATen/ArrayRef.h>
#include <ATen/Utils.h>
#include <c10/util/ArrayRef.h>
#include <vector>


@@ -1,16 +0,0 @@
#include <ATen/OptionsGuard.h>
#include <ATen/optional.h>
namespace at {
thread_local at::optional<TensorOptions> DefaultTensorOptions::options_;
TensorOptions& DefaultTensorOptions::get() {
if (!options_) {
options_.emplace(
/*use_thread_local_default_options=*/false);
}
return *options_;
}
} // namespace at


@@ -1,54 +0,0 @@
#pragma once
#include <ATen/Device.h>
#include <ATen/Layout.h>
#include <ATen/ScalarType.h>
#include <ATen/TensorOptions.h>
#include <ATen/optional.h>
namespace at {
/// A wrapper over a thread local TensorOptions instance.
struct DefaultTensorOptions {
/// Returns the current thread local default options.
/// Defined in OptionsGuard.cpp because we can't use optional in headers, due
/// to Windows and other compilers.
static TensorOptions& get();
private:
/// This is an optional because of compiler bugs that mis-initialize static
/// thread local variables. The workaround is lazy initialization, i.e.
/// `DefaultTensorOptions::get()` will initialize the `options_` to a proper
/// value upon first invocation.
/// https://gcc.gnu.org/ml/gcc-bugs/2013-12/msg00026.html
static thread_local at::optional<TensorOptions> options_;
};
/// RAII guard that stores the current default options upon construction, sets
/// the current default options to the ones given to its constructor, and
/// finally resets the options back to the original ones in the destructor.
struct OptionsGuard {
/// Stores the current default options and sets them to the given ones.
explicit OptionsGuard(const TensorOptions& options)
: original_(DefaultTensorOptions::get()) {
DefaultTensorOptions::get() = options;
}
/// Restores the original default options.
~OptionsGuard() {
DefaultTensorOptions::get() = original_;
}
/// Returns the original options that were in place at the time of
/// construction of this object.
const TensorOptions& original() {
return original_;
}
private:
/// The original options that were in place at the time of construction of
/// this object.
TensorOptions original_;
};
} // namespace at


@@ -1,6 +1,8 @@
#pragma once
#include <ATen/ATen.h>
#include <atomic>
#include <cstddef>
#include <exception>
#ifdef _OPENMP
#include <omp.h>
@@ -20,24 +22,62 @@ inline int64_t divup(int64_t x, int64_t y) {
return (x + y - 1) / y;
}
inline int get_max_threads() {
#ifdef _OPENMP
return omp_get_max_threads();
#else
return 1;
#endif
}
inline int get_thread_num() {
#ifdef _OPENMP
return omp_get_thread_num();
#else
return 0;
#endif
}
inline bool in_parallel_region() {
#ifdef _OPENMP
return omp_in_parallel();
#else
return false;
#endif
}
template <class F>
inline void parallel_for(
const int64_t begin,
const int64_t end,
const int64_t grain_size,
const F f) {
const F& f) {
#ifdef _OPENMP
#pragma omp parallel if ((end - begin) >= grain_size)
std::atomic_flag err_flag = ATOMIC_FLAG_INIT;
std::exception_ptr eptr;
#pragma omp parallel if (!omp_in_parallel() && ((end - begin) >= grain_size))
{
int64_t num_threads = omp_get_num_threads();
int64_t tid = omp_get_thread_num();
int64_t chunk_size = divup((end - begin), num_threads);
int64_t begin_tid = begin + tid * chunk_size;
if (begin_tid < end)
f(begin_tid, std::min(end, chunk_size + begin_tid));
if (begin_tid < end) {
try {
f(begin_tid, std::min(end, chunk_size + begin_tid));
} catch (...) {
if (!err_flag.test_and_set()) {
eptr = std::current_exception();
}
}
}
}
if (eptr) {
std::rethrow_exception(eptr);
}
#else
f(begin, end);
if (begin < end) {
f(begin, end);
}
#endif
}
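
Usage sketch (hypothetical caller): the grain size gates whether the region parallelizes at all, and after this change an exception thrown by any worker is captured once via the atomic flag and rethrown on the calling thread rather than escaping the OpenMP region.

#include <ATen/Parallel.h>
#include <vector>

void scale_in_place(std::vector<float>& v) {
  at::parallel_for(0, static_cast<int64_t>(v.size()), /*grain_size=*/2048,
                   [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      v[i] *= 2.0f;  // each worker owns the disjoint half-open range [begin, end)
    }
  });
}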


@@ -1,216 +0,0 @@
#pragma once
/**
* Simple registry implementation that uses static variables to
* register object creators during program initialization time.
*/
// NB: This Registry works poorly when you have other namespaces.
// Make all macro invocations from inside the at namespace.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <string>
#include <vector>
#include <ATen/Backtrace.h>
#include <ATen/ATenGeneral.h>
namespace at {
template <typename KeyType>
inline void PrintOffendingKey(const KeyType& /*key*/) {
printf("[key type printing not supported]\n");
}
template <>
inline void PrintOffendingKey(const std::string& key) {
printf("Offending key: %s.\n", key.c_str());
}
/**
* @brief A template class that allows one to register classes by keys.
*
* The keys are usually a std::string specifying the name, but can be anything that
* can be used in a std::map.
*
* You should most likely not use the Registry class explicitly, but use the
* helper macros below to declare specific registries as well as registering
* objects.
*/
template <class SrcType, class ObjectPtrType, class... Args>
class AT_API Registry {
public:
typedef std::function<ObjectPtrType(Args...)> Creator;
Registry() : registry_() {}
void Register(const SrcType& key, Creator creator) {
// The if statement below is essentially the same as the following line:
// CHECK_EQ(registry_.count(key), 0) << "Key " << key
// << " registered twice.";
// However, CHECK_EQ depends on google logging, and since registration is
// carried out at static initialization time, we do not want to have an
// explicit dependency on glog's initialization function.
std::lock_guard<std::mutex> lock(register_mutex_);
if (registry_.count(key) != 0) {
printf("Key already registered.\n");
PrintOffendingKey(key);
std::exit(1);
}
registry_[key] = creator;
}
void Register(const SrcType& key, Creator creator, const std::string& help_msg) {
Register(key, creator);
help_message_[key] = help_msg;
}
inline bool Has(const SrcType& key) { return (registry_.count(key) != 0); }
ObjectPtrType Create(const SrcType& key, Args... args) {
if (registry_.count(key) == 0) {
// Returns nullptr if the key is not registered.
return nullptr;
}
return registry_[key](args...);
}
/**
* Returns the keys currently registered as a std::vector.
*/
std::vector<SrcType> Keys() {
std::vector<SrcType> keys;
for (const auto& it : registry_) {
keys.push_back(it.first);
}
return keys;
}
const std::unordered_map<SrcType, std::string>& HelpMessage() const {
return help_message_;
}
const char* HelpMessage(const SrcType& key) const {
auto it = help_message_.find(key);
if (it == help_message_.end()) {
return nullptr;
}
return it->second.c_str();
}
private:
std::unordered_map<SrcType, Creator> registry_;
std::unordered_map<SrcType, std::string> help_message_;
std::mutex register_mutex_;
Registry(const Registry&) = delete;
Registry& operator=(const Registry&) = delete;
};
template <class SrcType, class ObjectPtrType, class... Args>
class AT_API Registerer {
public:
Registerer(
const SrcType& key,
Registry<SrcType, ObjectPtrType, Args...>* registry,
typename Registry<SrcType, ObjectPtrType, Args...>::Creator creator,
const std::string& help_msg = "") {
registry->Register(key, creator, help_msg);
}
template <class DerivedType>
static ObjectPtrType DefaultCreator(Args... args) {
// TODO(jiayq): old versions of NVCC do not handle make_unique well
// so we are forced to use a unique_ptr constructor here. Check if it is
// fine to use make_unique in the future.
// return make_unique<DerivedType>(args...);
return ObjectPtrType(new DerivedType(args...));
}
};
/**
* AT_ANONYMOUS_VARIABLE(str) introduces an identifier starting with
* str and ending with a number that varies with the line.
* Pretty much a copy from 'folly/Preprocessor.h'
*/
#define AT_CONCATENATE_IMPL(s1, s2) s1##s2
#define AT_CONCATENATE(s1, s2) AT_CONCATENATE_IMPL(s1, s2)
#ifdef __COUNTER__
#define AT_ANONYMOUS_VARIABLE(str) AT_CONCATENATE(str, __COUNTER__)
#else
#define AT_ANONYMOUS_VARIABLE(str) AT_CONCATENATE(str, __LINE__)
#endif
/**
* AT_DECLARE_TYPED_REGISTRY is a macro that expands to a function
* declaration, as well as creating a convenient typename for its corresponding
* registerer.
*/
#define AT_DECLARE_TYPED_REGISTRY( \
RegistryName, SrcType, ObjectType, PtrType, ...) \
AT_API Registry<SrcType, PtrType<ObjectType>, __VA_ARGS__>* RegistryName(); \
typedef Registerer<SrcType, PtrType<ObjectType>, __VA_ARGS__> \
Registerer##RegistryName; \
extern template class Registerer<SrcType, PtrType<ObjectType>, __VA_ARGS__>;
#define AT_DEFINE_TYPED_REGISTRY( \
RegistryName, SrcType, ObjectType, PtrType, ...) \
Registry<SrcType, PtrType<ObjectType>, __VA_ARGS__>* RegistryName() { \
static Registry<SrcType, PtrType<ObjectType>, __VA_ARGS__>* registry = \
new Registry<SrcType, PtrType<ObjectType>, __VA_ARGS__>(); \
return registry; \
} \
template class Registerer<SrcType, PtrType<ObjectType>, __VA_ARGS__>;
// Note(Yangqing): The __VA_ARGS__ below allows one to specify a templated
// creator with comma in its templated arguments.
#define AT_REGISTER_TYPED_CREATOR(RegistryName, key, ...) \
namespace { \
Registerer##RegistryName AT_ANONYMOUS_VARIABLE(g_##RegistryName)( \
key, RegistryName(), __VA_ARGS__); \
}
#define AT_REGISTER_TYPED_CLASS(RegistryName, key, ...) \
namespace { \
Registerer##RegistryName AT_ANONYMOUS_VARIABLE(g_##RegistryName)( \
key, \
RegistryName(), \
Registerer##RegistryName::DefaultCreator<__VA_ARGS__>, \
::at::demangle_type<__VA_ARGS__>()); \
}
// AT_DECLARE_REGISTRY and AT_DEFINE_REGISTRY are hard-wired to use std::string
// as the key
// type, because that is the most commonly used case.
#define AT_DECLARE_REGISTRY(RegistryName, ObjectType, ...) \
AT_DECLARE_TYPED_REGISTRY( \
RegistryName, std::string, ObjectType, std::unique_ptr, __VA_ARGS__)
#define AT_DEFINE_REGISTRY(RegistryName, ObjectType, ...) \
AT_DEFINE_TYPED_REGISTRY( \
RegistryName, std::string, ObjectType, std::unique_ptr, __VA_ARGS__)
#define AT_DECLARE_SHARED_REGISTRY(RegistryName, ObjectType, ...) \
AT_DECLARE_TYPED_REGISTRY( \
RegistryName, std::string, ObjectType, std::shared_ptr, __VA_ARGS__)
#define AT_DEFINE_SHARED_REGISTRY(RegistryName, ObjectType, ...) \
AT_DEFINE_TYPED_REGISTRY( \
RegistryName, std::string, ObjectType, std::shared_ptr, __VA_ARGS__)
// AT_REGISTER_CREATOR and AT_REGISTER_CLASS are hard-wired to use std::string
// as the key
// type, because that is the most commonly used case.
#define AT_REGISTER_CREATOR(RegistryName, key, ...) \
AT_REGISTER_TYPED_CREATOR(RegistryName, #key, __VA_ARGS__)
#define AT_REGISTER_CLASS(RegistryName, key, ...) \
AT_REGISTER_TYPED_CLASS(RegistryName, #key, __VA_ARGS__)
} // namespace at
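A sketch of how the macros above compose; Widget, BigWidget, and WidgetRegistry are invented for illustration, and per the NB at the top of the file the macro invocations sit inside namespace at:
namespace at {
struct Widget {
  explicit Widget(int size) : size_(size) {}
  virtual ~Widget() {}
  int size_;
};
// In a header: declares WidgetRegistry(), whose creators take an int.
AT_DECLARE_REGISTRY(WidgetRegistry, Widget, int);
// In exactly one .cpp file: defines the registry singleton.
AT_DEFINE_REGISTRY(WidgetRegistry, Widget, int);
struct BigWidget : Widget {
  explicit BigWidget(int size) : Widget(2 * size) {}
};
// Registers DefaultCreator<BigWidget> under the stringized key "BigWidget".
AT_REGISTER_CLASS(WidgetRegistry, BigWidget, BigWidget);
} // namespace at
void make_widget() {
  // Lookup by key; Create returns nullptr for keys that were never registered.
  std::unique_ptr<at::Widget> w = at::WidgetRegistry()->Create("BigWidget", 3);
}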

View File

@@ -1,58 +0,0 @@
#pragma once
#include <atomic>
namespace at {
// base class for refcounted things, allows for collections of generic
// refcounted objects that include tensors
struct Retainable {
Retainable(): refcount(1), weak_refcount(1) {}
void retain() {
++refcount;
}
void release() {
if(--refcount == 0) {
// If we know that this is the last reference then we can skip
// all the decrements and release_resources().
if (weak_refcount == 1) {
delete this;
} else {
release_resources();
weak_release();
}
}
}
void weak_retain() {
++weak_refcount;
}
void weak_release() {
if (--weak_refcount == 0) {
delete this;
}
}
bool weak_lock() {
for (;;) {
auto current_refcount = refcount.load();
if (current_refcount == 0) return false;
if (refcount.compare_exchange_strong(current_refcount, current_refcount + 1)) break;
}
return true;
}
uint32_t use_count() const {
return refcount.load();
}
uint32_t weak_use_count() const {
return weak_refcount.load();
}
virtual void release_resources() {};
virtual ~Retainable() {}
private:
// INVARIANT: once refcount reaches 0 it can never go up
// INVARIANT: weak_refcount = number of weak references + (refcount > 0 ? 1 : 0)
std::atomic<uint32_t> refcount;
std::atomic<uint32_t> weak_refcount;
};
}
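A walk-through of the counting rules above (MyObj is illustrative). The key invariant, stated in the source: weak_refcount equals the number of weak references plus one while any strong reference remains.
struct MyObj : at::Retainable {};
void lifecycle() {
  MyObj* obj = new MyObj();        // refcount == 1, weak_refcount == 1
  obj->weak_retain();              // take a weak reference: weak_refcount == 2
  obj->release();                  // strong count hits 0: release_resources(),
                                   // then weak_release() leaves weak_refcount == 1
  bool alive = obj->weak_lock();   // false: once refcount reaches 0 it never rises
  (void)alive;
  obj->weak_release();             // weak_refcount hits 0: object is deleted
}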

View File

@@ -1,21 +0,0 @@
#include "ATen/Config.h"
#include "ATen/Scalar.h"
#include <TH/TH.h>
#include "ATen/Tensor.h"
#include "ATen/Context.h"
namespace at {
Tensor Scalar::toTensor() const {
if (Tag::HAS_t == tag) {
return Tensor(t);
} else if (Tag::HAS_d == tag) {
return CPU(kDouble).scalarTensor(*this);
} else {
assert(Tag::HAS_i == tag);
return CPU(kLong).scalarTensor(*this);
}
}
}

View File

@@ -1,103 +1,3 @@
#pragma once
#include <assert.h>
#include <stdint.h>
#include <stdexcept>
#include <string>
#include <utility>
#include "ATen/ATenGeneral.h"
#include "ATen/Half.h"
#include "ATen/ScalarType.h"
#include "ATen/TensorBase.h"
#include "ATen/Utils.h"
namespace at {
struct Tensor;
class AT_API Scalar {
public:
Scalar() : Scalar(int64_t(0)) {}
explicit Scalar(const detail::TensorBase & t)
: tag(Tag::HAS_t), t(t) {
AT_CHECK(t.defined(), "Attempting to create a Scalar from an undefined tensor");
AT_CHECK(t.dim() == 0, "Attempting to create a Scalar from a ", t.dim(), " dim tensor");
}
#define DEFINE_IMPLICIT_CTOR(type,name,member) \
Scalar(type vv) \
: tag(Tag::HAS_##member) { \
v . member = convert<decltype(v.member),type>(vv); \
}
AT_FORALL_SCALAR_TYPES(DEFINE_IMPLICIT_CTOR)
#undef DEFINE_IMPLICIT_CTOR
// return a new scalar that is guaranteed not to be backed by a tensor.
Scalar local() const {
if (Tag::HAS_t != tag) {
return *this;
}
return t.pImpl->localScalar();
}
#define DEFINE_ACCESSOR(type,name,member) \
type to##name () const { \
if (Tag::HAS_t == tag) { \
return local().to##name(); \
} else if (Tag::HAS_d == tag) { \
return checked_convert<type, double>(v.d, #type); \
} else { \
return checked_convert<type, int64_t>(v.i, #type); \
} \
}
Tensor toTensor() const;
AT_FORALL_SCALAR_TYPES(DEFINE_ACCESSOR)
//also support scalar.to<int64_t>();
template<typename T>
T to();
#undef DEFINE_ACCESSOR
bool isFloatingPoint() const {
return Tag::HAS_d == tag;
}
bool isIntegral() const {
return Tag::HAS_i == tag;
}
bool isBackedByTensor() const {
return Tag::HAS_t == tag;
}
private:
enum class Tag { HAS_d, HAS_i, HAS_t };
Tag tag;
union {
double d;
int64_t i;
} v;
detail::TensorBase t;
friend struct Type;
};
// define the scalar.to<int64_t>() specializations
template<typename T>
inline T Scalar::to() {
throw std::runtime_error("to() cast to unexpected type.");
}
#define DEFINE_TO(T,name,_) \
template<> \
inline T Scalar::to<T>() { \
return to##name(); \
}
AT_FORALL_SCALAR_TYPES(DEFINE_TO)
#undef DEFINE_TO
}
#include <ATen/core/Scalar.h>
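A sketch of the tagged-union behavior defined above, using an integral value for illustration:
void scalar_demo() {
  at::Scalar s(42);              // the implicit int ctor stores into the i member
  bool is_int = s.isIntegral();  // true: tag is HAS_i
  int64_t i = s.to<int64_t>();   // the DEFINE_TO specialization forwards to toLong()
  double d = s.toDouble();       // checked_convert widens 42 to 42.0
  (void)is_int; (void)i; (void)d;
}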

aten/src/ATen/ScalarOps.h Normal file
View File

@@ -0,0 +1,20 @@
#pragma once
#include <c10/core/Scalar.h>
#include "ATen/Tensor.h"
// This is in the c10 namespace because we use ADL to find the functions in it.
namespace c10 {
// FIXME: this should be (and was) Scalar::toTensor, but there is currently no way
// to implement this without going through Derived Types (which are not part of core).
inline at::Tensor scalar_to_tensor(Scalar s) {
if (s.isFloatingPoint()) {
return at::CPU(kDouble).scalarTensor(s);
} else {
AT_ASSERT(s.isIntegral());
return at::CPU(kLong).scalarTensor(s);
}
}
}
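A usage sketch for the new helper; since it lives in namespace c10 and Scalar is now a c10 type, an unqualified call resolves via ADL as the comment says (the values are illustrative):
void to_tensor_demo() {
  at::Tensor td = scalar_to_tensor(at::Scalar(1.5));  // CPU double scalar tensor
  at::Tensor ti = scalar_to_tensor(at::Scalar(7));    // CPU long scalar tensor
}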

View File

@@ -1,170 +1,4 @@
#pragma once
#include <stdint.h>
#include "ATen/ArrayRef.h"
#include "ATen/ATenGeneral.h"
#include "ATen/Half.h"
namespace at {
// NB: Order matters for this macro; it is relied upon in
// _promoteTypesLookup and probably other places.
#define AT_FORALL_SCALAR_TYPES(_) \
_(uint8_t,Byte,i) \
_(int8_t,Char,i) \
_(int16_t,Short,i) \
_(int,Int,i) \
_(int64_t,Long,i) \
_(at::Half,Half,d) \
_(float,Float,d) \
_(double,Double,d)
#define AT_FORALL_SCALAR_TYPES_EXCEPT_HALF(_) \
_(uint8_t,Byte,i) \
_(int8_t,Char,i) \
_(int16_t,Short,i) \
_(int,Int,i) \
_(int64_t,Long,i) \
_(float,Float,d) \
_(double,Double,d)
enum class ScalarType {
#define DEFINE_ENUM(_1,n,_2) \
n,
AT_FORALL_SCALAR_TYPES(DEFINE_ENUM)
#undef DEFINE_ENUM
Undefined,
NumOptions
};
enum class Backend {
CPU,
CUDA,
SparseCPU,
SparseCUDA,
Undefined,
NumOptions
};
constexpr Backend kCPU = Backend::CPU;
constexpr Backend kCUDA = Backend::CUDA;
constexpr Backend kSparseCPU = Backend::SparseCPU;
constexpr Backend kSparseCUDA = Backend::SparseCUDA;
static inline Backend toSparse(Backend b) {
switch (b) {
case Backend::CPU: return Backend::SparseCPU;
case Backend::CUDA: return Backend::SparseCUDA;
case Backend::SparseCPU: return Backend::SparseCPU;
case Backend::SparseCUDA: return Backend::SparseCUDA;
default: throw std::runtime_error("Unknown backend");
}
}
static inline Backend toDense(Backend b) {
switch (b) {
case Backend::CPU: return Backend::CPU;
case Backend::CUDA: return Backend::CUDA;
case Backend::SparseCPU: return Backend::CPU;
case Backend::SparseCUDA: return Backend::CUDA;
default: throw std::runtime_error("Unknown backend");
}
}
static inline const char * toString(Backend b) {
switch(b) {
case Backend::CPU: return "CPU";
case Backend::CUDA: return "CUDA";
case Backend::SparseCPU: return "SparseCPU";
case Backend::SparseCUDA: return "SparseCUDA";
default: return "UNKNOWN_BACKEND";
}
}
#define DEFINE_CONSTANT(_,name,_2) \
constexpr ScalarType k##name = ScalarType::name;
AT_FORALL_SCALAR_TYPES(DEFINE_CONSTANT)
#undef DEFINE_CONSTANT
static inline const char * toString(ScalarType t) {
#define DEFINE_CASE(_,name,_2) \
case ScalarType:: name : return #name;
switch(t) {
AT_FORALL_SCALAR_TYPES(DEFINE_CASE)
default:
return "UNKNOWN_SCALAR";
}
#undef DEFINE_CASE
}
static inline size_t elementSize(ScalarType t) {
#define CASE_ELEMENTSIZE_CASE(ctype,name,_2) \
case ScalarType:: name : return sizeof(ctype);
switch(t) {
AT_FORALL_SCALAR_TYPES(CASE_ELEMENTSIZE_CASE)
default:
AT_ERROR("Unknown ScalarType");
}
#undef CASE_ELEMENTSIZE_CASE
}
static inline bool isIntegralType(ScalarType t) {
return (t == ScalarType::Byte ||
t == ScalarType::Char ||
t == ScalarType::Int ||
t == ScalarType::Long ||
t == ScalarType::Short);
}
static inline bool isFloatingType(ScalarType t) {
return (t == ScalarType::Double ||
t == ScalarType::Float ||
t == ScalarType::Half);
}
static inline ScalarType promoteTypes(ScalarType a, ScalarType b) {
// This is generated according to NumPy's promote_types
#define u1 ScalarType::Byte
#define i1 ScalarType::Char
#define i2 ScalarType::Short
#define i4 ScalarType::Int
#define i8 ScalarType::Long
#define f2 ScalarType::Half
#define f4 ScalarType::Float
#define f8 ScalarType::Double
#define ud ScalarType::Undefined
static constexpr ScalarType _promoteTypesLookup
[static_cast<int>(ScalarType::NumOptions)]
[static_cast<int>(ScalarType::NumOptions)] = {
/* u1 i1 i2 i4 i8 f2 f4 f8, ud */
/* u1 */ { u1, i2, i2, i4, i8, f2, f4, f8, ud },
/* i1 */ { i2, i1, i2, i4, i8, f2, f4, f8, ud },
/* i2 */ { i2, i2, i2, i4, i8, f4, f4, f8, ud },
/* i4 */ { i4, i4, i4, i4, i8, f8, f4, f8, ud },
/* i8 */ { i8, i8, i8, i8, i8, f8, f4, f8, ud },
/* f2 */ { f2, f2, f4, f8, f8, f2, f4, f8, ud },
/* f4 */ { f4, f4, f4, f4, f4, f4, f4, f8, ud },
/* f8 */ { f8, f8, f8, f8, f8, f8, f8, f8, ud },
/* ud */ { ud, ud, ud, ud, ud, ud, ud, ud, ud },
};
#undef u1
#undef i1
#undef i2
#undef i4
#undef i8
#undef f2
#undef f4
#undef f8
#undef ud
return _promoteTypesLookup[static_cast<int>(a)][static_cast<int>(b)];
}
struct Tensor;
typedef ArrayRef<int64_t> IntList;
typedef ArrayRef<Tensor> TensorList;
} // namespace at
#include <ATen/core/ATenGeneral.h> // for BC reasons
#include <c10/core/Backend.h>
#include <c10/core/ScalarType.h>
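A few concrete reads of the promotion table above; the expected results follow the rows and columns as written:
void promote_demo() {
  auto a = at::promoteTypes(at::kInt, at::kFloat);  // kFloat:  i4 row, f4 column
  auto b = at::promoteTypes(at::kByte, at::kChar);  // kShort:  u1 row, i1 column -> i2
  auto c = at::promoteTypes(at::kHalf, at::kInt);   // kDouble: f2 row, i4 column -> f8
  (void)a; (void)b; (void)c;
}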

View File

@@ -1,974 +1,2 @@
//===- llvm/ADT/SmallVector.h - 'Normally small' vectors --------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file defines the SmallVector class.
//
//===----------------------------------------------------------------------===//
// ATen: modified from llvm::SmallVector.
// replaced report_bad_alloc_error with std::bad_alloc
// replaced isPodLike<T> with AT_IS_TRIVIALLY_COPYABLE
// replaced iterator_range constructor with inline Container&& constructor
// removed LLVM_NODISCARD and LLVM_ATTRIBUTE_ALWAYS_INLINE qualifiers
// removed LLVM_UNLIKELY
#pragma once
#include "AlignOf.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <initializer_list>
#include <iterator>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#if __GNUG__ && __GNUC__ < 5
#define AT_IS_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
#else
#define AT_IS_TRIVIALLY_COPYABLE(T) std::is_trivially_copyable<T>::value
#endif
namespace at {
namespace detail {
// From llvm/Support/MathExtras.h
static inline uint64_t NextPowerOf2(uint64_t A) {
A |= (A >> 1);
A |= (A >> 2);
A |= (A >> 4);
A |= (A >> 8);
A |= (A >> 16);
A |= (A >> 32);
return A + 1;
}
}
/// This is all the non-templated stuff common to all SmallVectors.
class SmallVectorBase {
protected:
void *BeginX, *EndX, *CapacityX;
protected:
SmallVectorBase(void *FirstEl, size_t Size)
: BeginX(FirstEl), EndX(FirstEl), CapacityX((char*)FirstEl+Size) {}
/// This is an implementation of the grow() method which only works
/// on POD-like data types and is out of line to reduce code duplication.
void grow_pod(void *FirstEl, size_t MinSizeInBytes, size_t TSize);
public:
/// This returns size()*sizeof(T).
size_t size_in_bytes() const {
return size_t((char*)EndX - (char*)BeginX);
}
/// capacity_in_bytes - This returns capacity()*sizeof(T).
size_t capacity_in_bytes() const {
return size_t((char*)CapacityX - (char*)BeginX);
}
bool empty() const { return BeginX == EndX; }
};
/// This is the part of SmallVectorTemplateBase which does not depend on whether
/// the type T is a POD. The extra dummy template argument is used by ArrayRef
/// to avoid unnecessarily requiring T to be complete.
template <typename T, typename = void>
class SmallVectorTemplateCommon : public SmallVectorBase {
private:
template <typename, unsigned> friend struct SmallVectorStorage;
// Allocate raw space for N elements of type T. If T has a ctor or dtor, we
// don't want it to be automatically run, so we need to represent the space as
// something else. Use an array of char of sufficient alignment.
using U = AlignedCharArrayUnion<T>;
U FirstEl;
// Space after 'FirstEl' is clobbered, do not add any instance vars after it.
protected:
SmallVectorTemplateCommon(size_t Size) : SmallVectorBase(&FirstEl, Size) {}
void grow_pod(size_t MinSizeInBytes, size_t TSize) {
SmallVectorBase::grow_pod(&FirstEl, MinSizeInBytes, TSize);
}
/// Return true if this is a smallvector which has not had dynamic
/// memory allocated for it.
bool isSmall() const {
return BeginX == static_cast<const void*>(&FirstEl);
}
/// Put this vector in a state of being small.
void resetToSmall() {
BeginX = EndX = CapacityX = &FirstEl;
}
void setEnd(T *P) { this->EndX = P; }
public:
using size_type = size_t;
using difference_type = ptrdiff_t;
using value_type = T;
using iterator = T *;
using const_iterator = const T *;
using const_reverse_iterator = std::reverse_iterator<const_iterator>;
using reverse_iterator = std::reverse_iterator<iterator>;
using reference = T &;
using const_reference = const T &;
using pointer = T *;
using const_pointer = const T *;
// forward iterator creation methods.
iterator begin() { return (iterator)this->BeginX; }
const_iterator begin() const { return (const_iterator)this->BeginX; }
iterator end() { return (iterator)this->EndX; }
const_iterator end() const { return (const_iterator)this->EndX; }
protected:
iterator capacity_ptr() { return (iterator)this->CapacityX; }
const_iterator capacity_ptr() const { return (const_iterator)this->CapacityX;}
public:
// reverse iterator creation methods.
reverse_iterator rbegin() { return reverse_iterator(end()); }
const_reverse_iterator rbegin() const{ return const_reverse_iterator(end()); }
reverse_iterator rend() { return reverse_iterator(begin()); }
const_reverse_iterator rend() const { return const_reverse_iterator(begin());}
size_type size() const { return end()-begin(); }
size_type max_size() const { return size_type(-1) / sizeof(T); }
/// Return the total number of elements in the currently allocated buffer.
size_t capacity() const { return capacity_ptr() - begin(); }
/// Return a pointer to the vector's buffer, even if empty().
pointer data() { return pointer(begin()); }
/// Return a pointer to the vector's buffer, even if empty().
const_pointer data() const { return const_pointer(begin()); }
reference operator[](size_type idx) {
assert(idx < size());
return begin()[idx];
}
const_reference operator[](size_type idx) const {
assert(idx < size());
return begin()[idx];
}
reference front() {
assert(!empty());
return begin()[0];
}
const_reference front() const {
assert(!empty());
return begin()[0];
}
reference back() {
assert(!empty());
return end()[-1];
}
const_reference back() const {
assert(!empty());
return end()[-1];
}
};
/// SmallVectorTemplateBase<isPodLike = false> - This is where we put method
/// implementations that are designed to work with non-POD-like T's.
template <typename T, bool isPodLike>
class SmallVectorTemplateBase : public SmallVectorTemplateCommon<T> {
protected:
SmallVectorTemplateBase(size_t Size) : SmallVectorTemplateCommon<T>(Size) {}
static void destroy_range(T *S, T *E) {
while (S != E) {
--E;
E->~T();
}
}
/// Move the range [I, E) into the uninitialized memory starting with "Dest",
/// constructing elements as needed.
template<typename It1, typename It2>
static void uninitialized_move(It1 I, It1 E, It2 Dest) {
std::uninitialized_copy(std::make_move_iterator(I),
std::make_move_iterator(E), Dest);
}
/// Copy the range [I, E) onto the uninitialized memory starting with "Dest",
/// constructing elements as needed.
template<typename It1, typename It2>
static void uninitialized_copy(It1 I, It1 E, It2 Dest) {
std::uninitialized_copy(I, E, Dest);
}
/// Grow the allocated memory (without initializing new elements), doubling
/// the size of the allocated memory. Guarantees space for at least one more
/// element, or MinSize more elements if specified.
void grow(size_t MinSize = 0);
public:
void push_back(const T &Elt) {
if (this->EndX >= this->CapacityX)
this->grow();
::new ((void*) this->end()) T(Elt);
this->setEnd(this->end()+1);
}
void push_back(T &&Elt) {
if (this->EndX >= this->CapacityX)
this->grow();
::new ((void*) this->end()) T(::std::move(Elt));
this->setEnd(this->end()+1);
}
void pop_back() {
this->setEnd(this->end()-1);
this->end()->~T();
}
};
// Define this out-of-line to dissuade the C++ compiler from inlining it.
template <typename T, bool isPodLike>
void SmallVectorTemplateBase<T, isPodLike>::grow(size_t MinSize) {
size_t CurCapacity = this->capacity();
size_t CurSize = this->size();
// Always grow, even from zero.
size_t NewCapacity = size_t(detail::NextPowerOf2(CurCapacity+2));
if (NewCapacity < MinSize)
NewCapacity = MinSize;
T *NewElts = static_cast<T*>(malloc(NewCapacity*sizeof(T)));
if (NewElts == nullptr)
throw std::bad_alloc();
// Move the elements over.
this->uninitialized_move(this->begin(), this->end(), NewElts);
// Destroy the original elements.
destroy_range(this->begin(), this->end());
// If this wasn't grown from the inline copy, deallocate the old space.
if (!this->isSmall())
free(this->begin());
this->setEnd(NewElts+CurSize);
this->BeginX = NewElts;
this->CapacityX = this->begin()+NewCapacity;
}
/// SmallVectorTemplateBase<isPodLike = true> - This is where we put method
/// implementations that are designed to work with POD-like T's.
template <typename T>
class SmallVectorTemplateBase<T, true> : public SmallVectorTemplateCommon<T> {
protected:
SmallVectorTemplateBase(size_t Size) : SmallVectorTemplateCommon<T>(Size) {}
// No need to do a destroy loop for POD's.
static void destroy_range(T *, T *) {}
/// Move the range [I, E) onto the uninitialized memory
/// starting with "Dest", constructing elements into it as needed.
template<typename It1, typename It2>
static void uninitialized_move(It1 I, It1 E, It2 Dest) {
// Just do a copy.
uninitialized_copy(I, E, Dest);
}
/// Copy the range [I, E) onto the uninitialized memory
/// starting with "Dest", constructing elements into it as needed.
template<typename It1, typename It2>
static void uninitialized_copy(It1 I, It1 E, It2 Dest) {
// Arbitrary iterator types; just use the basic implementation.
std::uninitialized_copy(I, E, Dest);
}
/// Copy the range [I, E) onto the uninitialized memory
/// starting with "Dest", constructing elements into it as needed.
template <typename T1, typename T2>
static void uninitialized_copy(
T1 *I, T1 *E, T2 *Dest,
typename std::enable_if<std::is_same<typename std::remove_const<T1>::type,
T2>::value>::type * = nullptr) {
// Use memcpy for PODs iterated by pointers (which includes SmallVector
// iterators): std::uninitialized_copy optimizes to memmove, but we can
// use memcpy here. Note that I and E are iterators and thus might be
// invalid for memcpy if they are equal.
if (I != E)
memcpy(Dest, I, (E - I) * sizeof(T));
}
/// Double the size of the allocated memory, guaranteeing space for at
/// least one more element or MinSize if specified.
void grow(size_t MinSize = 0) {
this->grow_pod(MinSize*sizeof(T), sizeof(T));
}
public:
void push_back(const T &Elt) {
if (this->EndX >= this->CapacityX)
this->grow();
memcpy(this->end(), &Elt, sizeof(T));
this->setEnd(this->end()+1);
}
void pop_back() {
this->setEnd(this->end()-1);
}
};
/// This class consists of common code factored out of the SmallVector class to
/// reduce code duplication based on the SmallVector 'N' template parameter.
template <typename T>
class SmallVectorImpl : public SmallVectorTemplateBase<T, AT_IS_TRIVIALLY_COPYABLE(T)> {
using SuperClass = SmallVectorTemplateBase<T, AT_IS_TRIVIALLY_COPYABLE(T)>;
public:
using iterator = typename SuperClass::iterator;
using const_iterator = typename SuperClass::const_iterator;
using size_type = typename SuperClass::size_type;
protected:
// Default ctor - Initialize to empty.
explicit SmallVectorImpl(unsigned N)
: SmallVectorTemplateBase<T, AT_IS_TRIVIALLY_COPYABLE(T)>(N*sizeof(T)) {
}
public:
SmallVectorImpl(const SmallVectorImpl &) = delete;
~SmallVectorImpl() {
// Destroy the constructed elements in the vector.
this->destroy_range(this->begin(), this->end());
// If this wasn't grown from the inline copy, deallocate the old space.
if (!this->isSmall())
free(this->begin());
}
void clear() {
this->destroy_range(this->begin(), this->end());
this->EndX = this->BeginX;
}
void resize(size_type N) {
if (N < this->size()) {
this->destroy_range(this->begin()+N, this->end());
this->setEnd(this->begin()+N);
} else if (N > this->size()) {
if (this->capacity() < N)
this->grow(N);
auto I = this->end();
for (auto E = this->begin() + N; I != E; ++I)
new (&*I) T();
this->setEnd(this->begin()+N);
}
}
void resize(size_type N, const T &NV) {
if (N < this->size()) {
this->destroy_range(this->begin()+N, this->end());
this->setEnd(this->begin()+N);
} else if (N > this->size()) {
if (this->capacity() < N)
this->grow(N);
std::uninitialized_fill(this->end(), this->begin()+N, NV);
this->setEnd(this->begin()+N);
}
}
void reserve(size_type N) {
if (this->capacity() < N)
this->grow(N);
}
T pop_back_val() {
T Result = ::std::move(this->back());
this->pop_back();
return Result;
}
void swap(SmallVectorImpl &RHS);
/// Add the specified range to the end of the SmallVector.
template <typename in_iter,
typename = typename std::enable_if<std::is_convertible<
typename std::iterator_traits<in_iter>::iterator_category,
std::input_iterator_tag>::value>::type>
void append(in_iter in_start, in_iter in_end) {
size_type NumInputs = std::distance(in_start, in_end);
// Grow allocated space if needed.
if (NumInputs > size_type(this->capacity_ptr()-this->end()))
this->grow(this->size()+NumInputs);
// Copy the new elements over.
this->uninitialized_copy(in_start, in_end, this->end());
this->setEnd(this->end() + NumInputs);
}
/// Add the specified range to the end of the SmallVector.
void append(size_type NumInputs, const T &Elt) {
// Grow allocated space if needed.
if (NumInputs > size_type(this->capacity_ptr()-this->end()))
this->grow(this->size()+NumInputs);
// Copy the new elements over.
std::uninitialized_fill_n(this->end(), NumInputs, Elt);
this->setEnd(this->end() + NumInputs);
}
void append(std::initializer_list<T> IL) {
append(IL.begin(), IL.end());
}
// FIXME: Consider assigning over existing elements, rather than clearing &
// re-initializing them - for all assign(...) variants.
void assign(size_type NumElts, const T &Elt) {
clear();
if (this->capacity() < NumElts)
this->grow(NumElts);
this->setEnd(this->begin()+NumElts);
std::uninitialized_fill(this->begin(), this->end(), Elt);
}
template <typename in_iter,
typename = typename std::enable_if<std::is_convertible<
typename std::iterator_traits<in_iter>::iterator_category,
std::input_iterator_tag>::value>::type>
void assign(in_iter in_start, in_iter in_end) {
clear();
append(in_start, in_end);
}
void assign(std::initializer_list<T> IL) {
clear();
append(IL);
}
iterator erase(const_iterator CI) {
// Just cast away constness because this is a non-const member function.
iterator I = const_cast<iterator>(CI);
assert(I >= this->begin() && "Iterator to erase is out of bounds.");
assert(I < this->end() && "Erasing at past-the-end iterator.");
iterator N = I;
// Shift all elts down one.
std::move(I+1, this->end(), I);
// Drop the last elt.
this->pop_back();
return(N);
}
iterator erase(const_iterator CS, const_iterator CE) {
// Just cast away constness because this is a non-const member function.
iterator S = const_cast<iterator>(CS);
iterator E = const_cast<iterator>(CE);
assert(S >= this->begin() && "Range to erase is out of bounds.");
assert(S <= E && "Trying to erase invalid range.");
assert(E <= this->end() && "Trying to erase past the end.");
iterator N = S;
// Shift all elts down.
iterator I = std::move(E, this->end(), S);
// Drop the last elts.
this->destroy_range(I, this->end());
this->setEnd(I);
return(N);
}
iterator insert(iterator I, T &&Elt) {
if (I == this->end()) { // Important special case for empty vector.
this->push_back(::std::move(Elt));
return this->end()-1;
}
assert(I >= this->begin() && "Insertion iterator is out of bounds.");
assert(I <= this->end() && "Inserting past the end of the vector.");
if (this->EndX >= this->CapacityX) {
size_t EltNo = I-this->begin();
this->grow();
I = this->begin()+EltNo;
}
::new ((void*) this->end()) T(::std::move(this->back()));
// Push everything else over.
std::move_backward(I, this->end()-1, this->end());
this->setEnd(this->end()+1);
// If we just moved the element we're inserting, be sure to update
// the reference.
T *EltPtr = &Elt;
if (I <= EltPtr && EltPtr < this->EndX)
++EltPtr;
*I = ::std::move(*EltPtr);
return I;
}
iterator insert(iterator I, const T &Elt) {
if (I == this->end()) { // Important special case for empty vector.
this->push_back(Elt);
return this->end()-1;
}
assert(I >= this->begin() && "Insertion iterator is out of bounds.");
assert(I <= this->end() && "Inserting past the end of the vector.");
if (this->EndX >= this->CapacityX) {
size_t EltNo = I-this->begin();
this->grow();
I = this->begin()+EltNo;
}
::new ((void*) this->end()) T(std::move(this->back()));
// Push everything else over.
std::move_backward(I, this->end()-1, this->end());
this->setEnd(this->end()+1);
// If we just moved the element we're inserting, be sure to update
// the reference.
const T *EltPtr = &Elt;
if (I <= EltPtr && EltPtr < this->EndX)
++EltPtr;
*I = *EltPtr;
return I;
}
iterator insert(iterator I, size_type NumToInsert, const T &Elt) {
// Convert iterator to elt# to avoid invalidating iterator when we reserve()
size_t InsertElt = I - this->begin();
if (I == this->end()) { // Important special case for empty vector.
append(NumToInsert, Elt);
return this->begin()+InsertElt;
}
assert(I >= this->begin() && "Insertion iterator is out of bounds.");
assert(I <= this->end() && "Inserting past the end of the vector.");
// Ensure there is enough space.
reserve(this->size() + NumToInsert);
// Uninvalidate the iterator.
I = this->begin()+InsertElt;
// If there are more elements between the insertion point and the end of the
// range than there are being inserted, we can use a simple approach to
// insertion. Since we already reserved space, we know that this won't
// reallocate the vector.
if (size_t(this->end()-I) >= NumToInsert) {
T *OldEnd = this->end();
append(std::move_iterator<iterator>(this->end() - NumToInsert),
std::move_iterator<iterator>(this->end()));
// Copy the existing elements that get replaced.
std::move_backward(I, OldEnd-NumToInsert, OldEnd);
std::fill_n(I, NumToInsert, Elt);
return I;
}
// Otherwise, we're inserting more elements than exist already, and we're
// not inserting at the end.
// Move over the elements that we're about to overwrite.
T *OldEnd = this->end();
this->setEnd(this->end() + NumToInsert);
size_t NumOverwritten = OldEnd-I;
this->uninitialized_move(I, OldEnd, this->end()-NumOverwritten);
// Replace the overwritten part.
std::fill_n(I, NumOverwritten, Elt);
// Insert the non-overwritten middle part.
std::uninitialized_fill_n(OldEnd, NumToInsert-NumOverwritten, Elt);
return I;
}
template <typename ItTy,
typename = typename std::enable_if<std::is_convertible<
typename std::iterator_traits<ItTy>::iterator_category,
std::input_iterator_tag>::value>::type>
iterator insert(iterator I, ItTy From, ItTy To) {
// Convert iterator to elt# to avoid invalidating iterator when we reserve()
size_t InsertElt = I - this->begin();
if (I == this->end()) { // Important special case for empty vector.
append(From, To);
return this->begin()+InsertElt;
}
assert(I >= this->begin() && "Insertion iterator is out of bounds.");
assert(I <= this->end() && "Inserting past the end of the vector.");
size_t NumToInsert = std::distance(From, To);
// Ensure there is enough space.
reserve(this->size() + NumToInsert);
// Uninvalidate the iterator.
I = this->begin()+InsertElt;
// If there are more elements between the insertion point and the end of the
// range than there are being inserted, we can use a simple approach to
// insertion. Since we already reserved space, we know that this won't
// reallocate the vector.
if (size_t(this->end()-I) >= NumToInsert) {
T *OldEnd = this->end();
append(std::move_iterator<iterator>(this->end() - NumToInsert),
std::move_iterator<iterator>(this->end()));
// Copy the existing elements that get replaced.
std::move_backward(I, OldEnd-NumToInsert, OldEnd);
std::copy(From, To, I);
return I;
}
// Otherwise, we're inserting more elements than exist already, and we're
// not inserting at the end.
// Move over the elements that we're about to overwrite.
T *OldEnd = this->end();
this->setEnd(this->end() + NumToInsert);
size_t NumOverwritten = OldEnd-I;
this->uninitialized_move(I, OldEnd, this->end()-NumOverwritten);
// Replace the overwritten part.
for (T *J = I; NumOverwritten > 0; --NumOverwritten) {
*J = *From;
++J; ++From;
}
// Insert the non-overwritten middle part.
this->uninitialized_copy(From, To, OldEnd);
return I;
}
void insert(iterator I, std::initializer_list<T> IL) {
insert(I, IL.begin(), IL.end());
}
template <typename... ArgTypes> void emplace_back(ArgTypes &&... Args) {
if (this->EndX >= this->CapacityX)
this->grow();
::new ((void *)this->end()) T(std::forward<ArgTypes>(Args)...);
this->setEnd(this->end() + 1);
}
SmallVectorImpl &operator=(const SmallVectorImpl &RHS);
SmallVectorImpl &operator=(SmallVectorImpl &&RHS);
bool operator==(const SmallVectorImpl &RHS) const {
if (this->size() != RHS.size()) return false;
return std::equal(this->begin(), this->end(), RHS.begin());
}
bool operator!=(const SmallVectorImpl &RHS) const {
return !(*this == RHS);
}
bool operator<(const SmallVectorImpl &RHS) const {
return std::lexicographical_compare(this->begin(), this->end(),
RHS.begin(), RHS.end());
}
/// Set the array size to \p N, which the current array must have enough
/// capacity for.
///
/// This does not construct or destroy any elements in the vector.
///
/// Clients can use this in conjunction with capacity() to write past the end
/// of the buffer when they know that more elements are available, and only
/// update the size later. This avoids the cost of value initializing elements
/// which will only be overwritten.
void set_size(size_type N) {
assert(N <= this->capacity());
this->setEnd(this->begin() + N);
}
};
template <typename T>
void SmallVectorImpl<T>::swap(SmallVectorImpl<T> &RHS) {
if (this == &RHS) return;
// We can only avoid copying elements if neither vector is small.
if (!this->isSmall() && !RHS.isSmall()) {
std::swap(this->BeginX, RHS.BeginX);
std::swap(this->EndX, RHS.EndX);
std::swap(this->CapacityX, RHS.CapacityX);
return;
}
if (RHS.size() > this->capacity())
this->grow(RHS.size());
if (this->size() > RHS.capacity())
RHS.grow(this->size());
// Swap the shared elements.
size_t NumShared = this->size();
if (NumShared > RHS.size()) NumShared = RHS.size();
for (size_type i = 0; i != NumShared; ++i)
std::swap((*this)[i], RHS[i]);
// Copy over the extra elts.
if (this->size() > RHS.size()) {
size_t EltDiff = this->size() - RHS.size();
this->uninitialized_copy(this->begin()+NumShared, this->end(), RHS.end());
RHS.setEnd(RHS.end()+EltDiff);
this->destroy_range(this->begin()+NumShared, this->end());
this->setEnd(this->begin()+NumShared);
} else if (RHS.size() > this->size()) {
size_t EltDiff = RHS.size() - this->size();
this->uninitialized_copy(RHS.begin()+NumShared, RHS.end(), this->end());
this->setEnd(this->end() + EltDiff);
this->destroy_range(RHS.begin()+NumShared, RHS.end());
RHS.setEnd(RHS.begin()+NumShared);
}
}
template <typename T>
SmallVectorImpl<T> &SmallVectorImpl<T>::
operator=(const SmallVectorImpl<T> &RHS) {
// Avoid self-assignment.
if (this == &RHS) return *this;
// If we already have sufficient space, assign the common elements, then
// destroy any excess.
size_t RHSSize = RHS.size();
size_t CurSize = this->size();
if (CurSize >= RHSSize) {
// Assign common elements.
iterator NewEnd;
if (RHSSize)
NewEnd = std::copy(RHS.begin(), RHS.begin()+RHSSize, this->begin());
else
NewEnd = this->begin();
// Destroy excess elements.
this->destroy_range(NewEnd, this->end());
// Trim.
this->setEnd(NewEnd);
return *this;
}
// If we have to grow to have enough elements, destroy the current elements.
// This allows us to avoid copying them during the grow.
// FIXME: don't do this if they're efficiently moveable.
if (this->capacity() < RHSSize) {
// Destroy current elements.
this->destroy_range(this->begin(), this->end());
this->setEnd(this->begin());
CurSize = 0;
this->grow(RHSSize);
} else if (CurSize) {
// Otherwise, use assignment for the already-constructed elements.
std::copy(RHS.begin(), RHS.begin()+CurSize, this->begin());
}
// Copy construct the new elements in place.
this->uninitialized_copy(RHS.begin()+CurSize, RHS.end(),
this->begin()+CurSize);
// Set end.
this->setEnd(this->begin()+RHSSize);
return *this;
}
template <typename T>
SmallVectorImpl<T> &SmallVectorImpl<T>::operator=(SmallVectorImpl<T> &&RHS) {
// Avoid self-assignment.
if (this == &RHS) return *this;
// If the RHS isn't small, clear this vector and then steal its buffer.
if (!RHS.isSmall()) {
this->destroy_range(this->begin(), this->end());
if (!this->isSmall()) free(this->begin());
this->BeginX = RHS.BeginX;
this->EndX = RHS.EndX;
this->CapacityX = RHS.CapacityX;
RHS.resetToSmall();
return *this;
}
// If we already have sufficient space, assign the common elements, then
// destroy any excess.
size_t RHSSize = RHS.size();
size_t CurSize = this->size();
if (CurSize >= RHSSize) {
// Assign common elements.
iterator NewEnd = this->begin();
if (RHSSize)
NewEnd = std::move(RHS.begin(), RHS.end(), NewEnd);
// Destroy excess elements and trim the bounds.
this->destroy_range(NewEnd, this->end());
this->setEnd(NewEnd);
// Clear the RHS.
RHS.clear();
return *this;
}
// If we have to grow to have enough elements, destroy the current elements.
// This allows us to avoid copying them during the grow.
// FIXME: this may not actually make any sense if we can efficiently move
// elements.
if (this->capacity() < RHSSize) {
// Destroy current elements.
this->destroy_range(this->begin(), this->end());
this->setEnd(this->begin());
CurSize = 0;
this->grow(RHSSize);
} else if (CurSize) {
// Otherwise, use assignment for the already-constructed elements.
std::move(RHS.begin(), RHS.begin()+CurSize, this->begin());
}
// Move-construct the new elements in place.
this->uninitialized_move(RHS.begin()+CurSize, RHS.end(),
this->begin()+CurSize);
// Set end.
this->setEnd(this->begin()+RHSSize);
RHS.clear();
return *this;
}
/// Storage for the SmallVector elements which aren't contained in
/// SmallVectorTemplateCommon. There are 'N-1' elements here. The remaining '1'
/// element is in the base class. This is specialized for the N=1 and N=0 cases
/// to avoid allocating unnecessary storage.
template <typename T, unsigned N>
struct SmallVectorStorage {
typename SmallVectorTemplateCommon<T>::U InlineElts[N - 1];
};
template <typename T> struct SmallVectorStorage<T, 1> {};
template <typename T> struct SmallVectorStorage<T, 0> {};
/// This is a 'vector' (really, a variable-sized array), optimized
/// for the case when the array is small. It contains some number of elements
/// in-place, which allows it to avoid heap allocation when the actual number of
/// elements is below that threshold. This allows normal "small" cases to be
/// fast without losing generality for large inputs.
///
/// Note that this does not attempt to be exception safe.
///
template <typename T, unsigned N>
class SmallVector : public SmallVectorImpl<T> {
/// Inline space for elements which aren't stored in the base class.
SmallVectorStorage<T, N> Storage;
public:
SmallVector() : SmallVectorImpl<T>(N) {}
explicit SmallVector(size_t Size, const T &Value = T())
: SmallVectorImpl<T>(N) {
this->assign(Size, Value);
}
template <typename ItTy,
typename = typename std::enable_if<std::is_convertible<
typename std::iterator_traits<ItTy>::iterator_category,
std::input_iterator_tag>::value>::type>
SmallVector(ItTy S, ItTy E) : SmallVectorImpl<T>(N) {
this->append(S, E);
}
template <typename Container>
explicit SmallVector(Container &&c) : SmallVectorImpl<T>(N) {
this->append(c.begin(), c.end());
}
SmallVector(std::initializer_list<T> IL) : SmallVectorImpl<T>(N) {
this->assign(IL);
}
SmallVector(const SmallVector &RHS) : SmallVectorImpl<T>(N) {
if (!RHS.empty())
SmallVectorImpl<T>::operator=(RHS);
}
const SmallVector &operator=(const SmallVector &RHS) {
SmallVectorImpl<T>::operator=(RHS);
return *this;
}
SmallVector(SmallVector &&RHS) : SmallVectorImpl<T>(N) {
if (!RHS.empty())
SmallVectorImpl<T>::operator=(::std::move(RHS));
}
template<typename Container>
const SmallVector &operator=(const Container &RHS) {
this->assign(RHS.begin(), RHS.end());
return *this;
}
SmallVector(SmallVectorImpl<T> &&RHS) : SmallVectorImpl<T>(N) {
if (!RHS.empty())
SmallVectorImpl<T>::operator=(::std::move(RHS));
}
const SmallVector &operator=(SmallVector &&RHS) {
SmallVectorImpl<T>::operator=(::std::move(RHS));
return *this;
}
const SmallVector &operator=(SmallVectorImpl<T> &&RHS) {
SmallVectorImpl<T>::operator=(::std::move(RHS));
return *this;
}
const SmallVector &operator=(std::initializer_list<T> IL) {
this->assign(IL);
return *this;
}
};
template <typename T, unsigned N>
inline size_t capacity_in_bytes(const SmallVector<T, N> &X) {
return X.capacity_in_bytes();
}
} // end namespace at
namespace std {
/// Implement std::swap in terms of SmallVector swap.
template<typename T>
inline void
swap(at::SmallVectorImpl<T> &LHS, at::SmallVectorImpl<T> &RHS) {
LHS.swap(RHS);
}
/// Implement std::swap in terms of SmallVector swap.
template<typename T, unsigned N>
inline void
swap(at::SmallVector<T, N> &LHS, at::SmallVector<T, N> &RHS) {
LHS.swap(RHS);
}
} // end namespace std
#include <c10/util/SmallVector.h>
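A usage sketch for the vendored SmallVector (the size N == 4 is chosen for illustration): up to N elements live in the inline buffer, and growing past N spills to a heap allocation.
void small_vector_demo() {
  at::SmallVector<int, 4> v;   // inline storage for four ints, no heap yet
  v.push_back(1);
  v.append({2, 3, 4});         // fills the inline buffer
  v.push_back(5);              // exceeds N: grow() moves elements to the heap
  int last = v.pop_back_val(); // returns 5 and shrinks the vector
  (void)last;
}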

View File

@@ -1,85 +1,117 @@
#include <ATen/ATen.h>
#include <ATen/SparseTensorImpl.h>
#include <ATen/InitialTensorOptions.h>
#include <ATen/core/LegacyTypeDispatch.h>
namespace at {
namespace {
DeviceType sparseTensorIdToDeviceType(TensorTypeId type_id) {
if (type_id == SparseCPUTensorId()) {
return kCPU;
} else if (type_id == SparseCUDATensorId()) {
return kCUDA;
} else {
AT_ERROR("Cannot construct SparseTensor with non-sparse tensor type ID ", type_id);
}
}
}
// An empty dense tensor defaults to a 1-dimensional tensor of size [0]
// (recall, it is not a 0-dimensional tensor, because such a tensor would
// be a scalar and have one element)
//
// Thus, an empty sparse tensor should be a 1-dimensional tensor of size [0].
// Furthermore, we have dim == sparseDims + denseDims; since this is a sparse
// tensor, let us say that an empty sparse tensor has sparseDims == 1 and
// denseDims == 0. (There is a degree of freedom here, but given that this
// is a sparse dimension, it seems reasonable to demand that sparseDims > 0).
// Furthermore, we have dim == sparse_dim + dense_dim; since this is a sparse
// tensor, let us say that an empty sparse tensor has sparse_dim == 1 and
// dense_dim == 0. (There is a degree of freedom here, but given that this
// is a sparse dimension, it seems reasonable to demand that sparse_dim > 0).
//
// In an ideal world, this would then mean we allocate a [1,0] size indices
// tensor and a [0] size values tensor for such an empty tensor. However,
// we don't currently support zero-size dimensions, so we can't actually
// do this; so we just allocate zero-size tensors for everything.
SparseTensorImpl::SparseTensorImpl(Type * type)
: TensorImpl(type)
// This means that we allocate a [1,0] size indices tensor and a [0] size
// values tensor for such an empty tensor.
SparseTensorImpl::SparseTensorImpl(at::TensorTypeId type_id, const caffe2::TypeMeta& data_type)
: TensorImpl(type_id, data_type, nullptr, false)
, size_{0}
, sparseDims_(1)
, denseDims_(0)
, indices_(type->toDense().toScalarType(ScalarType::Long).tensor())
, values_(type->toDense().tensor()) {
AT_ASSERT(type->is_sparse());
}
, sparse_dim_(1)
, dense_dim_(0)
, indices_(at::empty({1, 0}, at::initialTensorOptions().device(sparseTensorIdToDeviceType(type_id)).dtype(ScalarType::Long)))
, values_(at::empty({0}, at::initialTensorOptions().device(sparseTensorIdToDeviceType(type_id)).dtype(data_type))) {}
const char * SparseTensorImpl::toString() const {
// TODO: also give back type information
return "SparseTensor";
}
IntList SparseTensorImpl::sizes() const {
return size_;
}
IntList SparseTensorImpl::strides() const {
AT_ERROR("sparse tensors do not have strides");
}
int64_t SparseTensorImpl::dim() const {
return sparseDims_ + denseDims_;
bool SparseTensorImpl::is_contiguous() const {
AT_ERROR("sparse tensors do not have is_contiguous");
}
Scalar SparseTensorImpl::localScalar() {
int64_t n = numel();
AT_CHECK(n == 1, "a Tensor with ", n, " elements cannot be converted to Scalar");
if (nnz_ == 0) return Scalar(0);
if (coalesced_) return values_.pImpl->localScalar();
// You have a non-coalesced scalar sparse tensor?! Wow! Have
// a cookie.
return values_.sum().pImpl->localScalar();
int64_t SparseTensorImpl::size(int64_t d) const {
d = at::maybe_wrap_dim(d, dim(), false);
return size_[d];
}
void * SparseTensorImpl::unsafeGetTH(bool retain) {
AT_ERROR("unsafeGetTH not supported for new style TensorImpl");
int64_t SparseTensorImpl::stride(int64_t d) const {
AT_ERROR("sparse tensors do not have strides");
}
std::unique_ptr<Storage> SparseTensorImpl::storage() {
AT_ERROR("sparse tensors do not have storage");
void SparseTensorImpl::resize_dim(int64_t ndim) {
AT_ERROR("sparse tensors do not have resize_dim");
}
void SparseTensorImpl::set_size(int64_t dim, int64_t new_size) {
AT_ERROR("sparse tensors do not have set_size");
}
void SparseTensorImpl::set_stride(int64_t dim, int64_t new_stride) {
AT_ERROR("sparse tensors do not have set_stride");
}
void SparseTensorImpl::set_storage_offset(int64_t storage_offset) {
AT_ERROR("sparse tensors do not have set_storage_offset");
}
void SparseTensorImpl::set_indices_and_values(const Tensor& indices, const Tensor& values) {
// TODO: Explicit empty test is needed because we don't handle size zero
// dimensions at the moment
bool empty = values.numel() == 0;
AT_CHECK(values.type().toSparse() == type(), "values type must match sparse tensor type");
int64_t SparseTensorImpl::dim() const {
return sparse_dim_ + dense_dim_;
}
TensorImpl* SparseTensorImpl::maybe_zero_dim(bool condition_when_zero_dim) {
AT_CHECK(condition_when_zero_dim == (dim() == 0),
"Attempted to maybe_zero_dim on a SparseTensorImpl to ", condition_when_zero_dim,
" but the SparseTensor's dim() is ", dim(), " and SparseTensors do not support"
" changing dimensionality via maybe_zero_dim");
return this;
}
const Storage& SparseTensorImpl::storage() const {
AT_ERROR("sparse tensors do not have storage");
}
int64_t SparseTensorImpl::storage_offset() const {
AT_ERROR("sparse tensors do not have storage");
}
void SparseTensorImpl::set_indices_and_values_unsafe(const Tensor& indices, const Tensor& values) {
AT_ASSERT(!indices.is_variable() && !values.is_variable()); // They should be plain tensors!
AT_CHECK(!indices.is_sparse(), "expected indices to be a dense tensor, but got indices of layout ", indices.layout());
AT_CHECK(!values.is_sparse(), "expected values to be a dense tensor, but got values of layout ", values.layout());
AT_CHECK(values.type().toSparse() == legacyTensorType(*this), "values type must match sparse tensor type");
AT_CHECK(indices.type().scalarType() == kLong, "indices must be an int64 tensor");
AT_CHECK(indices.type().backend() == values.type().backend(), "backend of indices (", indices.type().backend(), ") must match backend of values (", values.type().backend(), ")");
AT_CHECK(!indices.is_cuda() || indices.get_device() == values.get_device(), "device of indices (", indices.get_device(), ") must match device of values (", values.get_device(), ")");
if (!empty) {
AT_CHECK(indices.dim() == 2, "indices must be nDim x nnz");
AT_CHECK(indices.size(1) == values.size(0), "indices and values must have same nnz");
AT_CHECK(indices.size(0) == sparseDims_, "indices has incorrect first dimension, expected ", sparseDims_, ", got ", indices.size(0));
AT_CHECK(values.dim() == denseDims_ + 1, "values has incorrect number of dimensions, expected ", denseDims_ + 1, ", got ", values.dim());
} else {
AT_CHECK(indices.numel() == 0, "if values is empty, indices must be empty too");
}
AT_CHECK(indices.dim() == 2, "indices must be sparse_dim x nnz, but got: ", indices.sizes());
AT_CHECK(indices.size(1) == values.size(0), "indices and values must have same nnz, but got nnz from indices: ", indices.size(1), ", nnz from values: ", values.size(0));
AT_CHECK(indices.size(0) == sparse_dim_, "indices has incorrect first dimension, expected ", sparse_dim_, ", got ", indices.size(0));
AT_CHECK(values.dim() == dense_dim_ + 1, "values has incorrect number of dimensions, expected ", dense_dim_ + 1, ", got ", values.dim());
auto dense_size_original = sizes().slice(sparse_dim_);
std::vector<int64_t> expected_values_size_vec = {values.size(0)};
expected_values_size_vec.insert(expected_values_size_vec.end(), dense_size_original.begin(), dense_size_original.end());
IntList expected_values_size(expected_values_size_vec);
auto new_values_size = values.sizes();
AT_CHECK(
std::equal(expected_values_size.begin(), expected_values_size.end(), new_values_size.begin()),
"values has incorrect size, expected ", expected_values_size, ", got ", new_values_size
);
indices_ = indices;
values_ = values;
// TODO: Eliminate this ternary when we handle size zero dimensions.
// (Actually, this will "accidentally" work today because all zero-size
// tensors have size [0], and so you'll get 0 when empty is zero; but it's
// more explicit this way.)
nnz_ = empty ? 0 : values.size(0);
coalesced_ = false;
}
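A concrete instance of the COO invariants checked by set_indices_and_values_unsafe above (sizes invented for illustration, and the at::zeros calls assume the usual factory overloads): for a sparse tensor of overall size (3, 4, 5) with sparse_dim == 2, dense_dim == 1, and nnz == 2:
void coo_shapes_demo() {
  // indices: shape (sparse_dim, nnz) == (2, 2), always a LongTensor
  at::Tensor indices = at::zeros({2, 2}, at::kLong);
  // values: shape (nnz, size[sparse_dim:]) == (2, 5), same dtype as the tensor
  at::Tensor values = at::zeros({2, 5});
}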

View File

@@ -1,38 +1,26 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/TensorImpl.h"
#include "ATen/Error.h"
#include "ATen/core/TensorImpl.h"
#include "c10/util/Exception.h"
namespace at {
struct SparseTensorImpl : public TensorImpl {
struct CAFFE2_API SparseTensorImpl : public TensorImpl {
// Stored in COO format, indices + values.
// Ideal INVARIANTS:
// _sparseDims: range [0, len(shape)]; _sparseDims + _denseDims = len(shape)
// _denseDims : range [0, len(shape)]; _sparseDims + _denseDims = len(shape)
// _indices.shape: dimensionality: 2, shape: (_sparseDims, nnz)
// _values.shape: dimensionality: 1 + _denseDims. shape: (nnz, shape[_sparseDims:])
// Actual INVARIANT differences:
// 1) _sparseDims: range [1, len(shape)] (i.e. we don't allow 0 sparse dimensions)
// 2) when nnz = 0, there is strange behavior because we lack 0-dimensional sparse tensors. Namely:
// dimensionality == 0, _sparseDims == 0, _denseDims == 0, _indices.shape == {0}, _values.shape == {0}
// 3) For both _indices.shape and _values.shape, the nnz dimension may be larger than nnz
// 4) For _values.shape, the non-nnz dimensions may be smaller than the corresponding dimension size, e.g.
// a shape (2,3) sparse tensor with _sparseDims == 1, may have _values.shape: (nnz, <=2, <=3).
// INVARIANTS:
// sparse_dim: range [0, len(shape)]; sparse_dim + dense_dim = len(shape)
// dense_dim : range [0, len(shape)]; sparse_dim + dense_dim = len(shape)
// _indices.shape: dimensionality: 2, shape: (sparse_dim, nnz)
// _values.shape: dimensionality: 1 + dense_dim. shape: (nnz, shape[sparse_dim:])
// The true size of the sparse tensor (e.g., if you called to_dense()
// on it). When THTensor merges into TensorImpl, this field
// should move to the parent class.
std::vector<int64_t> size_;
// The number of non-zero elements.
int64_t nnz_ = 0;
int64_t sparseDims_ = 0; // number of sparse dimensions
int64_t denseDims_ = 0; // number of dense dimensions
int64_t sparse_dim_ = 0; // number of sparse dimensions
int64_t dense_dim_ = 0; // number of dense dimensions
Tensor indices_; // always a LongTensor
Tensor values_;
@@ -48,58 +36,157 @@ struct SparseTensorImpl : public TensorImpl {
public:
// Public for now...
explicit SparseTensorImpl(Type * type);
explicit SparseTensorImpl(at::TensorTypeId, const caffe2::TypeMeta&);
int64_t nnz() const { return nnz_; }
int64_t sparseDims() const { return sparseDims_; }
int64_t denseDims() const { return denseDims_; }
int64_t nnz() const { return values_.size(0); }
int64_t sparse_dim() const { return sparse_dim_; }
int64_t dense_dim() const { return dense_dim_; }
bool coalesced() const { return coalesced_; }
Tensor indices() const { return indices_; }
Tensor values() const { return values_; }
const char * toString() const override;
IntList sizes() const override;
IntList strides() const override;
bool is_contiguous() const override;
int64_t size(int64_t d) const override;
int64_t stride(int64_t d) const override;
void resize_dim(int64_t ndim) override;
void set_size(int64_t dim, int64_t new_size) override;
void set_stride(int64_t dim, int64_t new_stride) override;
void set_storage_offset(int64_t storage_offset) override;
int64_t dim() const override;
Scalar localScalar() override;
void * unsafeGetTH(bool retain) override;
std::unique_ptr<Storage> storage() override;
TensorImpl* maybe_zero_dim(bool condition_when_zero_dim) override;
const Storage& storage() const override;
int64_t storage_offset() const override;
// Some ops do some manual size fiddling.
// TODO: Figure out a more safe way to provide this functionality
std::vector<int64_t>& _sizes_mut() { return size_; }
// WARNING: This function does NOT preserve invariants of sparseDims/denseDims with
// WARNING: This function does NOT preserve invariants of sparse_dim/dense_dim with
// respect to indices and values
void raw_resize_(int64_t sparseDims, int64_t denseDims, ArrayRef<int64_t> size) {
// UGHHHHH. Legacy special case
if (size.size() == 0) {
size_ = {0};
} else {
size_ = size;
}
sparseDims_ = sparseDims;
denseDims_ = denseDims;
void raw_resize_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
size_ = size.vec();
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
refresh_numel();
}
// TODO: I hate these two setters, please get rid of them!!!
void set_indices(const Tensor& indices) {
AT_ASSERT(indices.type().backend() == at::toDense(type().backend()));
AT_ASSERT(indices.type().scalarType() == kLong);
indices_ = indices;
// NOTE: This function preserves invariants of sparse_dim/dense_dim with respect to
// indices and values.
//
// NOTE: This function supports the following cases:
// 1. When we keep the number of dense dimensions unchanged, and NOT shrinking the size of
// any of the dense dimensions.
// 2. When we keep the number of sparse dimensions unchanged, and NOT shrinking the size of
// any of the sparse dimensions.
// 3. When the sparse tensor has zero nnz, in which case we are free to change the shapes of
// both its sparse and dense dimensions.
//
// This function DOESN'T support (and will throw an error) the following cases:
// 1. When we attempt to change the number of sparse dimensions on a non-empty sparse tensor
// (such an operation will invalidate the indices stored).
// 2. When we attempt to change the number of dense dimensions on a non-empty sparse tensor
// (such an operation will behave differently from an equivalent dense tensor's resize method,
// and for API consistency we don't support it).
// 3. When we attempt to shrink the size of any of the dense dimensions on a non-empty sparse tensor
// (such an operation will behave differently from an equivalent dense tensor's resize method,
// and for API consistency we don't support it).
// 4. When we attempt to shrink the size of any of the sparse dimensions on a non-empty sparse tensor
// (this could make some of the stored indices out-of-bound and thus unsafe).
void resize_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
AT_CHECK(sparse_dim + dense_dim == size.size(), "number of dimensions must be sparse_dim (", sparse_dim, ") + dense_dim (", dense_dim, "), but got ", size.size());
if (nnz() > 0) {
auto alt_options_msg = "You could try the following options:\n\
1. If you need an empty sparse tensor of this size, call `x = torch.sparse_coo_tensor(size)`.\n\
2. If you need to resize this tensor, you have the following options:\n\
1. For both sparse and dense dimensions, keep the number of them constant and the size of them non-shrinking, and then try the same call again.\n\
2. Or, create a new sparse tensor with the correct indices and values from this sparse tensor.";
AT_CHECK(sparse_dim == sparse_dim_,
"changing the number of sparse dimensions (from ", sparse_dim_, " to ", sparse_dim, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
AT_CHECK(dense_dim == dense_dim_,
"changing the number of dense dimensions (from ", dense_dim_, " to ", dense_dim, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
bool shrinking_sparse_dims = false;
bool shrinking_dense_dim = false;
auto sparse_size_original = sizes().slice(0, sparse_dim);
auto sparse_size_new = size.slice(0, sparse_dim);
for (int i = 0; i < sparse_dim; i++) {
if (sparse_size_new[i] < sparse_size_original[i]) {
shrinking_sparse_dims = true;
break;
}
}
auto dense_size_original = sizes().slice(sparse_dim);
auto dense_size_new = size.slice(sparse_dim);
for (int i = 0; i < dense_dim; i++) {
if (dense_size_new[i] < dense_size_original[i]) {
shrinking_dense_dim = true;
break;
}
}
AT_CHECK(!shrinking_sparse_dims,
"shrinking the size of sparse dimensions (from ", sparse_size_original, " to ", sparse_size_new, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
AT_CHECK(!shrinking_dense_dim,
"shrinking the size of dense dimensions (from ", dense_size_original, " to ", dense_size_new, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
}
if ((!size.equals(size_)) || (sparse_dim != sparse_dim_) || (dense_dim != dense_dim_)) {
auto nnz = values().size(0);
std::vector<int64_t> values_size = {nnz};
auto dense_size = size.slice(sparse_dim);
values_size.insert(values_size.end(), dense_size.begin(), dense_size.end());
values_.resize_(values_size);
indices_.resize_({sparse_dim, nnz});
}
size_ = size.vec();
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
refresh_numel();
}
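// Editor's example of the rules above, with hypothetical values: assume a
// non-empty sparse tensor with sparse_dim = 1, dense_dim = 1 and size (4, 2).
//   resize_(1, 1, {8, 2})   // OK: grows a sparse dimension
//   resize_(1, 1, {4, 4})   // OK: grows the dense dimension
//   resize_(1, 1, {2, 2})   // error: shrinks a sparse dimension
//   resize_(2, 0, {4, 2})   // error: changes sparse_dim/dense_dim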
void set_values(const Tensor& values) {
AT_ASSERT(values.type().toSparse() == type());
values_ = values;
// NOTE: this function will resize the sparse tensor and also set `indices` and `values` to empty.
void resize_and_clear_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
AT_CHECK(sparse_dim + dense_dim == size.size(), "number of dimensions must be sparse_dim (", sparse_dim, ") + dense_dim (", dense_dim, "), but got ", size.size());
size_ = size.vec();
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
auto empty_indices = at::empty({sparse_dim, 0}, indices().options());
std::vector<int64_t> values_size = {0};
auto dense_size = sizes().slice(sparse_dim);
values_size.insert(values_size.end(), dense_size.begin(), dense_size.end());
auto empty_values = at::empty(values_size, values().options());
set_indices_and_values_unsafe(empty_indices, empty_values);
refresh_numel();
}
void set_coalesced(bool coalesced) { coalesced_ = coalesced; }
void set_nnz(int64_t nnz) { nnz_ = nnz; }
// NOTE: this function is only used internally and not exposed to Python frontend
void set_nnz_and_narrow(int64_t new_nnz) {
AT_ASSERT(new_nnz <= nnz());
indices_ = indices_.narrow(1, 0, new_nnz);
values_ = values_.narrow(0, 0, new_nnz);
}
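// Editor's note: e.g. with nnz() == 5, set_nnz_and_narrow(3) keeps the first
// 3 columns of indices_ (shape (sparse_dim, 3)) and the first 3 rows of
// values_, which is consistent because both are laid out along the nnz dimension.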
// Takes indices and values and directly puts them into the sparse tensor, no copy.
// NOTE: this function is unsafe because it doesn't check whether any indices are
// out of bounds for `sizes`, so it should ONLY be used where we know that the
// indices are guaranteed to be within bounds.
// This used to be called THSTensor_(_move)
// NB: This used to be able to avoid a refcount bump, but I was too lazy to
// make it happen
void set_indices_and_values(const Tensor& indices, const Tensor& values);
void set_indices_and_values_unsafe(const Tensor& indices, const Tensor& values);
private:
int64_t get_device_slow() const override {
return values_.get_device();
}
};
} // namespace at

@ -0,0 +1,125 @@
#include <ATen/ATen.h>
#include <ATen/SparseTensorImpl.h>
namespace at { namespace sparse {
// Just for documentary purposes
using SparseTensor = Tensor;
using LongTensor = Tensor;
using IntTensor = Tensor;
using SparseType = Type;
// This is an internal utility function for getting at the SparseTensorImpl,
// so that we can write sparse tensor specific accessors for special fields
// in SparseTensor. You should only use this for writing low level
// setters/getters for SparseTensorImpl fields; otherwise, you should use
// the low level setters/getters that were implemented using this.
//
// This may be called repeatedly, so make sure it's pretty cheap.
inline SparseTensorImpl* get_sparse_impl(const SparseTensor& self) {
AT_ASSERTM(!self.is_variable(), "_internal_get_SparseTensorImpl: should not be a variable");
AT_ASSERTM(self.is_sparse(), "_internal_get_SparseTensorImpl: not a sparse tensor");
return static_cast<SparseTensorImpl*>(self.unsafeGetTensorImpl());
}
// Takes indices and values and directly puts them into the sparse tensor, no
// copy. This used to be called THSTensor_(_move)
inline void alias_into_sparse(const SparseTensor& self, const LongTensor& indices, const Tensor& values) {
get_sparse_impl(self)->set_indices_and_values_unsafe(indices, values);
}
// Take indices and values and makes a (data) copy of them to put into the sparse
// indices/values. This used to be called THSTensor_(_set)
inline void copy_into_sparse(const SparseTensor& self, const LongTensor& indices, const Tensor& values, bool non_blocking) {
alias_into_sparse(self, self._indices().type().copy(indices, non_blocking), self._values().type().copy(values, non_blocking));
}
// TODO: put this into the public API
inline bool is_same_tensor(const Tensor& lhs, const Tensor& rhs) {
return lhs.unsafeGetTensorImpl() == rhs.unsafeGetTensorImpl();
}
inline bool is_same_density(const SparseTensor& self, const SparseTensor& src) {
return self.sparse_dim() == src.sparse_dim() && self.dense_dim() == src.dense_dim();
}
// Give us a new values tensor, with the same dimensionality
// as 'values' but with a new number of non-zero elements.
// TODO: Expose this for real in ATen, some day?
// NB: Doesn't preserve data.
inline Tensor new_values_with_size_of(const Tensor& values, int64_t nnz) {
std::vector<int64_t> size = values.sizes().vec();
size[0] = nnz;
return at::empty(size, values.options());
}
// NOTE [ Flatten Sparse Indices ]
// This helper function flattens a sparse indices tensor (a LongTensor) into a 1D
// indices tensor. E.g.,
// input = [[2, 4, 0],
// [3, 1, 10]]
// full_size = [2, 12]
// output = [ 2 * 12 + 3, 4 * 12 + 1, 0 * 12 + 10 ] = [27, 49, 10]
//
// In other words, assuming that each column `indices[:, i]` is a valid index
// into a tensor `t` of shape `full_size`, this returns the corresponding
// indices into the flattened tensor `t.reshape( prod(full_size[:indices.size(0)]), -1 )`.
// if force_clone is true, the result will be forced to be a clone of self.
inline LongTensor flatten_indices(const Tensor& indices, IntList full_size, bool force_clone = false) {
int64_t sparse_dim = indices.size(0);
if (sparse_dim == 1) {
if (force_clone) {
return indices.squeeze(0).clone();
} else {
return indices.squeeze(0);
}
} else {
// sized up-front (not merely reserved): operator[] below writes each slot
std::vector<int64_t> indices_mult_cpu_vec(sparse_dim);
int64_t mult = 1;
for (int64_t i = sparse_dim - 1; i >= 0; i--) {
indices_mult_cpu_vec[i] = mult;
mult *= full_size[i];
}
auto indices_mult_cpu = indices.type().cpu()
.tensorFromBlob(indices_mult_cpu_vec.data(), /*size=*/{sparse_dim, 1});
// NB: must be blocking because this blob may be freed after this closure,
// and non_blocking copy will see garbage.
auto indices_mult = indices_mult_cpu.to(indices.device(), /*non_blocking=*/false);
// Ideally we want matmul, but matmul is slow on CPU Long and not implemented
// on CUDA Long, so an elementwise mul followed by a sum is faster here.
return indices.mul(indices_mult).sum(0);
}
}
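// Editor's usage sketch (hedged): reusing the numbers from the NOTE above,
// for indices [[2, 4, 0], [3, 1, 10]] and full_size (2, 12),
//   at::sparse::flatten_indices(indices, full_size)
// computes indices.mul([[12], [1]]).sum(0) = [27, 49, 10].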
// Flatten sparse tensor's indices from nD to 1D, similar to NOTE [ Flatten Sparse Indices ],
// except this one allows partial flattening: only flatten on specified dims. Note that
// the flattened indices might be uncoalesced if dims_to_flatten.size() < sparse_dim.
// Also, if the input indices are already coalesced, the flattened indices will be sorted.
//
// args:
// indices: sparse tensor indices
// sizes: sparse tensor sizes
// dims_to_flatten: a list of dim index to flatten
//
// Ex1:
// indices = [[2, 4, 0],
// [3, 1, 3]]
// sizes = [2, 12]
// dims_to_flatten = [0, 1]
// new_indices = [ 2 * 12 + 3, 4 * 12 + 1, 0 * 12 + 3 ] = [27, 49, 3]
//
// Ex2:
// dims_to_flatten = [1]
// new_indices = [ 3, 1, 3 ] # uncoalesced
inline LongTensor flatten_indices_by_dims(const LongTensor& indices, const IntList& sizes, const IntList& dims_to_flatten) {
LongTensor new_indices = at::zeros({indices.size(1)}, indices.options());
for (auto d : dims_to_flatten) {
new_indices.mul_(sizes[d]);
new_indices.add_(indices.select(0, d));
}
return new_indices;
}
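// Editor's note: the loop above is a Horner-style (mixed-radix) accumulation:
// for dims_to_flatten = [0, 1] and sizes = [2, 12] it computes
// indices[0] * 12 + indices[1], reproducing Ex1; for dims_to_flatten = [1]
// it degenerates to indices[1], reproducing Ex2.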
}} // namespace at::sparse

@ -1,41 +1,2 @@
#pragma once
#include "ATen/Scalar.h"
namespace at {
struct Type;
struct Storage {
static const char RESIZABLE = 2;
Storage() {}
Storage(const Storage& other) = delete;
void operator=(const Storage&) = delete;
virtual ~Storage() {};
virtual size_t elementSize() const = 0;
virtual size_t size() const = 0;
virtual void* data() = 0;
virtual const void* data() const = 0;
virtual Storage& retain() = 0;
virtual Storage& free() = 0;
virtual void * unsafeGetTH(bool retain) const = 0;
virtual Storage& resize(int64_t new_size) = 0;
virtual Type & type() const = 0;
virtual int getDevice() const = 0;
virtual const char * toString() const = 0;
virtual Storage& fill(Scalar value) = 0;
virtual Storage& set(size_t ind, Scalar value) = 0;
virtual Storage& fast_set(size_t ind, Scalar value) = 0;
virtual Scalar get(size_t ind) = 0;
virtual Scalar fast_get(size_t ind) = 0;
virtual void set_flag(char flag) = 0;
virtual void clear_flag(char flag) = 0;
};
} // namespace at
#include <c10/core/Storage.h>

@ -1,77 +0,0 @@
#pragma once
#include "TH/TH.h"
#include "TH/THStorage.hpp"
#include "TH/THTypeConversion.hpp"
namespace at {
enum class THLongStorageViewKind {
SIZE,
STRIDE,
LENGTH,
};
// make a fake storage out of a size, pointer pair...
// used as an argument where THSize and THStride are passed into TH
class THLongStorageView {
public:
operator THLongStorage*() {
if (storage.size == 0 && zero_dim_to_null) {
return nullptr;
}
return &storage;
}
/*
// This is done as an enum, and not as static constructors, as there
// is no move/copy constructor for THLongStorageView
static THLongStorageView makeFromSize(ArrayRef<int64_t> ref) {
...
}
static THLongStorageView makeFromLength(ArrayRef<int64_t> ref) {
...
}
*/
THLongStorageView(ArrayRef<int64_t> ref, THLongStorageViewKind kind)
: zero_dim_to_null(false)
{
// zero_dim_to_one converts an empty ArrayRef into [1]
// zero_dim_to_null converts an empty ArrayRef into a null THLongStorage
bool zero_dim_to_one = false;
bool noelem_to_empty = false;
switch (kind) {
case THLongStorageViewKind::SIZE:
zero_dim_to_one = true;
break;
case THLongStorageViewKind::STRIDE:
zero_dim_to_null = true;
break;
case THLongStorageViewKind::LENGTH:
break;
}
if(zero_dim_to_one && ref.size() == 0) {
// make storage of size 0 actually a 1-length storage with 1 element
// so that our 0-dim tensors get allocated as 1-dim inside TH
one = 1;
storage.data_ptr = {&one, kCPU}; // non-owning
storage.size = 1;
} else {
storage.data_ptr = {const_cast<void*>(static_cast<const void*>(ref.data())), kCPU}; // non-owning
storage.size = ref.size();
}
storage.scalar_type = at::CTypeToScalarType<th::from_type<int64_t>>::to();
storage.refcount = 0;
storage.flag = 0;
}
private:
int64_t one;
THLongStorage storage;
bool zero_dim_to_null;
};
}

aten/src/ATen/Tensor.h

@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Tensor.h>

@ -1,51 +1,2 @@
#pragma once
#include <cstddef>
#include <stdint.h>
#include "ATen/ScalarType.h"
namespace at {
template<typename T, size_t N>
class TensorAccessorBase {
public:
TensorAccessorBase(T * data_, const int64_t * sizes_, const int64_t * strides_)
: data_(data_), sizes_(sizes_), strides_(strides_) {}
IntList sizes() {
return IntList(sizes_,N);
}
IntList strides() {
return IntList(strides_,N);
}
int64_t stride(int64_t i) { return strides()[i]; }
int64_t size(int64_t i) { return sizes()[i]; }
protected:
T * data_;
const int64_t* sizes_;
const int64_t* strides_;
};
template<typename T, size_t N>
class TensorAccessor : public TensorAccessorBase<T,N> {
public:
TensorAccessor(T * data_, const int64_t * sizes_, const int64_t * strides_)
: TensorAccessorBase<T,N>(data_,sizes_,strides_) {}
TensorAccessor<T,N-1> operator[](int64_t i) {
return TensorAccessor<T,N-1>(this->data_ + this->strides_[0]*i,this->sizes_+1,this->strides_+1);
}
};
template<typename T>
class TensorAccessor<T,1> : public TensorAccessorBase<T,1> {
public:
TensorAccessor(T * data_, const int64_t * sizes_, const int64_t * strides_)
: TensorAccessorBase<T,1>(data_,sizes_,strides_) {}
T & operator[](int64_t i) {
return this->data_[this->strides_[0]*i];
}
};
}
#include <ATen/core/TensorAccessor.h>
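For reference, a short sketch of the accessor API this header now re-exports from ATen/core (editor's addition; assumes the Tensor::accessor<T, N>() entry point):

#include <ATen/ATen.h>

void accessor_sketch() {
  at::Tensor t = at::zeros({3, 4});
  auto a = t.accessor<float, 2>(); // TensorAccessor<float, 2>
  a[1][2] = 5.0f;                  // each operator[] peels off one dimension...
  float x = a[1][2];               // ...ending at the T& specialization for N == 1
  (void)x;                         // silence unused-variable warnings
}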

@ -1,108 +0,0 @@
#pragma once
#include "ATen/TensorImpl.h"
#include "ATen/UndefinedTensor.h"
namespace at { namespace detail {
// TensorBaseImpl is the base class for Tensor which handles the reference counting
template<bool is_strong>
struct TensorBaseImpl {
TensorBaseImpl(): TensorBaseImpl(UndefinedTensor::singleton(), false) {}
TensorBaseImpl(TensorImpl * self, bool should_retain)
: pImpl(self) {
if (pImpl == nullptr) {
throw std::runtime_error("TensorBaseImpl with nullptr not supported");
}
if(should_retain && pImpl != UndefinedTensor::singleton()) {
retain();
}
}
TensorBaseImpl(const TensorBaseImpl & rhs)
: pImpl(rhs.pImpl) {
if (pImpl != UndefinedTensor::singleton()) {
retain();
}
}
TensorBaseImpl(TensorBaseImpl && rhs) noexcept
: pImpl(rhs.pImpl) {
rhs.pImpl = UndefinedTensor::singleton();
}
~TensorBaseImpl() {
if (pImpl != UndefinedTensor::singleton()) {
release();
}
}
TensorBaseImpl & operator=(TensorBaseImpl && rhs) & {
rhs.swap(*this);
return *this;
}
TensorBaseImpl & operator=(TensorBaseImpl const & rhs) & {
//TensorBaseImpl ctor retains original rhs.pImpl
//then rhs.pImpl is swapped with this->pImpl
//finally TensorBaseImpl dtor releases rhs.pImpl, which was originally this->pImpl
TensorBaseImpl(rhs).swap(*this);
return *this;
}
int64_t dim() const {
if (is_strong) {
return pImpl->dim();
} else {
AT_ERROR("Can't call dim() on a WeakTensor");
}
}
void reset() {
TensorBaseImpl().swap(*this);
}
void reset(TensorImpl * rhs) {
TensorBaseImpl(rhs, true).swap(*this);
}
void reset(TensorImpl * rhs, bool should_retain) {
TensorBaseImpl(rhs, should_retain).swap(*this );
}
void swap(TensorBaseImpl & rhs) {
TensorImpl * tmp = pImpl;
pImpl = rhs.pImpl;
rhs.pImpl = tmp;
}
TensorImpl * get() const {
return pImpl;
}
TensorImpl * detach() {
TensorImpl * ret = pImpl;
pImpl = UndefinedTensor::singleton();
return ret;
}
bool defined() const {
return pImpl != UndefinedTensor::singleton();
}
friend struct Type;
//TODO(zach): sort out friend structs
public:
TensorImpl * pImpl;
private:
void retain() {
if (is_strong) {
pImpl->retain();
} else {
pImpl->weak_retain();
}
}
void release() {
if (is_strong) {
pImpl->release();
} else {
pImpl->weak_release();
}
}
};
using TensorBase = TensorBaseImpl<true>;
using WeakTensorBase = TensorBaseImpl<false>;
}} // namespace at::detail

@ -1,23 +1,15 @@
#include <ATen/TensorGeometry.h>
#include <ATen/TensorUtils.h>
#include <ATen/ATen.h>
namespace at {
bool TensorGeometry::is_contiguous() const {
int64_t dim = sizes_.size();
int64_t expected_stride = 1;
for (int64_t i = dim - 1; i >= 0; i--) {
if (sizes_[i] != 1 && strides_[i] != expected_stride) {
return false;
}
expected_stride *= sizes_[i];
if (numel_ == 0) {
return true;
}
return true;
}
Tensor TensorGeometry::zeros_with_stride(const Type& type) const {
return type.tensor(sizes_, strides_).zero_();
return at::geometry_is_contiguous(sizes_, strides_);
}
} // namespace at
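The stride arithmetic shared by the old loop above and the TensorGeometry constructor below can be stated standalone; a plain-C++ sketch (editor's addition):

#include <cstdint>
#include <vector>

// Contiguous strides are built right-to-left: for sizes (2, 3, 4) this
// returns (12, 4, 1), and a geometry is contiguous exactly when its strides
// match this expectation (ignoring size-1 dimensions).
std::vector<int64_t> contiguous_strides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size());
  int64_t expected_stride = 1;
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; i--) {
    strides[i] = expected_stride;
    expected_stride *= sizes[i];
  }
  return strides;
}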

@ -5,11 +5,11 @@
namespace at {
struct AT_API TensorGeometry {
struct CAFFE2_API TensorGeometry {
TensorGeometry() : storage_offset_(0) {}
explicit TensorGeometry(IntList sizes)
: sizes_(sizes)
: sizes_(sizes.vec())
, strides_(sizes.size())
, storage_offset_(0) {
int64_t dim = sizes.size();
@ -18,19 +18,18 @@ struct AT_API TensorGeometry {
strides_[i] = expected_stride;
expected_stride *= sizes_[i];
}
numel_ = expected_stride;
}
explicit TensorGeometry(const Tensor& t)
: sizes_(t.sizes())
, strides_(t.strides())
, storage_offset_(t.storage_offset()) {}
: sizes_(t.sizes().vec())
, strides_(t.strides().vec())
, storage_offset_(t.storage_offset())
, numel_(t.numel()) {}
// true if the tensor is contiguous
bool is_contiguous() const;
// creates a new tensor with the sizes and strides of the source
Tensor zeros_with_stride(const Type& type) const;
int64_t dim() const { return sizes_.size(); }
int64_t size(int64_t dim) const {
dim = maybe_wrap_dim(dim, this->dim());
@ -43,13 +42,7 @@ struct AT_API TensorGeometry {
}
IntList strides() const { return IntList{ strides_ }; }
int64_t storage_offset() const { return storage_offset_; }
int64_t numel() const {
int64_t r = 1;
for (auto s : sizes()) {
r *= s;
}
return r;
}
int64_t numel() const { return numel_; }
TensorGeometry transpose(int64_t dim0, int64_t dim1) {
TensorGeometry r = *this; // copy
@ -63,6 +56,7 @@ struct AT_API TensorGeometry {
std::vector<int64_t> sizes_;
std::vector<int64_t> strides_;
int64_t storage_offset_;
int64_t numel_;
};
} // namespace at

@ -1,36 +0,0 @@
#include <ATen/TensorImpl.h>
#include <ATen/Tensor.h>
#include <ATen/optional.h>
namespace at {
Tensor& TensorImpl::grad() {
AT_ERROR("grad is not implemented for Tensor");
}
const Tensor& TensorImpl::grad() const {
AT_ERROR("grad is not implemented for Tensor");
}
Tensor TensorImpl::detach() const {
AT_ERROR("detach is not implemented for Tensor");
}
void TensorImpl::backward(
at::optional<Tensor> gradient,
bool keep_graph,
bool create_graph) {
AT_ERROR("backward is not implemented for Tensor");
}
void TensorImpl::set_data(Tensor new_data) {
AT_ERROR("set_type is not implemented for Tensor");
}
void Tensor::backward(
at::optional<Tensor> gradient,
bool keep_graph,
bool create_graph) {
pImpl->backward(std::move(gradient), keep_graph, create_graph);
}
} // namespace at

@ -1,98 +0,0 @@
#pragma once
#include <atomic>
#include <memory>
#include "ATen/Retainable.h"
#include "ATen/ScalarType.h"
#include "ATen/optional.h"
namespace at {
class Scalar;
struct Type;
struct Storage;
struct Tensor;
} // namespace at
namespace at {
struct TensorImpl : public Retainable {
explicit TensorImpl(Type * type)
: is_scalar(false), type_(type) {}
Type & type() const {
return *type_;
}
virtual const char * toString() const = 0;
virtual IntList sizes() const = 0;
virtual IntList strides() const = 0;
virtual int64_t dim() const = 0;
/**
* Perform a conversion of this tensor to a scalar, if numel() == 1.
* Otherwise, raise an error.
*/
virtual Scalar localScalar() = 0;
virtual void * unsafeGetTH(bool retain) = 0;
virtual std::unique_ptr<Storage> storage() = 0;
friend struct Type;
int64_t numel() {
int64_t n = 1;
for (auto s : sizes()) {
n *= s;
}
return n;
}
// 0-dim patchup of TH requires us to have a flag marking
// if a Tensor should be treated as 0-dim.
// the generated wrapper manipulates this flag.
// the setter should never be exposed in Tensor's public API
// because eventually we would like isScalar() to just be dim() == 0;
bool isScalar() const {
return is_scalar;
}
// this is called by the generated wrapper code when there are conditions
// when this output tensor should be a scalar. e.g. when all inputs
// to a function 'add' were scalars, then condition_when_scalar == true.
// we also prevent this from getting marked as a scalar if it is not
// the right shape afterall.
TensorImpl* maybeScalar(bool condition_when_scalar) {
is_scalar = false; //force dim() to tell the truth for TH
is_scalar = condition_when_scalar && dim() == 1 && sizes()[0] == 1;
return this;
}
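// Editor's example: if every input to an op like `add` was a scalar
// (condition_when_scalar == true) and TH produced a shape-{1} result, the
// check above marks the output as 0-dim; any other shape leaves it unmarked.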
void setScalar(bool s) {
is_scalar = s;
}
// ~~~~~ Autograd API ~~~~~
// Some methods below are defined in TensorImpl.cpp because Tensor is an
// incomplete type.
AT_API virtual void set_requires_grad(bool requires_grad) {
AT_ERROR("set_requires_grad is not implemented for Tensor");
}
AT_API virtual bool requires_grad() const {
AT_ERROR("requires_grad is not implemented for Tensor");
}
AT_API virtual Tensor& grad();
AT_API virtual const Tensor& grad() const;
AT_API virtual Tensor detach() const;
AT_API virtual void detach_() {
AT_ERROR("detach_ is not implemented for Tensor");
}
AT_API virtual void backward(
at::optional<Tensor> gradient,
bool keep_graph,
bool create_graph);
AT_API virtual void set_data(Tensor new_data);
protected:
bool is_scalar;
Type * type_;
};
} // namespace at

@ -1,6 +1,6 @@
#pragma once
#include "ATen/Scalar.h"
#include <c10/core/Scalar.h>
#include "ATen/Tensor.h"
#include "ATen/Type.h"
@ -9,7 +9,12 @@
namespace at {
inline Tensor & Tensor::operator=(Tensor const & rhs) && {
return copy_(rhs);
}
inline Tensor & Tensor::operator=(Tensor && rhs) && {
return copy_(rhs);
}
inline Tensor & Tensor::operator=(Scalar v) && {
return fill_(v);
}
@ -42,9 +47,8 @@ inline Tensor& Tensor::operator/=(Scalar other) {
}
inline Tensor Tensor::operator[](Scalar index) const {
AT_CHECK(
index.local().isIntegral(),
"Can only index tensors with integral scalars (got ",
index.toTensor().type().toString(), ")");
index.isIntegral(),
"Can only index tensors with integral scalars");
return select(0, index.toLong());
}
inline Tensor Tensor::operator[](Tensor index) const {
@ -55,7 +59,7 @@ inline Tensor Tensor::operator[](Tensor index) const {
index.dim() == 0,
"Can only index with tensors that are scalars (zero-dim)");
// The Scalar(Tensor) constructor is explicit, so we need to call it.
return this->operator[](Scalar(index));
return this->operator[](index.item());
}
inline Tensor Tensor::operator[](int64_t index) const {
return select(0, index);
@ -64,9 +68,9 @@ inline Tensor Tensor::operator[](int64_t index) const {
#define AT_FORALL_BINARY_OPS(_) \
_(+,x.add(y), y.add(x)) \
_(*,x.mul(y), y.mul(x)) \
_(-,x.sub(y), y.type().tensor().resize_(y.sizes()).fill_(x).sub_(y)) \
_(/,x.div(y), y.type().tensor().resize_(y.sizes()).fill_(x).div_(y)) \
_(%,x.remainder(y), y.type().tensor().resize_(y.sizes()).fill_(x).remainder_(y)) \
_(-,x.sub(y), ::at::empty(y.sizes(), y.options()).fill_(x).sub_(y)) \
_(/,x.div(y), ::at::empty(y.sizes(), y.options()).fill_(x).div_(y)) \
_(%,x.remainder(y), ::at::empty(y.sizes(), y.options()).fill_(x).remainder_(y)) \
_(<,x.lt(y), y.gt(x)) \
_(<=,x.le(y), y.ge(x)) \
_(>,x.gt(y),y.lt(x)) \

@ -1,19 +0,0 @@
#include <ATen/TensorOptions.h>
#include <ATen/Device.h>
#include <ATen/Layout.h>
#include <ATen/OptionsGuard.h>
#include <ATen/ScalarType.h>
#include <ATen/optional.h>
namespace at {
TensorOptions::TensorOptions(bool use_thread_local_default_options) {
if (use_thread_local_default_options) {
this->dtype(DefaultTensorOptions::get().dtype());
this->device(DefaultTensorOptions::get().device());
this->layout(DefaultTensorOptions::get().layout());
this->requires_grad(DefaultTensorOptions::get().requires_grad());
}
}
} // namespace at

@ -1,279 +1,2 @@
#pragma once
#include <ATen/Context.h>
#include <ATen/Device.h>
#include <ATen/DeviceGuard.h>
#include <ATen/Layout.h>
#include <ATen/ScalarType.h>
#include <ATen/Tensor.h>
#include <ATen/Type.h>
#include <cstddef>
#include <utility>
namespace at {
/// A class to encapsulate construction axes of a `Tensor`.
/// `TensorOptions` is a virtual class to enable overriding of certain methods
/// by subclasses in other libraries, such as PyTorch. In PyTorch, there is a
/// `torch::TensorOptions` subclass of this `TensorOptions`, which changes
/// `type()` to return a variable type instead of a tensor type, such that
/// variables are created inside factory methods, instead of tensors.
struct TensorOptions {
TensorOptions() : TensorOptions(/*use_thread_local_default_options=*/true) {}
/// Constructs the `TensorOptions` with defaults taken from the thread local
/// `TensorOptions` object if `use_thread_local_default_options`, else
/// defaults to:
/// - dtype: kFloat,
/// - device: kCPU,
/// - layout: kStrided,
/// - requires_grad: false
explicit TensorOptions(bool use_thread_local_default_options);
/// Constructs the `TensorOptions` from the type of the given `Tensor`.
/// If the `Tensor` has a CUDA type, the `device_index` will match that of the
/// tensor. The `requires_grad` property of the tensor is ignored and set to
/// false in the created `TensorOptions`. See the constructor from `Type` for
/// the semantics w.r.t. the `type()` method.
explicit TensorOptions(Tensor tensor, bool discard_runtime_type = false) {
if (!discard_runtime_type) {
type_ = &tensor.type();
}
this->dtype(tensor.dtype());
this->device(tensor.device());
this->layout(tensor.layout());
}
/// Constructs the `TensorOptions` from a type and a `device_index`.
///
/// If `discard_runtime_type` is false (the default), the behavior of
/// `TensorOptions::type()` is changed in that it will always return this
/// `type`, irrespective of any `device` or `dtype` or `layout` specified at a
/// later time. This is to ensure that when a `TensorOptions` object is
/// constructed from a tensor's type, and that type has a dynamic type other
/// than `at::Type` (e.g. `torch::autograd::VariableType`), constructing a new
/// tensor from this `TensorOptions` will use this same derived type. If
/// instead the given `type` were destructured into its components (backend,
/// dtype and layout), information about the runtime type of the `Type` would
/// be lost. Set `discard_runtime_type` to `true` to always destructure the
/// type into its components and discard its runtime type.
/* implicit */ TensorOptions(
const Type& type,
int32_t device_index = -1,
bool discard_runtime_type = false) {
if (!discard_runtime_type) {
type_ = &type;
}
this->dtype(type.scalarType());
this->device({type.backend(), device_index});
this->layout(type.layout());
}
/// Constructs a `TensorOptions` object with the given layout.
/* implicit */ TensorOptions(Layout layout) : TensorOptions() {
this->layout(layout);
}
/// Constructs a `TensorOptions` object with the given device.
/* implicit */ TensorOptions(Device device) : TensorOptions() {
this->device(device);
}
/// Constructs a `TensorOptions` object from a backend, forwarded to the
/// `Device` constructor.
/* implicit */ TensorOptions(Backend backend)
: TensorOptions(Device(backend)) {}
/// Constructs a `TensorOptions` object with the given dtype.
/* implicit */ TensorOptions(ScalarType dtype) : TensorOptions() {
this->dtype(dtype);
}
/// True if all elements of the `TensorOptions` match that of the other.
bool operator==(const TensorOptions& other) const noexcept {
return dtype_ == other.dtype_ && layout_ == other.layout_ &&
device_ == other.device_ && requires_grad_ == other.requires_grad_;
}
/// True if any of the elements of this `TensorOptions` do not match that of
/// the other.
bool operator!=(const TensorOptions& other) const noexcept {
return !(*this == other);
}
/// Discards the runtime type stored if the `TensorOptions` was constructed
/// from a `Tensor` or a `Type`. See the documentation of the constructor from
/// a `Type` for implications on the behavior of the `type()` method on
/// `TensorOptions`.
const TensorOptions& discard_runtime_type() const {
type_ = nullptr;
return *this;
}
/// Sets the device of the `TensorOptions`.
TensorOptions& device(Device device) {
device_ = std::move(device);
update_underlying_type();
return *this;
}
/// Sets the device of the `TensorOptions` to CUDA, and then sets the device
/// index to the given one.
TensorOptions& device_index(int32_t device_index) {
return device({Device::Type::CUDA, device_index});
}
/// Sets the dtype of the `TensorOptions`.
TensorOptions& dtype(ScalarType dtype) {
dtype_ = dtype;
update_underlying_type();
return *this;
}
/// Sets the layout of the `TensorOptions`.
TensorOptions& layout(Layout layout) {
layout_ = layout;
update_underlying_type();
return *this;
}
/// Sets the `requires_grad` property of the `TensorOptions`.
TensorOptions& requires_grad(bool requires_grad) {
requires_grad_ = requires_grad;
return *this;
}
/// Returns the device of the `TensorOptions`.
const Device& device() const noexcept {
return device_;
}
/// Returns the device index of the `TensorOptions`.
int32_t device_index() const noexcept {
return device_.index();
}
/// Returns the dtype of the `TensorOptions`.
ScalarType dtype() const noexcept {
return dtype_;
}
/// Returns the layout of the `TensorOptions`.
Layout layout() const noexcept {
return layout_;
}
/// Returns the `requires_grad` property of the `TensorOptions`.
bool requires_grad() const noexcept {
return requires_grad_;
}
/// Constructs an `at::Type` from the members of the `TensorOptions`.
const Type& type() const {
if (type_ != nullptr) {
return *type_;
}
return getType(backend(), dtype_);
}
private:
/// Updates any stored underlying type to the current construction axes.
void update_underlying_type() {
if (type_) {
type_ = &type_->toScalarType(dtype_).toBackend(backend());
}
}
// Resolves the ATen backend specified by the current construction axes.
Backend backend() const noexcept {
Backend backend;
if (device_.type() == Device::Type::CPU) {
backend = (layout_ == kStrided) ? kCPU : kSparseCPU;
} else {
backend = (layout_ == kStrided) ? kCUDA : kSparseCUDA;
}
return backend;
}
private:
ScalarType dtype_{kFloat};
Device device_{Device::Type::CPU};
Layout layout_{Layout::Strided};
bool requires_grad_{false};
// Not part of the observable API, so make `mutable` so we can set it to
// `null` in `discard_runtime_type`.
mutable const Type* type_{nullptr};
};
/// Convenience function that returns a `TensorOptions` object with the `dtype`
/// set to the given one.
inline TensorOptions dtype(ScalarType dtype) {
return TensorOptions().dtype(dtype);
}
/// Convenience function that returns a `TensorOptions` object with the `layout`
/// set to the given one.
inline TensorOptions layout(Layout layout) {
return TensorOptions().layout(layout);
}
/// Convenience function that returns a `TensorOptions` object with the `device`
/// set to the given one.
inline TensorOptions device(Device device) {
return TensorOptions().device(std::move(device));
}
/// Convenience function that returns a `TensorOptions` object with the
/// `device_index` set to the given one.
inline TensorOptions device_index(int32_t device_index) {
return TensorOptions().device_index(device_index);
}
/// Convenience function that returns a `TensorOptions` object with the
/// `requires_grad` set to the given one.
inline TensorOptions requires_grad(bool requires_grad = true) {
return TensorOptions().requires_grad(requires_grad);
}
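Taken together these free functions form a small builder API; a usage sketch (editor's addition, hedged against the factory functions of this era):

#include <ATen/ATen.h>

void options_sketch() {
  at::TensorOptions opts = at::dtype(at::kDouble).requires_grad(true);
  at::Tensor a = at::empty({2, 2}, opts);
  // equivalently, inline:
  at::Tensor b = at::empty({2, 2}, at::dtype(at::kDouble).requires_grad(true));
}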
/// From Tensor.h
inline TensorOptions Tensor::options() const {
return TensorOptions(*this);
}
namespace detail {
inline Tensor to(
const Tensor& tensor,
const TensorOptions& options,
bool non_blocking) {
// Don't copy if the options match.
if (tensor.options() == options) {
return tensor;
}
DeviceGuard guard(options.device());
return options.type().copy(tensor, non_blocking);
}
} // namespace detail
inline Tensor Tensor::to(Device device, ScalarType dtype, bool non_blocking)
const {
if (this->device() == device && this->dtype() == dtype) {
return *this;
}
return detail::to(*this, options().device(device).dtype(dtype), non_blocking);
}
inline Tensor Tensor::to(ScalarType dtype, bool non_blocking) const {
if (this->dtype() == dtype) {
return *this;
}
return detail::to(*this, options().dtype(dtype), non_blocking);
}
inline Tensor Tensor::to(Device device, bool non_blocking) const {
if (this->device() == device) {
return *this;
}
return detail::to(*this, options().device(device), non_blocking);
}
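// Editor's note: each overload above short-circuits when nothing would change,
// so t.to(at::kFloat) on an already-float tensor returns t itself without a
// copy; only genuinely mismatched options reach detail::to and DeviceGuard.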
} // namespace at
#include <ATen/core/TensorOptions.h>
