Compare commits

...

3103 Commits

ee77ccbb6d Enabling inplace relu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28710

Test Plan: Imported from OSS

Differential Revision: D18146120

Pulled By: z-a-f

fbshipit-source-id: d8f0982f5a2ae35f7deb34e67cdb64be700a9d6c
2019-11-04 15:37:02 -08:00
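A minimal usage sketch of what this commit enables; it assumes the quantized functional relu gained an `inplace` flag (the flag usage here is illustrative, not the PR's test plan):
```python
import torch
import torch.nn.quantized.functional as qF

x = torch.randn(4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)
qy = qF.relu(qx, inplace=True)  # assumed inplace flag: modifies qx rather than allocating
```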
f7f538566e Quantized Tensor support copy (#28612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28612

att

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18255247

fbshipit-source-id: 814b12640fdf9d79b27482ee642ce430dbaeea68
2019-11-04 15:36:30 -08:00
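A quick sketch of the behavior this adds (a minimal example, not the PR's test plan):
```python
import torch

x = torch.randn(3)
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)
qy = torch.quantize_per_tensor(torch.zeros(3), scale=0.05, zero_point=0, dtype=torch.qint8)
qy.copy_(qx)  # copying into a quantized tensor is what this PR supports
```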
432724b3e2 Fix torch.where to accept only tensors with same dtypes(CPU) (#29078)
* Make zeros argument of torch.where same dtype as other argument

* Added check for torch.where on CPU that both arguments have same dtype

* Changes based on PR comments

* Fix flake8

* Fixed test for CUDA

* Changes based on PR comments

* Changes based on PR review
2019-11-04 18:32:42 -05:00
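A sketch of the checked behavior on CPU after this fix (the explicit cast is the workaround):
```python
import torch

cond = torch.tensor([True, False])
a = torch.zeros(2, dtype=torch.float64)
b = torch.zeros(2, dtype=torch.int32)
# torch.where(cond, a, b)                # now raises: both tensors must share a dtype
r = torch.where(cond, a, b.to(a.dtype))  # cast explicitly instead
```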
cc98c93bf3 Tensoriterator type promotion fixes (#28961)
* preserve original tensoriterator behavior when not explicitly promoting

Summary:

Cherry-picking of https://github.com/pytorch/pytorch/pull/28231 to
1.3.1 branch.

Fix: https://github.com/pytorch/pytorch/issues/28010

A mixed-type index assignment that would have been an error in 1.2 was unintentionally made possible (with incorrect results) in 1.3. This PR restores the original behavior.

This is BC-breaking because:
```
        a = torch.ones(5, 2, dtype=torch.double)
        b = torch.zeros(5, dtype=torch.int)
        a[:, [1]] = b.unsqueeze(-1)
```
now raises an error (as in 1.2) whereas it did not in 1.3.

* Compute correct strides after type promotion (#28253)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28253

Instead of trying to fix strides after changing dtypes, wait until after
promotion to set them.

fixes: https://github.com/pytorch/pytorch/issues/27824
fixes: https://github.com/pytorch/pytorch/issues/28502

Test Plan: Imported from OSS

Differential Revision: D18124950

Pulled By: nairbv

fbshipit-source-id: e4db90b2a6bb0f5d49cb388e0cd1971303c6badd
2019-10-31 23:56:18 -04:00
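For the BC-breaking example above, a hedged workaround sketch is to cast explicitly before assigning, which keeps the 1.2-era semantics:
```python
import torch

a = torch.ones(5, 2, dtype=torch.double)
b = torch.zeros(5, dtype=torch.int)
a[:, [1]] = b.unsqueeze(-1).to(a.dtype)  # explicit cast, no mixed-type assignment
```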
eb8e8c1bcf [v1.3.1] Add polygamma and lgamma to the docs (#28964)
* Add Polygamma to the docs (#27696)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/25347
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27696

Differential Revision: D17916790

Pulled By: ezyang

fbshipit-source-id: ac2635a300b1ef0ab437e3ffac152239754fe828

* Add documentation for torch.lgamma (#27812)

Summary:
Changelog:
- Add doc string in _torch_docs.py, _tensor_docs.py
- Expose in docs/source/torch.rst, docs/source/tensors.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27812

Test Plan:
- Remove `lgamma`, `lgamma_` from the blacklist

Fixes https://github.com/pytorch/pytorch/issues/27783

Differential Revision: D17907630

Pulled By: ezyang

fbshipit-source-id: 14e662a4e5262126889a437e5c4bfb21936730e8
2019-10-31 23:49:09 -04:00
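A short usage sketch of the newly documented functions:
```python
import torch

x = torch.tensor([1.0, 2.0, 5.0])
print(torch.lgamma(x))        # log of the absolute value of the gamma function
print(torch.polygamma(1, x))  # n-th derivative of digamma; n=1 is the trigamma
```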
f0a4ac3ee0 argmax for half datatype fix (#28787) (#28915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28787

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#28787 argmax for half datatype fix**

Test Plan: Imported from OSS

Differential Revision: D18194420

Pulled By: pbelevich

fbshipit-source-id: d2abec1ea8a9ce3a93aec5a2c5bba57d163197e6
2019-10-31 23:47:46 -04:00
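A minimal sketch exercising the fixed path (this sketch assumes a CUDA device is available):
```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1000, dtype=torch.half, device="cuda")
    print(x.argmax())  # argmax over a half-precision tensor, the case this commit fixes
```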
bd9766a36a Return None correctly from Tensor.names (#28659) (#28922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28659

Previously, we would return None from `Tensor.names` without bumping the
refcount. This is a bug; the Python API requires the developer to
increment the refcount on new references to None. This is because None
is a singleton object and does not automatically have its reference
count bumped when one uses Py_None (which is a pointer to the actual
None singleton object).

See the following for Python documentation on this:
- https://docs.python.org/3/c-api/none.html#c.Py_RETURN_NONE
- https://docs.python.org/3/extending/extending.html#back-to-the-example

Fixes https://github.com/pytorch/pytorch/issues/28646

Test Plan: - New test.

Differential Revision: D18140593

Pulled By: zou3519

fbshipit-source-id: 302a09021b68229e2e7b1b584b3549b30506bdab
2019-10-31 23:46:19 -04:00
80cca51adc Update hyperlink syntax for XLA, torchaudio, torchtext, and C++ (#28022) 2019-10-30 12:53:16 -07:00
9d45ee1d81 Add note that cuda quantization is not supported (#27829)
Summary:
People get confused with partial support otherwise: https://github.com/pytorch/pytorch/issues/27811 #27729

Suggestions on where else to put warnings are welcome (probably in tutorials - cc SethHWeidman)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27829

Differential Revision: D17910931

Pulled By: dzhulgakov

fbshipit-source-id: 37a169a4bef01b94be59fe62a8f641c3ec5e9b7c
2019-10-30 12:53:10 -07:00
de394b672d Add autofunctions in torch.rst
This is the v1.3.0 version of a 3 Part PR originally made to master PR: https://github.com/pytorch/pytorch/pull/27677/
Originally by @dzhulgakov
2019-10-10 09:23:22 -07:00
92c6401bb9 Include add_docstr method in _torch_docs.py
This is the v1.3.0 version of a 3 Part PR originally made to master PR: https://github.com/pytorch/pytorch/pull/27677/
originally by @dzhulgakov
2019-10-10 09:23:14 -07:00
b4f32dd292 Update to quantization
Organize APIs logically in subsections. Fix typos.

This is the v1.3.0 version of a 3 Part PR originally made to master PR: https://github.com/pytorch/pytorch/pull/27677/
originally by @dzhulgakov
2019-10-10 09:22:39 -07:00
3e451b4796 updated the list of APIs that can be used with quantized tensors. 2019-10-10 09:22:39 -07:00
036a591556 capitalization changes requested by jessica 2019-10-10 09:22:39 -07:00
86d9ee8dee Removed "NOTE" on the URLs. 2019-10-10 09:22:39 -07:00
aa44ffb4c9 added the quantization formula to the quantization doc 2019-10-10 09:22:39 -07:00
162b054e39 cleaning up URLs 2019-10-10 09:22:39 -07:00
7f044f7398 added a draft ops list from Zafar and Raghu 2019-10-10 09:22:39 -07:00
0c81d6ba4b changes from Raghu about the model preparation. 2019-10-10 09:22:39 -07:00
d1752f2bf8 change to the URL we link to for the concept of custom ops 2019-10-10 09:22:39 -07:00
49fbeb8cc8 adding quantization.rst file for quantization feature
This was written by Raghu, Jessica, Dmytro, and me.
2019-10-10 09:22:39 -07:00
f0d3fc70b4 take2: Docstring only changes in quantization, fake_quantize, and observer (#27574)
* docstring only formatting changes in the quantize.py and fake_quantization.py files to render better in HTML.

* docstring change on observer.py as well

* just kind of tweaking the docstrings a bit more.

* switching to r""" for the multi-line string. Per Zafar's suggestion.

* trying to resolve the merge conflict soumith saw

* trying to avoid a conflict when this gets merged back to master
2019-10-10 08:22:16 -07:00
fb489555a9 Quant other doc changes for relbranch pr (#27640)
* Cherry picked in changes from Jessica's branch.

Consolidate all quantization docs in quantization.rst. Add a link to quantization docs from torch.rst. Order quantization.rst alphabetically in index.rst

* Fix Quantized reference

* Add prose for Quantized Functions in the torch.nn docs

* Remove Quantization section

* Updates to index for v1.3.0

* Update "Package Reference" to "Python API"
* Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
* Add "Other Languages" section, add in C++ docs, add in Javadocs
* Add link to XLA docs under Notes: http://pytorch.org/xla/

* Doc tests caught that we'd somehow dropped documenting a few functions like
result_type, can_cast, promote_types

* Add javasphinx extension
2019-10-10 08:21:49 -07:00
b5144f1068 Add javadocs for v1.3.0 (#27656)
* Add javadocs for v1.3.0

* Delete Tensor-Tensor_float32 because it is not public

* Delete Tensor-Tensor_float64 because it is not public

* Delete Tensor-Tensor_int32 because it is not public

* Delete  Tensor-Tensor_int64 because it is not public

* Delete Tensor-Tensor_int8 because it is not public

* Delete Tensor-Tensor_uint8 because it is not public

* Add reference to DType and TensorImageUtils
2019-10-10 08:21:35 -07:00
a5c08a6abd Update docs CI for v1.3.0 (#27638)
This PR updates the docs CI. After this is merged, we open a PR from
1.3.0 -> master. That open PR will build docs on this branch and push
them to pytorch.github.io:site-v1.3.0. This is done in dry_run mode
so the pushing won't actually happen; I will follow up with a
subsequent change to drop dry_run mode after verifying that everything
builds correctly.
2019-10-10 08:21:10 -07:00
6cc759269f add type promotion info to torch.add/mul/div docs (#27501) 2019-10-10 08:20:44 -07:00
6742476ba3 fix install_requires properly 2019-10-09 12:24:36 -04:00
067aee5f30 Documentation for named tensors (#27573)
`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.

`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.

Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.

Test Plan:
- built and reviewed locally with `cd docs/ && make html`.

ghstack-source-id: dc2ca7a204f86d4849bd45673c189d5bbddcb32c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173
2019-10-09 08:52:22 -04:00
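A small taste of the API being documented (a sketch, not taken from the docs themselves):
```python
import torch

imgs = torch.randn(2, 3, 4, 4, names=("N", "C", "H", "W"))
print(imgs.names)           # ('N', 'C', 'H', 'W')
print(imgs.sum("C").names)  # name inference: the reduced dimension is dropped
```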
0a7f7e6d30 [jit] Set existing attributes under recursive script (#27545)
Landing in master in #27514
2019-10-09 08:51:48 -04:00
e9fc91cbca Adding docstrings for nnq.functional (#27473)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27363

Test Plan: Imported from OSS

Differential Revision: D17758907

Pulled By: zafartahirov

fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2
2019-10-09 08:51:06 -04:00
23df957e94 Revert "Mark protobuf include path as system include (#23012)"
This reverts commit a2b3403962efce151d4c447e27106f9617c52595.
2019-10-08 20:11:56 -04:00
a7b161c08b Clean up JavaDoc comments in pytorch_android
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27455

Test Plan: Imported from OSS

Differential Revision: D17800658

Pulled By: dreiss

fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
2019-10-08 17:01:33 -04:00
6bae48c127 Various cleanups to pytorch_android API (#27454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27454

See detailed discussion at
https://github.com/pytorch/pytorch/issues/27350

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800480

Pulled By: dreiss

fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
2019-10-08 17:01:33 -04:00
c248943743 Refactor python_android test to separate Android-specific components (#27453)
Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java.  (Some changes to the build script and dependencies are
required before the host test can actually run.)

ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27453

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800410

fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
2019-10-08 17:01:33 -04:00
e058a37fe4 Modify PyTorch's integration of NNPACK to use a unified underlying thread pool implementation. (#27547) 2019-10-08 17:00:12 -04:00
aa7112a618 Add missing Optional annotation. (#27557) 2019-10-08 16:55:12 -04:00
b728ffabc3 #include <stdexcept> into flat_hash_map.h (#27480) 2019-10-07 22:20:02 -04:00
d67898a93b update (#27386) 2019-10-07 22:19:18 -04:00
9a25673478 Revert to align_corners=True as default. (#27469) 2019-10-07 16:02:53 -04:00
17613ad73c Fix native ctc_loss gradient indexing bug for large target sizes
Fixes: #27442

Thank you Mohamed Yousef (@ASDen) for the report with minimal
reproducing example and detailed analysis!
2019-10-07 08:58:30 -07:00
beaae6a2b6 [android][torchvision] Add methods to write image tensor content to buffer (#27407)
ghstack-source-id: fd0fc8e7d2c99d67930dd34a286020e6d47ad402
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27359
2019-10-07 01:25:56 -04:00
328f49968c MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396

Observer that estimates moving averages of the min and max values per batch; better suited for quantization-aware training than min/max observers, which track extremal values across batches
ghstack-source-id: 91369018

Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details

buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17727213

fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
2019-10-06 22:22:46 -07:00
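The idea in a minimal sketch (not the actual observer implementation; the class name and averaging constant are illustrative):
```python
import torch

class MovingAverageMinMaxSketch:
    """Track exponential moving averages of per-batch min/max."""
    def __init__(self, averaging_constant: float = 0.01):
        self.c = averaging_constant
        self.min_val = None
        self.max_val = None

    def observe(self, x: torch.Tensor) -> None:
        mn, mx = x.min().item(), x.max().item()
        if self.min_val is None:   # first batch initializes the averages
            self.min_val, self.max_val = mn, mx
        else:                      # later batches move them smoothly
            self.min_val += self.c * (mn - self.min_val)
            self.max_val += self.c * (mx - self.max_val)
```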
7f9096f868 Replacing the skip_list with white_list in the qconfig propagation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27183

Test Plan: Imported from OSS

Differential Revision: D17700548

Pulled By: zafartahirov

fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16
2019-10-06 22:22:46 -07:00
7e94ee235f Avoid calling tensor.numel() in for loops (#27298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27298

PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place.
This causes an issue with numel(), as it gets called through the dispatch mode and probably isn't getting inlined.
The thread-local state is also expensive to read/write this many times, and this kills perf.

PR #27274 is another approach to fix this and has more details.

Test Plan:
Quantized mobilenetV2 perf before this change
Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696

Perf after this change
Main run finished. Milliseconds per iter: 22.2585. Iters per second: 44.9267

Imported from OSS

Differential Revision: D17742565

fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff
2019-10-06 22:22:46 -07:00
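The bug pattern was a C++ loop condition that re-dispatched numel() every iteration; a Python rendering of the fix is to hoist and cache the call:
```python
import torch

t = torch.ones(1000)
n = t.numel()   # query once, outside the loop
total = 0.0
i = 0
while i < n:    # instead of `while i < t.numel()`, which would re-call it each pass
    total += 1.0
    i += 1
```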
318fb8e8b9 Factored out the default mappings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27164

Test Plan: Imported from OSS

Differential Revision: D17694475

Pulled By: zafartahirov

fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961
2019-10-06 22:22:46 -07:00
43bb1b2356 Fix reprs for _intrinsic modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27184

Test Plan: Imported from OSS

Differential Revision: D17717481

Pulled By: jamesr66a

fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda
2019-10-06 22:22:46 -07:00
87fbd27cc0 Allow set for qconfig for dynamic_quantize
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27181

Test Plan: Imported from OSS

Differential Revision: D17717482

Pulled By: jamesr66a

fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81
2019-10-06 22:22:46 -07:00
225c38b719 Rename _intrinsic to intrinsic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27194

Test Plan: Imported from OSS

Differential Revision: D17704957

Pulled By: zafartahirov

fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc
2019-10-06 22:22:46 -07:00
8074526e7f Enabling intra-op parallelism (#26692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26692

Adding intra-op parallelism for qconv and qlinear.

export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv
python test/test_quantized.py TestQuantizedLinear.test_qlinear

TODO: Performance numbers.
ghstack-source-id: 91135613

Test Plan:
export OMP_NUM_THREADS=4
python test/test_quantized.py TestQuantizedConv.test_qconv

python test/test_quantized.py TestQuantizedLinear.test_qlinear

Differential Revision: D17540567

fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
2019-10-06 22:22:46 -07:00
06a866de94 Suppressing hypothesis health check for qnnpack_add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27193

Test Plan: Imported from OSS

Differential Revision: D17704958

Pulled By: zafartahirov

fbshipit-source-id: d8ab58b724cce2f5130b10ead0f10f5f32e26cfb
2019-10-06 22:22:46 -07:00
9b22a55499 Handle uninitialized min/max values in histogram observer (#27151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27151

We need to be able to handle observers with no min/max data correctly, as models sometimes have modules that do not receive any data.
ghstack-source-id: 91113403

Test Plan:
buck test caffe2/test:quantization -- test_minmax_observer

buck test caffe2/test:quantization -- test_per_channel_minmax_observer

buck test caffe2/test:quantization -- test_histogram_observer

Reviewed By: csummersea

Differential Revision: D17690828

fbshipit-source-id: e95709333ea0f66d79ddb8141b7cba5a83347dbd
2019-10-06 22:22:46 -07:00
f8d3eac4c3 Unify quantized conv and linear tests (#26992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26992

Run the same test for FBGEMM and QNNPACK backends.
Checks that QNNPACK or FBGEMM are supported before running it (using supported_qengines)

Test Plan:
python test/test_quantized.py TestQuantizedLinear
    python test/test_quantized.py TestQuantizedConv
    python test/test_quantized_models.py
    python test/test_quantized_nn_mods.py

Imported from OSS

Differential Revision: D17689171

fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
2019-10-06 22:22:46 -07:00
68b4d22da7 Uninitialize the accumulation buffer to save some overhead (#27005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27005

Similar to https://github.com/pytorch/pytorch/pull/27002, we want to save some overhead.
ghstack-source-id: 91046563

Test Plan: CI

Differential Revision: D17641819

fbshipit-source-id: 9320919242a48f48532035e61d9844de671d39af
2019-10-06 22:22:46 -07:00
66b73b0950 Fuse module enhancements (#26457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26457

Enhancement to the fuse module to support Sequentials; the fuse list can now be specified just like the state dict.
Also adds support for Conv-ReLU and Linear-ReLU fusion,
and supports both in-place and out-of-place fusion of models.
ghstack-source-id: 91076386

Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D17466382

fbshipit-source-id: 0a548f8f4c366f3ecc59db693bac725ccd62328e
2019-10-06 22:22:46 -07:00
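A hedged usage sketch of the enhanced fusion API (module names follow torch.quantization.fuse_modules; the exact fuse list is illustrative):
```python
import torch

m = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
).eval()  # conv-bn fusion expects eval mode
# the fuse list uses state_dict-style names; inplace=False returns a new fused model
fused = torch.quantization.fuse_modules(m, [["0", "1", "2"]], inplace=False)
```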
32eb3b8d7b Add control for observers in Fake-quantize module (#27113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27113

Fix bug in fake quant control of observer and fake-quantize operations.
Add test to ensure that features work as expected
ghstack-source-id: 91071181

Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control

Differential Revision: D17678875

fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
2019-10-06 22:22:46 -07:00
5a2a34cd2d Support for add relu functional module (#26612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26612

Add support for an add-relu functional module; this allows fusion of quantized add and relu operations
ghstack-source-id: 91055976

Test Plan: buck test caffe2/test:quantization -- 'test_functional_module \(test_quantization\.FunctionalModuleTest\)' --print-passing-details

Differential Revision: D17518268

fbshipit-source-id: e1e8b4655d6b32405863ab9d1c7da111fb4343cc
2019-10-06 22:22:46 -07:00
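A sketch of the functional-module usage, assuming the class landed as torch.nn.quantized.FloatFunctional:
```python
import torch

ff = torch.nn.quantized.FloatFunctional()
a, b = torch.randn(4), torch.randn(4)
out = ff.add_relu(a, b)  # float path; after convert() this maps to the fused quantized op
```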
a8083e18e8 Default observer and fake-quant for backends (#26627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26627

ghstack-source-id: 91008337

Test Plan: buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17518194

fbshipit-source-id: 1eb8a7a85dc811c4ee5228d68563abb157613ceb
2019-10-06 22:22:46 -07:00
bc3fb36ed7 Emulate weight and activation only quant with fake quant, numerics test (#26625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26625

ghstack-source-id: 91008296

Test Plan: buck test caffe2/test:quantized -- 'test_weight_only_activation_only_fakequant \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Differential Revision: D17520342

fbshipit-source-id: 26e148d3299afcfdfb1187aff6ab80687ed8df47
2019-10-06 22:22:46 -07:00
e0822f1089 Quantization aware training: Freeze batch norm support (#26624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26624

For QAT we need to be able to control batch norm for all modules from the top. Adding helper functions to enable/disable batch norm freezing during training
ghstack-source-id: 91008297

Test Plan: buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17512199

fbshipit-source-id: f7b981e2b1966ab01c4dbb161030177274a998b6
2019-10-06 22:22:46 -07:00
cbfd4e05e9 Per channel fake quant (#26623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26623

Per-channel fake quant cpu and cuda operators,
per-channel support in fake quant module,
tests for per-channel fake-quant and serializability of fake quant modules

ghstack-source-id: 91008299

Test Plan:
buck test mode/dev caffe2/test:fake_quant  --
 Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324848875929
      ✓ caffe2/test:fake_quant - test_backward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.242 1/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.204 2/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerTensor) 0.174 3/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.279 4/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.241 5/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.353 6/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerTensor) 0.354 7/10 (passed)
      ✓ caffe2/test:fake_quant - test_backward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.334 8/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerChannel) 0.168 9/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerChannel) 0.429 10/10 (passed)
      ✓ caffe2/test:fake_quant - main 0.000 (passed)

Differential Revision: D17439406

fbshipit-source-id: 64bfff5e4f40bc2ab8af2b432c7bc33805418077
2019-10-06 22:22:46 -07:00
b9a2c8ac5c Improve repr for quantized modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27008

Test Plan: Imported from OSS

Differential Revision: D17649174

Pulled By: jamesr66a

fbshipit-source-id: e3e6c4bb31e1ad8ed1ebe27f803f90d564ecfe53
2019-10-06 22:22:46 -07:00
6d7a73c0da Per-channel baseline (#26516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516

ghstack-source-id: 90982010

Test Plan:
Integrate per-channel support into conv and linear modules.
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Differential Revision: D17342622

fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
2019-10-06 22:22:46 -07:00
15e4827617 Dont zero out buffers in dynamic linear (#27002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27002

This was taking a significant amount of time in my benchmarks with larger output sizes (e.g. final output projection in a language classification model)

Test Plan: Imported from OSS

Differential Revision: D17641765

Pulled By: jamesr66a

fbshipit-source-id: b0ef30767eec9774fc503bb51fed039222026bba
2019-10-06 22:22:46 -07:00
024fa34700 fix AvgPool2d for 2^31-1 sized inputs, and get test_cuda_kernel_loop_overflow_large to working state 2019-10-06 22:05:27 -07:00
95d2c7fc98 fix segfault when printing error msg for list comp (#27398)
* fix segfault when printing error msg for list comp

* simplify error msg printing
2019-10-06 23:07:54 -04:00
7ba2baee00 Make align_to method-only. (#27304) (#27367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27304

The ellipsis version of `align_to` only works if it is called as a
method. To prevent any confusion, this PR disables `torch.align_to` (but
keeps `Tensor.align_to`).

Test Plan: - [namedtensor ci]

Differential Revision: D17743809

Pulled By: zou3519

fbshipit-source-id: cf5c53dcf45ba244f61bb1e00e4853de5db6c241
2019-10-06 23:07:07 -04:00
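In short (a sketch of the resulting behavior):
```python
import torch

t = torch.randn(2, 3, names=("C", "N"))
print(t.align_to("N", "C").names)  # the method form works, including with '...'
# torch.align_to(t, "N", "C")      # the function form is intentionally unavailable
```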
ccf3a6de3d add AutoNonVariableTypeMode for USE_STATIC_DISPATCH on JIT->ATen path (#27274) (#27321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27274

This is yet another fix to address #26764.

PR #26908 toggles NonVariableTypeMode in ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place, thus it's the most logically sound place to do
such tweaks.

However, we observed a nontrivial perf regression due to this fix. It turns out
the numel() tensor method gets called in several for-loops and thus incurs ~7M
thread_local updates in a single forward call:
```
7173330 numel
    558 size
    416 q_scale
    302 _empty_affine_quantized
    288 contiguous
    257 q_zero_point
    216 qscheme
    173 empty
    110 set_
    105 as_strided
    104 permute
...
```

Since numel() is not called from a single place, a natural workaround is to
update function_wrapper.py so that it only adds the guard in the gen_namespace_function()
case and ignores the gen_tensor_method() case. But some tensor methods are actually
called from the JIT side directly (e.g. "aten::eq_" -> "(self).eq_"), so the
only "band aid" left on the table is to insert the guard on the JIT->ATen path as originally
done in #26868 - this is a simplified version of it, as it doesn't hurt to extend the
NonVariableTypeMode scope a little bit to also cover stack drop/pack calls.

On Android we only expose the JIT API, so we don't need to worry about tensor methods being
called directly. On iOS we don't provide a wrapper yet, but we can mention this caveat
in the doc. Hopefully by the time it's widely used we can finish Variable/Tensor
unification and remove all these hacks.

Test Plan:
- Verified it runs quantized/fp32 MobileNetV2 models;
- Verified it fixes the perf regression (revert #26908 separately);

Differential Revision: D17732489

Pulled By: ljk53

fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5
2019-10-06 23:06:22 -04:00
1ba6fc4ca6 Fixed Error message for tensor.align_to (#27221) (#27250)
Summary:
Fixing this [issue1](https://github.com/pytorch/pytorch/issues/27074) and [issue2](https://github.com/pytorch/pytorch/issues/27073)
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27221

Differential Revision: D17716235

Pulled By: izdeby

fbshipit-source-id: c7bafd16b469c91924ebc3dba77ca56424d4c93c
2019-10-06 23:05:33 -04:00
d4d4bf5686 Enabled comparison ops with named tensors (#27162) (#27249)
Summary:
Fixing this [issue](https://github.com/pytorch/pytorch/issues/27077).
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27162

Differential Revision: D17694187

Pulled By: izdeby

fbshipit-source-id: 939017c91605c89a0e08e0c3f8fe21de93bba95b
2019-10-06 23:04:42 -04:00
cee965fae9 Fix ONNX Interpolate (#27233)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27179

Reviewed By: hl475

Differential Revision: D17698364

Pulled By: houseroad

fbshipit-source-id: 8fddd1c13e7af026962cf2d9c05fd7c957d8526e
2019-10-06 23:02:27 -04:00
544c16cdbf make class types callable (#26743) (#27226)
Summary:
Allow invoking a UDT if it has a `__call__` method

Fix for https://github.com/pytorch/pytorch/issues/26725
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26743

Differential Revision: D17677795

Pulled By: eellison

fbshipit-source-id: 0ceb6088e22c4689e0735fdb9e07418a75603486
2019-10-06 23:01:53 -04:00
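A minimal TorchScript sketch of what this enables (the class name is hypothetical):
```python
import torch

@torch.jit.script
class Adder(object):
    def __init__(self, k: int):
        self.k = k

    def __call__(self, x: int) -> int:
        return x + self.k

@torch.jit.script
def use(x: int) -> int:
    return Adder(1)(x)  # invoking a user-defined type via its __call__ method
```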
494a5563b4 [jit] Fix toIValue dict iteration (#27112) 2019-10-06 23:01:20 -04:00
ba4c3a1c2c Module method destroy (#27111)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27090

Test Plan: Imported from OSS

Differential Revision: D17674096

Pulled By: IvanKobzarev

fbshipit-source-id: d1c0db3797730bff90db83259a38904e71f7941d
2019-10-06 23:00:32 -04:00
c556f9052f Bump gloo (#27087)
Includes a bugfix for the uv transport used on macOS.

See https://github.com/facebookincubator/gloo/pull/220 for details.
2019-10-06 22:59:52 -04:00
5c80dd3c1f Turn on named tensor testing for v1.3.0 (#27084)
Previously, we would only test named tensors if:
1) we built with BUILD_NAMEDTENSOR=1
2) TEST_NAMEDTENSOR=1 is in the environment.

This PR makes it so that we ALWAYS test named tensors. This is OK
because all the release binaries should be able to run the named tensor
tests and be green; otherwise, there is something wrong.
2019-10-06 22:59:19 -04:00
2d8ee11139 [jit] Serializing autograd ops into its own namespace (#27079)
Summary:
This PR serializes autograd ops into their own namespace by turning the
serialized op name into torch.autograd.op. This keeps the
original code namespace rather than moving everything to the global namespace;
this will be handled more properly in the future when we handle the module
namespace. This change also preserves BC until we have namespace handling.

2019-10-06 22:58:36 -04:00
ebc2519bec Serialize XLA Tensor (#27042) 2019-10-06 22:56:59 -04:00
8c9e4b250d make cudnn rnn respect current stream (#27044) 2019-10-06 22:55:54 -04:00
6a6f047fc6 fix pytorch_linux_xenial_py3_6_gcc5_4_build for release branch 2019-10-06 19:38:14 -07:00
deadc27c23 Update to ROCm 2.8 (#27337)
Summary:
New docker images built with tag 324.

Related jenkins changes:
83ec813357
aa235a14c8

Triggered CI runs:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/48682/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/55638/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27337

Differential Revision: D17753827

Pulled By: bddppq

fbshipit-source-id: 2c3f77b0b7c680013c7cc6d7953fe0da4922fe48
2019-10-04 16:32:05 -04:00
6276fda119 Fix circle CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27307

Test Plan: Imported from OSS

Differential Revision: D17746444

Pulled By: xta0

fbshipit-source-id: ed37f91921f1ea7db6c63ba69f04883856341c39
2019-10-04 16:31:54 -04:00
65ee8f2c23 Provide (but skip) 3.5 job by default on all PRs. (#27293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27293

This doesn't turn on 3.5 signal, but it makes it so that [test all]
will include it if you do request it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17738741

Pulled By: ezyang

fbshipit-source-id: 2b1af4d7bf26fd84a593fde292d6bfa2aabc1148
2019-10-04 16:31:44 -04:00
6126cfab2c Report docker push / pull time (#26861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26861

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17712801

Pulled By: ezyang

fbshipit-source-id: 504594452e6594d79e41856ce5177ab370dc26f1
2019-10-04 16:31:36 -04:00
e2f6fed611 Don't apply should_run to the nightly/postnightly branches. (#27061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27061

Previously the cronjobs were run on master, but now the nightly builds
count as "PRs" so we must whitelist them from should_run calculation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17669066

Pulled By: ezyang

fbshipit-source-id: 3b92bf1d09aefa7ef524ea93dfa8c6f566161887
2019-10-04 16:31:25 -04:00
667deb92f7 Turn Caffe2 CUDA 9.1 + py2 to CUDA 10.1 + py3 (#26835)
Summary:
For TensorRT test introduced in https://github.com/pytorch/pytorch/pull/26426
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26835

Reviewed By: hl475

Differential Revision: D17580108

Pulled By: houseroad

fbshipit-source-id: c57fafec228b78c26b8a7946c92ad7434425bbd4
2019-10-04 16:31:16 -04:00
0e88de5580 fix OSX CI build (#27373)
Summary:
fix OSX caffe2 CI build, attempt 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27373

Differential Revision: D17768461

Pulled By: soumith

fbshipit-source-id: b0a076c07382327730b5d86b8a00f5388c368b5e
2019-10-04 16:28:24 -04:00
3c8ce2a57e Make nonzero non differentiable as it supposed to be (#26980)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/26038

Somewhere between v1.1 and master, `nonzero` became `abstract` and was marked as differentiable (by mistake); we need to put it into the TH section of `tools/autograd/derivatives.yaml` to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26980

Differential Revision: D17632276

Pulled By: VitalyFedyunin

fbshipit-source-id: d6cabcc53348af6148cea5a1bd1af2ef12547373
2019-10-04 10:59:55 -07:00
f2080fb3f2 [tensorboard] Add method add_hparams to API doc (#27349) 2019-10-04 02:12:36 -04:00
84afb7b0c1 [android][1.3.0] gradle.properties version bump (#27275) 2019-10-04 01:13:53 -04:00
b6e976ae2d Work around a gcc-7 bug in building Debug version of Sleef (#26993) (#27160)
Summary:
We always build the Release version of Sleef on gcc 7.

    Sep 26 02:59:19 cd /var/lib/jenkins/cpp-build/caffe2/build/sleef/src/libm && /opt/cache/bin/cc  -DDORENAME=1 -DENABLE_ALIAS=1 -DENABLE_BUILTIN_MATH=1 -DENABLE_PUREC_SCALAR=1 -DENABLE_SYS_getrandom=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DSLEEF_STATIC_LIBS=1 -DTH_BLAS_MKL -D_FILE_OFFSET_BITS=64 -I/var/lib/jenkins/cpp-build/caffe2/build/aten/src -I/var/lib/jenkins/workspace/aten/src -I/var/lib/jenkins/cpp-build/caffe2/build -I/var/lib/jenkins/workspace -isystem /var/lib/jenkins/cpp-build/caffe2/build/third_party/gloo -isystem /var/lib/jenkins/workspace/cmake/../third_party/gloo -isystem /var/lib/jenkins/workspace/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/workspace/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/workspace/third_party/protobuf/src -isystem /opt/python/2.7.9/include -isystem /var/lib/jenkins/workspace/third_party/gemmlowp -isystem /var/lib/jenkins/workspace/third_party/neon2sse -I/var/lib/jenkins/workspace/cmake/../third_party/benchmark/include -isystem /var/lib/jenkins/workspace/third_party -isystem /var/lib/jenkins/workspace/cmake/../third_party/eigen -isystem /var/lib/jenkins/workspace/torch/include -isystem /opt/rocm/hip/include -isystem /include -I/var/lib/jenkins/cpp-build/caffe2/build/caffe2/contrib/aten -I/var/lib/jenkins/workspace/third_party/onnx -I/var/lib/jenkins/cpp-build/caffe2/build/third_party/onnx -I/var/lib/jenkins/workspace/third_party/foxi -I/var/lib/jenkins/cpp-build/caffe2/build/third_party/foxi -isystem /var/lib/jenkins/workspace/third_party/ideep/include -I/var/lib/jenkins/workspace/third_party/NNPACK/include -I/var/lib/jenkins/workspace/third_party/NNPACK/src -I/var/lib/jenkins/workspace/third_party/cpuinfo/include -I/var/lib/jenkins/workspace/third_party/pthreadpool/include -I/var/lib/jenkins/workspace/third_party/FXdiv/include -I/var/lib/jenkins/workspace/third_party/psimd/include -I/var/lib/jenkins/workspace/third_party/FP16/include -I/var/lib/jenkins/workspace/third_party/sleef/src/common -I/var/lib/jenkins/workspace/third_party/sleef/src/arch -I/var/lib/jenkins/cpp-build/caffe2/build/sleef/src/libm/include -I/var/lib/jenkins/workspace/third_party/sleef/src/libm  -Wall -Wno-unused -Wno-attributes -Wno-unused-result -Wno-psabi -ffp-contract=off -fno-math-errno -fno-trapping-math -g -O1 -fPIC   -DCAFFE2_USE_GLOO -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -std=gnu99 -o CMakeFiles/sleefpurec_scalar.dir/sleefsimdsp.c.o   -c /var/lib/jenkins/workspace/third_party/sleef/src/libm/sleefsimdsp.c
    Sep 26 02:59:20 /var/lib/jenkins/workspace/third_party/sleef/src/libm/sleefsimdsp.c: In function 'gammafk':
    Sep 26 02:59:20 /var/lib/jenkins/workspace/third_party/sleef/src/libm/sleefsimdsp.c:3103:1: internal compiler error: in trunc_int_for_mode, at explow.c:55
    Sep 26 02:59:20  }
    Sep 26 02:59:20  ^
    Sep 26 02:59:20 Please submit a full bug report,
    Sep 26 02:59:20 with preprocessed source if appropriate.
    Sep 26 02:59:20 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
    Sep 26 02:59:20 sleef/src/libm/CMakeFiles/sleefpurec_scalar.dir/build.make:67: recipe for target 'sleef/src/libm/CMakeFiles/sleefpurec_scalar.dir/sleefsimdsp.c.o' failed
    Sep 26 02:59:20 make[2]: Leaving directory '/var/lib/jenkins/cpp-build/caffe2/build'

Also updated the Sleef submodule to include fixes that were missed in https://github.com/pytorch/pytorch/issues/26749

https://github.com/pytorch/pytorch/issues/26994 provides a potentially cleaner fix

Close https://github.com/pytorch/pytorch/issues/26892
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26993

Differential Revision: D17669103

Pulled By: ezyang

fbshipit-source-id: 1b87a4a8fecc6441de3b008aee6929537768be1a
2019-10-04 01:06:11 -04:00
8626a1cc81 Update the link for iOS demo app in README.md (#27145)
Summary:
Update the link for iOS demo app in README.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27145

Differential Revision: D17746591

Pulled By: xta0

fbshipit-source-id: 6f49a0daddc8b79804e1b8487ba1db3807a3f481
2019-10-03 22:05:08 -07:00
831566ec90 Fixed seek offset size to 64bit. (#27125 for 1.3.0) (#27069)
* Fixed seek offset size to 64bit. (#27047)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/26998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
(cherry picked from commit 1afe3fc01eb194a3e7ce58240462de2121646233)

* Use _lseeki64 instead for MSVC

(cherry picked from commit f49f78d4c89b42474b3357a10de76d179b383e2c)
2019-10-04 01:03:59 -04:00
f7b3b20457 Fix Windows CI (#27120)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27031

Differential Revision: D17665998

Pulled By: ezyang

fbshipit-source-id: 6926e304c75ba878520627f1e829412f633b1bec
2019-10-04 01:03:02 -04:00
8ce38cf27d Resubmit [pytorch][PR] [ONNX] Updating producer_version in exported ONNX models to PyTorch 1.3. (#27049) 2019-10-04 00:56:57 -04:00
a94f9c7246 Fix race condition in Function::optimized_graph(). (#27323)
The current logic is buggy, and will fail in the following situation:

Thread A: check optimized_graph_, it is empty.
Thread A: claim the mutex in order to initialize optimized_graph_.
Thread A: copy graph_ into optimized_graph_.
Thread A: start running optimizations on optimized_graph_.
Thread B: check optimized_graph_, it is not empty.
Thread B: start using optimized_graph_.

BUG: Thread B is using the graph while it's still being mutated by
Thread A.

[ghstack-poisoned]
2019-10-04 00:54:59 -04:00
2fc3bb8571 Remove outdated note in cholesky_solve and triangular_solve doc strings (#27018)
We do support inputs with dim > 2 in _out variants
2019-10-04 00:36:42 -04:00
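A sketch of the batched usage the removed note previously disclaimed:
```python
import torch

A = torch.randn(4, 3, 3)
A = A @ A.transpose(-1, -2) + 3 * torch.eye(3)  # a batch of SPD matrices
u = torch.cholesky(A)                           # batched Cholesky factor
b = torch.randn(4, 3, 2)
x = torch.cholesky_solve(b, u)                  # dim > 2 inputs are supported
```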
f694d4d872 move parallel_for/parallel_reduce common implementation to cpp (#26969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26969

The template got instantiated into many places, inflating code size. This PR extracts
the common implementation that doesn't depend on the template param.

After:
Compressed ARMv7 AAR size: 5,677,469->5,398,011
RAW libpytorch.so size: 16,862,108->16,047,004

Test Plan:
- Test perf/correctness as #26702;

- Run tests for non-mobile native aten_threading:
```
ATEN_THREADING=NATIVE python setup.py develop --cmake
pytest -s -v test/test_torch.py::TestTorch
pytest -s -v test/test_jit.py
```

Differential Revision: D17628089

Pulled By: ljk53

fbshipit-source-id: 987d1f28174870384d6642d0bd4912b138348f66
2019-10-03 21:35:13 -07:00
d7b6d945eb Fix test_overwrite_module_params_on_conversion_cpu_cuda after type promotion introduced for comparison ops (#27066) 2019-10-03 16:18:01 -04:00
09f0e949cd PyTorch Graph Mode Quantization API (#26390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26390

`quantize_script`: top level API for graph mode quantization

Test Plan:
There are some known issues; we can enable the test after all known issues are fixed.

Imported from OSS

Differential Revision: D17645132

fbshipit-source-id: 61f261d5607409d493b39a2f4e05ebd017279f6b
2019-09-27 19:23:51 -07:00
da93cc5c2a Fix race condition in torch::jit::Function (#27009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27009

JIT can be called concurrently from two threads, so even the read from GraphExecutor has to be guarded by the lock.

This was a recent regression introduced by https://github.com/pytorch/pytorch/pull/26571/files#diff-40af5094abe4f522e8a78adb591dde19

Reviewed By: jamesr66a, wanchaol

Differential Revision: D17645407

fbshipit-source-id: f0a3a5d6d8ced04e043bdc56f4263f91d6189be1
2019-09-27 18:44:45 -07:00
f8db764f6c Remove unimplemented passes (#26978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26978

We can add them later if there is a need.

Test Plan:
ci

Imported from OSS

Differential Revision: D17643009

fbshipit-source-id: 053ec65c4acc03371aab4760793282682f039933
2019-09-27 18:33:46 -07:00
766767652a Move patterns in QuantFusion to a separate file (#26848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26848

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17636399

fbshipit-source-id: 7a2bc99a5dd7120c3b7de2adc72c772cb0759066
2019-09-27 18:10:57 -07:00
5e79b5b1c7 Move some class/functions in test_jit.py to jit_utils.py (#26839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26839

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17643010

fbshipit-source-id: 5768b70410b7bdfdbee734d3a00296e5b1ad30d5
2019-09-27 18:07:24 -07:00
b0f1b5c757 Add QuantFusion to graph_executor (#26591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26591

att

Test Plan:
.

Imported from OSS

Differential Revision: D17636651

fbshipit-source-id: 85f3fba1ac0f890622f8c3d8bfb1894de5c050e0
2019-09-27 18:01:18 -07:00
541de7e140 Migrate le/gt/ge/eq/ne from the TH to Aten. Added support of type promotion. (#26981)
Summary:
https://github.com/pytorch/pytorch/issues/24606 Migrate ne and ne_ from the TH to Aten (CUDA)
https://github.com/pytorch/pytorch/issues/24740 Migrate ne and ne_ from the TH to Aten (CPU)
https://github.com/pytorch/pytorch/issues/24573 Migrate gt and gt_ from the TH to Aten (CUDA)
https://github.com/pytorch/pytorch/issues/24709 Migrate gt and gt_ from the TH to Aten (CPU)
https://github.com/pytorch/pytorch/issues/24556 Migrate eq and eq_ from the TH to Aten (CUDA)
https://github.com/pytorch/pytorch/issues/24696 Migrate eq and eq_ from the TH to Aten (CPU)
https://github.com/pytorch/pytorch/issues/24568 Migrate ge and ge_ from the TH to Aten (CUDA)
https://github.com/pytorch/pytorch/issues/24703 Migrate ge and ge_ from the TH to Aten (CPU)
https://github.com/pytorch/pytorch/issues/24582 Migrate le and le_ from the TH to Aten (CUDA)
https://github.com/pytorch/pytorch/issues/24719 Migrate le and le_ from the TH to Aten (CPU)

Performance characteristics are similar to https://github.com/pytorch/pytorch/issues/25998

This PR migrates comparison ops from TH to ATen and adds type promotion in the same way as in https://github.com/pytorch/pytorch/issues/25998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26981

Differential Revision: D17635651

Pulled By: ifedan

fbshipit-source-id: 6ec7615207f5c248a6dd85fc54c25bd5e6d328e6
2019-09-27 17:28:56 -07:00
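What the added type promotion means in practice (a small sketch):
```python
import torch

a = torch.tensor([1, 2, 3])        # int64
b = torch.tensor([0.5, 2.0, 3.5])  # float32
print(torch.gt(a, b))              # operands are promoted before comparing
print(torch.eq(a, b))              # tensor([False,  True, False])
```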
ff8b7ef63d fix range for non-int inputs and pow implementation (#26926)
Summary:
Previously we did not throw if an input to `range` was a non-integer.

We also typed the result from `int ** int` as an integer but returned a float value. The return type should be a float, because if the exponent is negative `int ** int` returns a float.

Batching these two PRs together because it is easier to land and we're almost at the branch cut.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26926

Differential Revision: D17643039

Pulled By: eellison

fbshipit-source-id: b49203a9d420417e1307bbb653d2e33cd9e530e3
2019-09-27 17:14:23 -07:00
9080f1c5dd Rewrite argmax and argmin as TensorIterator reductions (#26181)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8817

This rewrites `argmax` and `argmin` to use `TensorIterator` as suggested by ngimel in https://github.com/pytorch/pytorch/issues/8817. To support this, the reduction operation is now passed the index along with the current element. I also had to change a few places where the input and output tensor `dtype`s were assumed to be the same.

Unfortunately, this isn't enough to reimplement the variants of `min` and `max` that return indices. There are several places where multiple tensor outputs are assumed to all have the same `dtype` and so returning `pair<scalar_t, int64_t>` for `ops.project` isn't possible.

#### Performance Results
**Edit:** These timings are invalid, see below for a better perf comparison
Timings reported by [`argmax.py`](https://gist.github.com/SsnL/6898c240d22faa91da16fc41359756a2):
```
cuda : 0.1432
cpu  : 26.976
numpy: 2.1350
```

So, the `TensorIterator` reductions are much faster on the GPU but significantly slower on the CPU. `htop` shows the cpu kernel using 4 cores for the cpu reduction so it's not clear what the issue is there.
Should I just revert to the old implementation on CPU or is it worth investigating further? I see that other `TensorIterator` cpu reductions are similarly faster in `numpy`  e.g. `max`, `mean` `std`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26181

Differential Revision: D17631979

Pulled By: pbelevich

fbshipit-source-id: 58424818ef32cef031d436cb6191e9a6ca478581
2019-09-27 16:58:55 -07:00
0c6a18de8d Add torch.promote_types function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26655

Test Plan: Imported from OSS

Differential Revision: D17556196

Pulled By: nairbv

fbshipit-source-id: eeebce8968bfb2ffd25c066595bc19e5dee6ea6f
2019-09-27 16:48:38 -07:00
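Usage sketch:
```python
import torch

print(torch.promote_types(torch.int32, torch.float32))  # torch.float32
print(torch.promote_types(torch.uint8, torch.int8))     # torch.int16
```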
024a422f41 Add fakefp16 transformation.
Summary: ATT.

Reviewed By: hyuen

Differential Revision: D17559866

fbshipit-source-id: 58e3de97d00f20a9b5556e35504c520926d43cbd
2019-09-27 16:46:03 -07:00
aa0b28428c Add optimized quantize function for ARM (#26867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26867

Use caffe2::Int8Quantize for pytorch mobile. Currently this is only implemented for uint8 tensors and runs using NEON intrinsics.
For all other cases it falls back to the naive pytorch quantize_val implementation.

Previously, the naive implementation of quantize_val was slow on mobile, taking up more than 50% of the execution time.

Results
Before
aten::quantize_per_tensor 42.893 ms
Total model runtime 70.5ms

After
aten::quantize_per_tensor 0.340 ms
Total model runtime 27.5ms

Test Plan:
Tested current python tests work python test/test_quantized.py TestQNNPackOps
Also tested using quantized mobilenetV2 on mobile and compared output

Imported from OSS

Differential Revision: D17638732

fbshipit-source-id: 76445d1e415e6e502d05ba5b900e5e1d875fc1b0
2019-09-27 16:43:16 -07:00
ad58045af9 Remove LOG(INFO) from math_cpu.cc (#27001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27001

This unconditional log line spams the logs enough that it's a drag on CPU and will eventually fill up the logs.

Test Plan: Allow unit test and automated testing to give feedback.

Reviewed By: jspark1105

Differential Revision: D17638140

fbshipit-source-id: 4e8a44bda31327ba7e797f7579a9e3bf866eef7e
2019-09-27 16:37:49 -07:00
6d715c9e79 Bring back the optimization of integer.pow({2.0, 3.0}) on CPU (#26938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26938

They were accidentally removed in #26020

Test Plan: Imported from OSS

Differential Revision: D17632120

Pulled By: pbelevich

fbshipit-source-id: d62f2b5635fb4976fd4eda2f2015fdf67138a0c0
2019-09-27 16:35:04 -07:00
3a18e2e768 support re-creating/destroying process groups when some trainers recover after failures (#26912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26912

The group name is used as a prefix in the c10d store, and without a consistent name a process group cannot be initialized.

When a process group doesn't have an explicit name (only the WORLD (default) process group can have an explicit name), we use the global _group_counter to generate one. We need to reset the counter on destruction so that a consistent value can be generated when we re-create process groups after some trainers recover from failure.

Test Plan: existing tests passed

Reviewed By: mrshenli

Differential Revision: D17594268

fbshipit-source-id: 17f4d2746584dadaa5d468085d871ff3e95a1c84
2019-09-27 16:16:58 -07:00
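A hedged sketch of the recover-and-reinitialize flow this fix enables (env:// rendezvous assumed):
```python
import torch.distributed as dist

dist.init_process_group("gloo", init_method="env://")
# ... training; some trainers fail and restart ...
dist.destroy_process_group()   # also resets the internal group-name counter
dist.init_process_group("gloo", init_method="env://")  # names line up again
```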
250f482aa5 Support qadd_relu on pytorch mobile (#26982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26982

Fused add+relu support

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qnnpack_add

Also,
Add torch.backends.quantized.engine = "qnnpack"
Ran
python test/test_quantized.py TestQuantizedOps.test_qadd_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qadd_relu_same_qparams

Imported from OSS

Differential Revision: D17635063

fbshipit-source-id: dd1cdf07f66c4cd657c1907f1b650e50d3d4725f
2019-09-27 16:13:42 -07:00
b518ff3cb8 Re-write of tensor-scalar mul
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26937

Test Plan: Imported from OSS

Differential Revision: D17618028

Pulled By: jamesr66a

fbshipit-source-id: 90ef461972e826327a19467ad4cefdeb35e13adc
2019-09-27 16:09:27 -07:00
91a0eb7cc5 Add int8 resize nearest 3d op in DNNLOWP (#26063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26063

As title

Test Plan: buck test mode/opt caffe2/caffe2/quantization/server:resize_nearest_3d_dnnlowp_op_test

Reviewed By: protonu, amylittleyang

Differential Revision: D17330625

fbshipit-source-id: 137b1faa86b4346512c49ee5d163ca1d75c1accd
2019-09-27 15:53:27 -07:00
646b69b3d0 Xray image inference on multi-cpu and dumping dnnlowp tensors (#22537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22537

Enable multi-CPU model evaluation;
Dump intermediate tensors in conv dnnlowp operators for debugging

Test Plan:
Local run and dump tensors:
```
buck run mode/opt experimental/summerdeng/xray_image:test_net_quantization -- --model_path=/mnt/public/summerdeng/xray_image/models/oct_resnext101_50.mdl --batch_size=1 --test_max_images=100 --octave_conv --octave_conv_ratio=0.5 --output_dir=/mnt/public/summerdeng/xray_image/output --num_cpus=4 --caffe2_dnnlowp_dump_tensors
```

Dumped .mtx files can be found here: /mnt/public/summerdeng/xray_image/dump_tensors
Histogram plots can be found here: https://our.intern.facebook.com/intern/anp/view/?id=112033

Example flow runs for model evaluation:

f124056759 Evaluating fp32 Oct-ResNext101 with 16 cpus
```
fry flow-cpu --resource '{"cpu_core": 16}' --binary-type local --name [quantization_eval]oct_resnext101_0.5_fp32_16cpu --disable-source-snapshot true --distribute-to-local-dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --flow-entitlement gpu_prod ~/fbsource/fbcode/buck-out/gen/experimental/summerdeng/xray_image/test_net_quantization.par --test_data="/mnt/vol/gfsai-oregon/ai-group/users/zyan3/octconv/xray_v11_annotation_data_fullfeat_32x4_dedup_split_05202019_posonly_labeled_2018_05_29_test.csv" --batch_size=1 --model_path="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/oct_resnext101_50.mdl" --octave_conv --octave_conv_ratio=0.5 --test_max_images=-1  --output_dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output"  --num_cpus=16
```

f124275053 Evaluating int8 Oct-ResNext101 with 16 cpus
```
fry flow-cpu --resource '{"cpu_core": 16}' --binary-type local --name [quantization_eval]oct_resnext101_0.5_int8_nongroupwise_l2approx --disable-source-snapshot true --distribute-to-local-dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --flow-entitlement gpu_prod ~/fbsource/fbcode/buck-out/gen/experimental/summerdeng/xray_image/test_net_quantization.par --test_data="/mnt/vol/gfsai-oregon/ai-group/users/zyan3/octconv/xray_v11_annotation_data_fullfeat_32x4_dedup_split_05202019_posonly_labeled_2018_05_29_test.csv" --batch_size=1 --model_path="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/oct_resnext101_50.mdl" --octave_conv --octave_conv_ratio=0.5 --test_max_images=-1 --int8_model_saved --int8_model_type="mdl" --output_dir="/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_image/output" --int8_model_mdl_name="int8_oct_resnext101_50_nongroupwise_l2approx.mdl" --num_cpus=16
```

Reviewed By: stephenyan1231

Differential Revision: D16106577

fbshipit-source-id: 9de359f2afe7f9a7722ae404f0d9aeca1d9c3c75
2019-09-27 15:53:23 -07:00
ee68c512c5 Add P99 method with configurable thresholds
Summary:
Update the P99 quantization method with configurable thresholds.
Add dnnlowp options for the configuration.

Test Plan: buck run mode/opt experimental/summerdeng/xray_image:test_net_quantization -- --model_path=/mnt/public/summerdeng/xray_image/models/oct_resnext101_50_2B_pretrained.mdl --batch_size=1 --test_max_images=100 --octave_conv --octave_conv_ratio=0.5 --output_dir=/mnt/public/summerdeng/xray_image/output --quantize --histogram_file=/mnt/public/summerdeng/xray_image/activation_histograms/oct_resnext101_50_2B_pretrained_hist_200k_compiled.txt --int8_model_type="mdl" --int8_model_mdl_name="int8_oct_resnext101_50_2B_l2_nongroupwise.mdl" --skip_first_conv --weight_quant="l2" --activation_quant="p99" --activation_p99_threshold=0.999 --measure_quantization_error

Reviewed By: amylittleyang

Differential Revision: D16626158

fbshipit-source-id: 7718dcf429f73aa54e82a6b6f6e631d94e3a134c
2019-09-27 15:53:20 -07:00
55a358546f Revert D17631902: [pytorch][PR] [ONNX] Updating producer_version in exported ONNX models to PyTorch 1.3.
Test Plan: revert-hammer

Differential Revision:
D17631902

Original commit changeset: 6d5896465740

fbshipit-source-id: ebf9e5e1c582027dbba2db68328ea4136a974c6b
2019-09-27 15:49:36 -07:00
5aa01fd89a C++ API parity: AdaptiveMaxPool3d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26775

Test Plan: Imported from OSS

Differential Revision: D17627824

Pulled By: pbelevich

fbshipit-source-id: c4ae077ea5575c5d1df795e74a0dcb74a695ad06
2019-09-27 15:31:37 -07:00
2afa5fe112 Better error message for calculate_qparams (#26985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26985

Produce a better error message when `calculate_qparams` doesn't return
something we expect. It should return a Tuple of two tensors.
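
A minimal sketch of a conforming implementation (illustrative only, not the library's observer code):
```
import torch

def calculate_qparams(min_val, max_val, qmin=0, qmax=255):
    # assumes max_val > min_val; returns the expected tuple of
    # two tensors: (scale, zero_point)
    scale = (max_val - min_val) / float(qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    zero_point = min(max(zero_point, qmin), qmax)
    return torch.tensor([scale]), torch.tensor([zero_point])
```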

Test Plan:
ci

Imported from OSS

Differential Revision: D17636252

fbshipit-source-id: 6caee48134f46d2f25dec3fa655e99c15043a67f
2019-09-27 15:28:26 -07:00
23260f3e7d Add logging in constant propagation pass
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26653

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D17621895

Pulled By: bzinodev

fbshipit-source-id: eda7df423a995590fd50052424891b6d04277882
2019-09-27 15:24:42 -07:00
3e480f8fb8 Fix fbjni packaging, exclude for publishing, include by default (#26995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26995

Fix the current setup: if fbjni is always excluded, we cannot use pytorch_android:package independently, for example for testing with `gradle pytorch_android:cAT`.

For publishing it works, as pytorch_android has a dependency on fbjni that will also be published.

In other cases we have two copies of fbjni.so: one from the native build (CMakeLists.txt does add_subdirectory(fbjni_dir)) and one from the ':fbjni' dependency.
We need both of them, as ':fbjni' also contains Java classes.

As a fix: keep excluding fbjni.so for publishing tasks (bintrayUpload, uploadArchives); otherwise use pickFirst (as we have two sources of fbjni.so).

# Testing

gradle cAT works, fbjni.so included
gradle bintrayUpload (dryRun==true) - no fbjni.so

Test Plan: Imported from OSS

Differential Revision: D17637775

Pulled By: IvanKobzarev

fbshipit-source-id: edda56ba555678272249fe7018c1f3a8e179947c
2019-09-27 15:21:26 -07:00
38f7a51cf2 add AutoNonVariableTypeMode guard on JIT->ATen boundary
Summary:
- This PR together with #26908 attempt to address issue #26764 (`Issue 1` mentioned below).

- Current flow without USE_STATIC_DISPATCH (for server build):
```
S1. jit::load()
  a. JIT calls variable_factories.h methods to instantiate variable instances.

  b. JIT calls some ATen methods during initialization, e.g.: conv_prepack, q_scale.
    b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
    b.2 ATen dispatcher dispatches to corresponding VariableType methods.
    b.3 VariableType method uses `AutoNonVariableTypeMode` guard before calling into ATen implementation, as ATen generally expects `CHECK(!is_variable())`.
    b.4 VariableType method uses `as_variable` to wrap the results.

  x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after S1.a / S1.b.

S2. module::forward()
  a. JIT interpreter calls some ATen methods (via JIT registry).
    a.1 - a.4: same as S1.b.1 - S1.b.4.

  x. Different from S1.x, seems JIT doesn't expect `CHECK(is_variable())` during the entire `forward()` call.
```

- Current flow with USE_STATIC_DISPATCH (for mobile build):
```
M1. jit::load()
  a. JIT calls variable_factories.h methods to instantiate variable instances.

  b. JIT calls some ATen methods during initialization, e.g.: conv_prepack, q_scale.
    b.1 First runs corresponding `Operation` in generated register_aten_ops_xxx.cpp, which calls `at::` functions, then calls ATen dispatcher.
    b.2 ATen dispatcher dispatches to corresponding ATen implementation directly.
      // Issue 1: NO VariableType methods / `AutoNonVariableTypeMode` so `CHECK(!is_variable())` in ATen will fail!
      // (Hypothetical) Issue 2: NO `as_variable()` to wrap result as variable. M1.x will fail if it is ever used to check this result.

  x. Somewhere in JIT it expects `CHECK(is_variable())` - not sure before/after M1.a / M1.b.

M2. module::forward() // PR #26477 wraps this call with `AutoNonVariableTypeMode` guard.
  a. JIT interpreter calls some ATen methods (via JIT registry).
    a.1 same as M1.b.1, calls into register_aten_ops_xxx.cpp.
    a.2 same as M1.b.2, calls ATen implementation directly.
      // `CHECK(!is_variable())` in ATen won't fail thanks to the outer scope `AutoNonVariableTypeMode` guard.

  x. Same as above, seems JIT never expects `CHECK(is_variable())` during the entire `forward()` call.
```

- Wrong solution: if we wrap M1 with `AutoNonVariableTypeMode`, it will solve `Issue 1` for some models but will fail M1.x for some other models.

- Proposed solution:
I feel the root cause is that mobile build doesn't have `VariableType` as a barrier sitting between JIT and ATen to convert between is_variable() and !is_variable().

Without `VariableType` the best alternative place to put a barrier is M1.b.2 as Edward did in #26908.

For some reason we also need to toggle variable state for c10 ops: this is what this PR does. We haven't figured out how the non-mobile build works without this logic, so it's a band-aid for now.

This PR doesn't try to address (Hypothetical) Issue 2 as I haven't seen it. PR #26477 can be replaced by #26908 + this PR but we can keep it until M2.x is no longer true.

- Ultimate solution:
After Variable and Tensor are completely merged: #23032 then is_variable() checks can be changed to requires_grad() checks and all problems will be solved. We can clean up these hacks by then.

- References:
* Effect of `AutoNonVariableTypeMode`: all `is_variable()` inside current thread scope returns false:
  https://github.com/pytorch/pytorch/blob/master/c10/core/TensorImpl.h#L811

* Effect of `as_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/VariableTypeUtils.h#L159
  It calls `make_variable`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.h#L539

Test Plan: - Load and run MobileNetV2 fp32 & int8 models.

Differential Revision: D17595179

Pulled By: ljk53

fbshipit-source-id: ed417ba6b696d722ea04fe18adf6b38ababa6b7c
2019-09-27 15:17:27 -07:00
a625734f6a Acquire GIL before creating py::object in RPC python handler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26988

Test Plan: Imported from OSS

Differential Revision: D17635297

Pulled By: mrshenli

fbshipit-source-id: 43c93e44fe0dceba9a41a292c53a665c612843e9
2019-09-27 15:13:32 -07:00
baa227b410 Revert D17579439: Add std::variant backport as torch::variant
Test Plan: revert-hammer

Differential Revision:
D17579439

Original commit changeset: 6416521047f5

fbshipit-source-id: 0a57bef5d1d2d5366f84fcfa52b3968e01802164
2019-09-27 14:31:50 -07:00
290405321a Better named tensor error messages. (#26974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26974

Suggest `Tensor.rename` to rename tensors and/or drop names on named
tensors.
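
For reference, the suggested calls (named tensors are a prototype feature; names below are illustrative):
```
import torch

t = torch.zeros(2, 3, names=('N', 'C'))
t = t.rename('N', 'channels')  # rename dimensions
t = t.rename(None)             # drop all names
```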

Test Plan: - [namedtensor ci]

Differential Revision: D17628950

Pulled By: zou3519

fbshipit-source-id: b701f46c46093046691eace698be8282d049d37a
2019-09-27 14:12:36 -07:00
6b3c0c1f22 Updating producer_version in exported ONNX models to PyTorch 1.3. (#26976)
Summary:
Bumping up the `producer_version` in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26976

Reviewed By: hl475

Differential Revision: D17631902

Pulled By: houseroad

fbshipit-source-id: 6d58964657402ac23963c49c07fcc813386aabf0
2019-09-27 13:50:24 -07:00
0ae0c9788e Fix misuses of TORCH_CHECK/TORCH_INTERNAL_ASSERT with string (#26897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26897

TORCH_INTERNAL_ASSERT("foo") doesn't do what you think it does :)

I'll try to do a fix to catch it in the compiler, but for now - let's fix usages

Found them using regex:
```
ag --cpp "TORCH_(CHECK|INTERNAL_ASSERT)\([ \n]*\"" --multiline
```

Test Plan: Imported from OSS

Differential Revision: D17624299

Pulled By: dzhulgakov

fbshipit-source-id: 74f05737ef598fd92b5e61541ee36de2405df23d
2019-09-27 13:45:19 -07:00
764bf826e3 Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (#26840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26840

Cleaning up the top-level namespace. Also cosmetic changes to torch.backends.quantized.
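
A sketch of the replacement usage (engine list depends on the build):
```
import torch

print(torch.backends.quantized.supported_qengines)  # e.g. ['none', 'fbgemm']
if 'fbgemm' in torch.backends.quantized.supported_qengines:
    torch.backends.quantized.engine = 'fbgemm'
```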

Test Plan: Imported from OSS

Differential Revision: D17604403

Pulled By: dzhulgakov

fbshipit-source-id: c55af277ea7319d962a82a6120f65ccd47a60abc
2019-09-27 13:45:15 -07:00
e4fba752cb fix type annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26930

Test Plan: Imported from OSS

Differential Revision: D17614745

Pulled By: vincentqb

fbshipit-source-id: 1c29543f74d9cf307e9665aa890b4830b886fe63
2019-09-27 13:39:36 -07:00
2cdfec6b24 Make named tensor implementations more robust (#26968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26968

To make implementations of an operator more robust, we should have a
separate "named area" where name propagation happens and an "unnamed
area" where the implementation is. Right now, many functions are
implemented without an "unnamed area". The problem with that is that if
someone modifies the implementation, it is very easy to break
namedtensor support by using a helper function that does not propagate
names correctly. The test coverage for named tensors is also
insufficient to catch such breakages.

This PR modifies some named tensor implementations to have separate
"named area" and "unnamed area". The following implementations were
changed:
- dropout, softmax, log_softmax, bernoulli
- dot, mm, addmm, addmv, mv

Test Plan: - [namedtensor ci]

Differential Revision: D17627920

Pulled By: zou3519

fbshipit-source-id: 9300ac3962219b1fcd8c4c8705a2cea6f8c9d23d
2019-09-27 13:25:41 -07:00
95a08c5b95 Add documentation for overload names (#23844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23844

-
ghstack-source-id: 90941095

Test Plan: testinprod

Differential Revision: D16660167

fbshipit-source-id: 504b57535156bfeba62396aca7f6a431d8233b7a
2019-09-27 13:15:52 -07:00
486305066a fix mobile.sh build (#26975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26975

ExportModule doesn't exist in mobile libtorch.a. This doesn't fail for the
regular mobile build, presumably because _save_for_mobile was stripped altogether,
but for a host toolchain with different linker flags it will fail.
Add an #if macro, as is done for Module::save.

Test Plan: - scripts/build_mobile.sh works;

Differential Revision: D17629869

Pulled By: ljk53

fbshipit-source-id: 7d3cebe0a7c3f7b56928eb5a9d9c9174403fe6e5
2019-09-27 12:52:33 -07:00
d63d7ab997 Expose PiecewiseLinearTransform to PyTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26903

Test Plan: Unit Test

Reviewed By: bddppq

Differential Revision: D17585637

fbshipit-source-id: fe669aaf3301d7efb5c28ec0097945d55a71773d
2019-09-27 12:49:04 -07:00
71011211c1 Add std::variant backport as torch::variant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26836

Test Plan: Imported from OSS

Differential Revision: D17579439

Pulled By: yf225

fbshipit-source-id: 6416521047f5b93c01514e3cd153c9abc3ad3417
2019-09-27 12:44:13 -07:00
bb7a415bcc C++ API parity: AdaptiveMaxPool2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26772

Test Plan: Imported from OSS

Differential Revision: D17627823

Pulled By: pbelevich

fbshipit-source-id: 195f1edabbbbe245de3568beb0c7925eb347118a
2019-09-27 12:41:38 -07:00
3ad1bbe16a Named tensor support for: index_fill_, index_fill, squeeze, median(Tensor) (#26914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26914

Also added dimname overloads for index_fill_ and squeeze.

Test Plan: - [namedtensor ci]

Differential Revision: D17609136

Pulled By: zou3519

fbshipit-source-id: 29c7ad52ffe24e0b3ad679111fee7a78eca7acdf
2019-09-27 12:28:49 -07:00
2f1932fc5c Fix issues in torch::tensor constructor (#26890)
Summary:
This PR contains the following:
1. Fix ambiguous overload problem when `torch::tensor({{1, 2}})` is used:
```
../test/cpp/api/tensor.cpp: In member function ‘virtual void TensorTest_MultidimTensorCtor_Test::TestBody()’:
../test/cpp/api/tensor.cpp:202:41: error: call of overloaded ‘tensor(<brace-enclosed initializer list>)’ is ambiguous
     auto tensor = torch::tensor({{1, 2}});
                                         ^
In file included from ../caffe2/../torch/csrc/api/include/torch/types.h:7:0,
                 from ../caffe2/../torch/csrc/api/include/torch/detail/static.h:4,
                 from ../caffe2/../torch/csrc/api/include/torch/nn/pimpl.h:4,
                 from ../caffe2/../torch/csrc/api/include/torch/nn/module.h:3,
                 from ../caffe2/../torch/csrc/api/include/torch/nn/cloneable.h:3,
                 from ../test/cpp/api/support.h:7,
                 from ../test/cpp/api/tensor.cpp:2:
../torch/csrc/autograd/generated/variable_factories.h:177:644: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<unsigned char>)
../torch/csrc/autograd/generated/variable_factories.h:177:1603: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<signed char>)
../torch/csrc/autograd/generated/variable_factories.h:177:2562: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<short int>)
../torch/csrc/autograd/generated/variable_factories.h:177:3507: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<int>)
../torch/csrc/autograd/generated/variable_factories.h:177:4450: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<long int>)
../torch/csrc/autograd/generated/variable_factories.h:177:5404: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<float>)
../torch/csrc/autograd/generated/variable_factories.h:177:6354: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<double>)
../torch/csrc/autograd/generated/variable_factories.h:177:7630: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<bool>)
../torch/csrc/autograd/generated/variable_factories.h:177:9224: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<c10::Half>)
../torch/csrc/autograd/generated/variable_factories.h:177:10838: note: candidate: at::Tensor torch::tensor(c10::ArrayRef<c10::BFloat16>)
In file included from ../caffe2/../torch/csrc/api/include/torch/types.h:7:0,
                 from ../caffe2/../torch/csrc/api/include/torch/detail/static.h:4,
                 from ../caffe2/../torch/csrc/api/include/torch/nn/pimpl.h:4,
                 from ../caffe2/../torch/csrc/api/include/torch/nn/module.h:3,
                 from ../caffe2/../torch/csrc/api/include/torch/nn/cloneable.h:3,
                 from ../test/cpp/api/support.h:7,
                 from ../test/cpp/api/tensor.cpp:2:
../torch/csrc/autograd/generated/variable_factories.h:193:19: note: candidate: at::Tensor torch::tensor(torch::detail::InitListTensor)
 inline at::Tensor tensor(detail::InitListTensor list_init_tensor) {
                   ^
```

After this PR, the multidim tensor constructor `torch::tensor(...)` should be ready for general use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26890

Differential Revision: D17632608

Pulled By: yf225

fbshipit-source-id: 2e653d4ad85729d052328a124004d64994bec782
2019-09-27 12:07:50 -07:00
f77b295edc Disable cudnn transpose for int types (#26934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26934

Disable cudnn transpose for int types

Experimented with int and 4d/5d tensors.

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:utility_ops_test

Reviewed By: houseroad

Differential Revision: D17607176

fbshipit-source-id: 83b9f9cf654b33d68b657f1b5a17d9bbd06df529
2019-09-27 11:36:10 -07:00
8fa9900c28 control of observer/fake-quant operations (#26520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26520

Hooks to enable control of observers and fake quant modules; these can be used via model.apply() during QAT.
ghstack-source-id: 90897063
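
Usage sketch (hook names as introduced by this stack; they are no-ops on modules without fake-quant/observer children):
```
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
# During QAT these toggle the corresponding submodules across the whole model.
model.apply(torch.quantization.disable_fake_quant)
model.apply(torch.quantization.enable_observer)
```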

Test Plan: buck test caffe2/test:quantization --  --print-passing-details

Differential Revision: D17491155

fbshipit-source-id: 80ff0d7a1ac35c96e054b4f0165a73c56c2f53cc
2019-09-27 11:01:34 -07:00
b2e43e4a2e Fix all factory invocations in quantized to correctly propagate options. (#26966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26966

Without this, you may allocate intermediates which are non-variables
when you should allocate variables.

Should help with discussion in #26868.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17629863

Pulled By: ezyang

fbshipit-source-id: 0dd9b218d3fc2dbbbbd9b1712db8ab4dac16ea22
2019-09-27 10:43:54 -07:00
102a148641 Default histogram observer (#26622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26622

ghstack-source-id: 90897064

Test Plan: buck test caffe2/test:quantization --  --print-passing-details

Differential Revision: D17508787

fbshipit-source-id: ae733ab35ec9b0233264014b8054d4d870fb05e1
2019-09-27 10:39:21 -07:00
6bf6788158 make repeat respect the current stream (#26946)
Summary:
Kernel launch did not have the stream argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26946

Test Plan: should be covered by current tests

Differential Revision: D17629397

Pulled By: ngimel

fbshipit-source-id: f91a72d0908b5672c6df045c9df49bf1d48a5ac9
2019-09-27 10:24:27 -07:00
428204dfa4 Fix the QuantizedAVX2 build issue (#26854)
Summary:
QuantizedAVX2 does not support the int32 type. We switch to using the at::quantize_vec function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26854

Differential Revision: D17609872

Pulled By: llyfacebook

fbshipit-source-id: b4a77d93ce0ebfef696506b5cdbe3e91fe44bb36
2019-09-27 10:20:26 -07:00
b0a2f6f2f5 Serialization and range reduction support for Fake Quant/Observer (#26519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26519

ghstack-source-id: 90895631

Test Plan:
buck test caffe2/test:quantization -- 'test_histogram_observer \(test_quantization\.ObserverTest\)' --print-passing-details
and
buck test caffe2/test:fake_quant -- 'test_fq_serializable \(test_fake_quant\.TestFakeQuantizePerTensorAffine\)' --print-passing-details

Differential Revision: D17217408

fbshipit-source-id: 0da7efdcdae0c065dd035c5dd2b6a78231545ece
2019-09-27 10:09:39 -07:00
3acbcb96d4 Include iteration_ in SGD optimizer serialization (#26906)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/24192 by including the private field `iteration_` in SGD optimizer serialization. Under the hood, `iteration_` is serialized into an `IValue`, then stored in a JIT module as an attribute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26906

Differential Revision: D17628359

Pulled By: yf225

fbshipit-source-id: beec1367459e973a1c9080dc86f502e4c7bc5ebd
2019-09-27 09:37:20 -07:00
0a393f6ef5 C++ API parity: AdaptiveMaxPool1d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26755

Test Plan: Imported from OSS

Differential Revision: D17627828

Pulled By: pbelevich

fbshipit-source-id: f898a4d2c269b98eb5905291914caa25bca87ce0
2019-09-27 09:10:39 -07:00
9a5e2e80b8 Fake quantization enhancements for QAT/PTQ support- fix tests (#26876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26876

Add the ability to turn fake quantization and observers on and off independently.
ghstack-source-id: 90892132

Test Plan: buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details

Differential Revision: D17592961

fbshipit-source-id: 24c60c94ed7c6c9fa55c634a8545731614e4f52f
2019-09-27 08:59:29 -07:00
2a43b74196 Add torch.can_cast(from, to) function (#26805)
Summary:
https://github.com/pytorch/pytorch/issues/25472
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26805
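
A small illustration; the results follow the type-promotion casting rules:
```
import torch

torch.can_cast(torch.double, torch.float)  # True: same-kind (floating) cast
torch.can_cast(torch.float, torch.int)     # False: would change the kind
```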

Differential Revision: D17628434

Pulled By: nairbv

fbshipit-source-id: 6af8031ac3afda1505d338075c0637ad043f8b7e
2019-09-27 08:40:34 -07:00
76a76a6cb9 Switch nightly jobs to trigger on 'nightly' branch rather than cron. (#26830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26830

Fixes #26817

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17608535

Pulled By: ezyang

fbshipit-source-id: 18b47af508bd606391b1e6436cefe586b9926ace
2019-09-27 07:25:19 -07:00
8ec0414053 Automatic update of fbcode/onnx to 034921bd574cc84906b7996c07873454b7dd4135 (#26955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26955

Previous import was ab6b94203c595f74b1f126eb118eef22e4c05a57

Included changes:
- **[034921bd](https://github.com/onnx/onnx/commit/034921bd)**: Fix warnings (#2358) <Changming Sun>
- **[2873fea8](https://github.com/onnx/onnx/commit/2873fea8)**: Fix spec and shape inference for Unsqueeze op (#2347) <Hariharan Seshadri>
- **[a3c91452](https://github.com/onnx/onnx/commit/a3c91452)**: Bump NMS version for avoiding regression in existing models (#2348) <Wei-Sheng Chin>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17623703

fbshipit-source-id: 2abc610ed6786680a622ade4a82594469d10f917
2019-09-27 03:31:32 -07:00
b60656bb0c Move Generator ops to c10 (#26434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26434

ghstack-source-id: 90902124

Test Plan: unit tests

Differential Revision: D17465434

fbshipit-source-id: 469206d44e328c19008daf2f6a323dcd1ac97984
2019-09-27 02:05:07 -07:00
f01ae84bc1 RPC Backend Registry (#26919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26919

Make the names more systematic

ghstack-source-id: 90870882

Differential Revision: D17262797

fbshipit-source-id: 5a2e513a0d0cca5b699b40cbf530f51776392a2a
2019-09-27 01:11:39 -07:00
e2ef49b559 Updating submodules
Summary:
GitHub commits:

b6b5955e72
99ef0247c0
017ffed361
acb696352e
209a420612
7dfeddb5ba

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 751c4f1b52cb58d481c84c621a305480a258787d
2019-09-27 00:35:22 -07:00
6b9bcd0606 export baddbmm (#26901)
Summary:
Adding symbolic for baddbmm export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26901

Reviewed By: hl475

Differential Revision: D17620967

Pulled By: houseroad

fbshipit-source-id: 3931dff5a4afdcb4a45d967fb0efaf84029c16e5
2019-09-26 22:53:21 -07:00
614edfce81 Add Support to Dicts and Strings in ONNX for Inputs and Outputs (#25889)
Summary:
ONNX does not support dictionaries for inputs and outputs. The reason is that the arg flattening and unflattening does not handle Dictionary types.
This PR adds flattening/unflattening support for dictionaries and strings.
However, this feature should be handled with caution for input dictionaries; users need to verify their dict inputs carefully and keep in mind that dynamic lookups are not available.

This PR will allow exporting cases where models have dictionary outputs (detection and segmentation models in torchvision), and where dictionary inputs are used for model configurations (MultiScaleRoiAlign in torchvision).
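
A minimal sketch of the newly supported pattern (module and file names are illustrative, not from this PR):
```
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        # dict outputs are flattened to a fixed set of graph outputs on export
        return {"scores": x.relu(), "boxes": x * 2}

torch.onnx.export(Model(), (torch.randn(2, 4),), "model.onnx")
```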
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25889

Reviewed By: hl475

Differential Revision: D17613605

Pulled By: houseroad

fbshipit-source-id: c62da4f35e5dc2aa23a85dfd5e2e11f63e9174db
2019-09-26 22:31:09 -07:00
7163bfdf58 Fix the weird bug in control_flow_op_test.py (#26931)
Summary:
In some versions of Python, then_net and else_net may be switched in order. Let's make sure we are iterating over the right arg node.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26931

Reviewed By: hl475

Differential Revision: D17614829

Pulled By: houseroad

fbshipit-source-id: 3f1b4eb91ecf4d808f58c34896d3e628aa2e0af0
2019-09-26 20:44:03 -07:00
3b1b45898e Updating submodules
Summary:
GitHub commits:

e33f2fe68f
f25f6f4101
8c5eacf758
ae45835703
661db3896e
aa25d200c1
ad7794b41e
bc23c7482b

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: fe12edaf711ddaa40c9a04dfb103905e7ed6603f
2019-09-26 18:23:31 -07:00
7e95439e9f batch size 0 tests for other DNNLOWP operators (#26877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26877

Add batch_size == 0 tests of other DNNLOWP operators not covered by the other diffs.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17596315

fbshipit-source-id: ddf5325f422402cafacbef9114314d92c49fc284
2019-09-26 17:41:33 -07:00
492660768f use new depthwise conv fbgemm interface (#26898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26898

This diff removes call sites using the old depth-wise conv fbgemm interface in Caffe2.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D17515368

fbshipit-source-id: 7200cf12ddac1103402e690596c58f378f95b1e9
2019-09-26 17:33:04 -07:00
257b61495e Revert D17610292: [pytorch][PR] Choose num_threads in parallel_for based on GRAIN_SIZE
Test Plan: revert-hammer

Differential Revision:
D17610292

Original commit changeset: 60b9fe4b0eec

fbshipit-source-id: cfa0be39eef5bf306ef128c134f86a135bb3d5c9
2019-09-26 17:16:18 -07:00
092b2f7fee Make TypeDefault, TypeDerived and VariableType anonymous namespaces (#26882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26882

Reduce binary size by 500kb by making TypeDerived and VariableType anonymous namespaces instead of classes. TypeDefault is also a namespace now but can't be anonymous because VariableType calls into it. This also has the nice side effect that VariableType.h and ${TypeDerived.h} are much smaller because they don't have to list the operator declarations anymore.

ghstack-source-id: 90865080

Test Plan: Measure libtorch.so size

Differential Revision: D17599686

fbshipit-source-id: da3c6641060b7410a7808f36a0a18ee3246ce2d2
2019-09-26 16:59:04 -07:00
771bcce6f1 Fix binary size in schema inference (#26878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26878

Before, for each function signature used in one or more ops, there's a template instantiation that creates the FunctionSchema object for it. As we've seen in the past, all these vector<> constructors in the FunctionSchema object take up quite a bit of binary size.

With this PR, we now create an intermediate constexpr std::array that has minimal binary size and can be embedded into the executable, then at runtime we will run a small piece of code that constructs the vector<>'s from it.

This reduces libtorch.so binary size by 800kb
ghstack-source-id: 90842811

Test Plan: measure libtorch.so size

Differential Revision: D17597752

fbshipit-source-id: 53442b565a7747c0d0384b2e3b845729c3daddfd
2019-09-26 16:59:00 -07:00
54b66c8c20 Fix shared_ptr binary size in op registration (#26869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26869

Having a lot of shared_ptr<Functor> instances cost us ~1.1MB of binary size in libtorch.so.
This PR fixes that.
ghstack-source-id: 90842812

Test Plan: measure libtorch.so size

Differential Revision: D17595674

fbshipit-source-id: 05151047ee8e85c05205b7510a33915ba98bab58
2019-09-26 16:58:56 -07:00
1a5d641de3 Improve binary size of function schema inference (#26860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26860

This improves libtorch.so size by 100-200kb

ghstack-source-id: 90842815

Test Plan: measure libtorch.so size

Differential Revision: D17593224

fbshipit-source-id: effbb5f3b7690b67edaabacf2ff9292a73c991a4
2019-09-26 16:58:52 -07:00
84e298e7b3 Fix c10 registration binary size (#26827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26827

The templates there had a binary size impact of ~20MB. This PR fixes that.

ghstack-source-id: 90842814

Test Plan: build it and see binary size of libtorch.so go down from 95MB to 70MB.

Differential Revision: D17566642

fbshipit-source-id: 57bebffce8e036675a452434bc1a9733f5f2cf6d
2019-09-26 16:58:48 -07:00
a6eec839ea use parallel_for in DepthwiseConvKernel (#26879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26879

Integrate with the at::parallel_for API for mobile.

Test Plan:
- Verified numerical results are the same as before.
- Benchmarked depthwise3x3_winograd layers in MobileNetV2 on two devices:
```
+-------------------+----------------+--------+-----------+----------+------------+-----------+
|       Input       |     Kernel     | Groups | S9 Single | S9 Multi | OP5 Single | OP5 Multi |
+-------------------+----------------+--------+-----------+----------+------------+-----------+
| [1, 32, 112, 112] | [32, 1, 3, 3]  |     32 |      6796 |     1676 |       8520 |      5361 |
| [1, 144, 56, 56]  | [144, 1, 3, 3] |    144 |      8004 |     5523 |       9591 |      4157 |
| [1, 192, 28, 28]  | [192, 1, 3, 3] |    192 |      2771 |      730 |       3345 |      1436 |
| [1, 192, 28, 28]  | [192, 1, 3, 3] |    192 |      2688 |      730 |       3358 |      1979 |
| [1, 384, 14, 14]  | [384, 1, 3, 3] |    384 |      1641 |      461 |       1895 |       874 |
| [1, 384, 14, 14]  | [384, 1, 3, 3] |    384 |      1765 |      444 |       1914 |       870 |
| [1, 384, 14, 14]  | [384, 1, 3, 3] |    384 |      1636 |      448 |       1896 |       852 |
| [1, 384, 14, 14]  | [384, 1, 3, 3] |    384 |      1639 |      452 |       1964 |      1010 |
| [1, 576, 14, 14]  | [576, 1, 3, 3] |    576 |      2575 |      677 |       2854 |      1274 |
| [1, 576, 14, 14]  | [576, 1, 3, 3] |    576 |      2595 |      749 |       2836 |      1291 |
| [1, 960, 7, 7]    | [960, 1, 3, 3] |    960 |      1586 |      432 |       1714 |       675 |
| [1, 960, 7, 7]    | [960, 1, 3, 3] |    960 |      1552 |      421 |       1690 |      1770 |
| [1, 960, 7, 7]    | [960, 1, 3, 3] |    960 |      1680 |      424 |       1690 |       837 |
+-------------------+----------------+--------+-----------+----------+------------+-----------+
|  TOTAL                                      |     36928 |    13167 |      43267 |     22386 |
+-------------------+----------------+--------+-----------+----------+------------+-----------+
```

Differential Revision: D17598249

Pulled By: ljk53

fbshipit-source-id: aaeea221494f11b153a35af2b818a603f1f32ddf
2019-09-26 16:54:48 -07:00
77bfe61ff4 C++ API parity: TensorTest.Data fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26920

Test Plan: Imported from OSS

Differential Revision: D17614135

Pulled By: pbelevich

fbshipit-source-id: 96d70a5e7724338d2829bf006696c2d0ac1025a6
2019-09-26 16:51:24 -07:00
388430f6bc Make quantized max_pool2d error message more specific and less silly
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26918

Test Plan: Imported from OSS

Differential Revision: D17609624

Pulled By: jamesr66a

fbshipit-source-id: 3bc900d5035e9311ab95e3d4a945e95062396afa
2019-09-26 16:48:13 -07:00
b1a09dbec7 Support ceil_mode in quantized maxpool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26916

Test Plan: Imported from OSS

Differential Revision: D17609625

Pulled By: jamesr66a

fbshipit-source-id: a9e1878e7946ee71b6888a91f0dcb2e889939376
2019-09-26 16:48:09 -07:00
55fc377857 Check if QNNPACK is supported before setting it (#26935)
Summary:
ghstack-source-id: 0e873a56a879cab30b7fa1778e65d9cb89474f05
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26935
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26936

Differential Revision: D17617452

Pulled By: IvanKobzarev

fbshipit-source-id: 4dbcdc55044dd2050b28062baa8b58c8387a1e4e
2019-09-26 16:36:54 -07:00
8d5c2aa71c Set quantized engine backend for mobile in speed_benchmark_torch (#26911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26911

Check if QNNPACK is present as a backend (should always be present on mobile).
If it is present, then set the backend to QNNPACK.

Test Plan:
Test on mobile
./speed_benchmark_torch --model mobilenet_quantized_scripted.pt  --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20 --print_output True

Imported from OSS

Differential Revision: D17613908

fbshipit-source-id: af96722570a0111f13d69c38ccca52416ea5e460
2019-09-26 16:28:23 -07:00
638c4375de Export index_fill and index_copy, fix caffe2 scatter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23052

Reviewed By: hl475

Differential Revision: D16428486

Pulled By: houseroad

fbshipit-source-id: 8c5905052763fd70197c67aba5f28eeff0790721
2019-09-26 16:23:32 -07:00
d5490c662e batch size 0 tests in BatchMatMul ops (#26874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26874

Add batch_size == 0 tests of the BatchMatMul DNNLOWP operator.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17596117

fbshipit-source-id: 029e29e6c2bd7894d83dac46e8ce8484cc92b1c0
2019-09-26 16:08:39 -07:00
ec1f0f08f1 batch size 0 support in norm operators (#26894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26894

Add batch_size == 0 tests of norm DNNLOWP operators.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17595416

fbshipit-source-id: 23086ecf8818be30da031eb4fc2922daea79ea7c
2019-09-26 16:08:35 -07:00
f99bc714c7 Migrate lt and lt_ from the TH to Aten (#25998)
Summary:
https://github.com/pytorch/pytorch/issues/24593
https://github.com/pytorch/pytorch/issues/24727

**torch.lt(Tensor a, Tensor b)**
will compute a common dtype (highest) based on the inputs and then compare values. The result will be a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes.

**torch.lt(Tensor a, Tensor b, out=c)**
will compute a common dtype (highest) based on the inputs and then compare values. The result can be populated only into a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> z = torch.empty([1], dtype=torch.bool)
>>> torch.lt(x, y, out=z)
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes. Also, the result dtype could previously be Bool or Byte (deprecated); currently only a Bool result is accepted.

**a.lt_(Tensor b)**
Expects that a and b have the same dtype; otherwise it's possible to get an overflow (example: 'a' is uint8, 'b' is float32; 'a' would be promoted to float32 and the result would also be float32, then cast back to uint8, with potential for overflow). Will not compute a common dtype. The result will have the dtype of a.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x.lt_(y)
tensor([1.], dtype=torch.float64)
```
Works similarly to the previous implementation.

**torch.lt(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a. Then it will compute a common dtype and compare.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> x < 0.5
tensor([True])

>>> x = torch.tensor([0], dtype=torch.int)
>>> x < 0.5
tensor([True])
```
Fix https://github.com/pytorch/pytorch/issues/22301.

**torch.lt(Tensor a, Scalar b, out=c)**
will check that there is no overflow when converting b to the same type as a. Then it will compute a common dtype and compare. The result can be populated only into a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> torch.lt(x, 0.5, out=z)
tensor([True])
```
Previously the result dtype could be Bool or Byte (deprecated); currently only a Bool result is accepted. The rest works similarly to the previous implementation.

**torch.lt_(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a. Then it will compute a common dtype and compare. The result will have the dtype of a.
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1)
tensor([1], dtype=torch.int32)
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1.0)
tensor([1], dtype=torch.int32)
```
Works similarly to the previous implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25998

Differential Revision: D17431853

Pulled By: ifedan

fbshipit-source-id: b5effc6a5d9b32da379395b32abc628b604faaf7
2019-09-26 16:05:27 -07:00
9dd8a129de Fix Vec256<T>::abs() for floating point when applied on -0.0 (#26422)
Summary:
Currently, when a Vec256<T> (base) object contains -0.0, Vec256<T>::abs()
produces -0.0 instead of 0.0. This commit fixes that issue.
The bug mostly affects CPUs without AVX support, such as ARM,
PowerPC, and older Intel models.
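
A minimal reproduction of the expected behavior (on affected CPUs the first entry previously came out as -0.0):
```
import torch

a = torch.tensor([-0.0, -1.5])
print(a.abs())  # expected: tensor([0.0000, 1.5000])
```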
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26422

Differential Revision: D17607346

fbshipit-source-id: e8d4595f0e88ad93018a61f89b9e3dcada485358
2019-09-26 15:55:55 -07:00
755b7e484f Remove an unused function propagate_names_if_namedtensor_enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26176

Differential Revision: D17452289

Pulled By: yf225

fbshipit-source-id: 46926e6774a37e40141763c598b6fe84118ba5be
2019-09-26 15:47:55 -07:00
ac99936553 No sccache (#26059)
Summary:
Proposed change:
Check whether sccache is available before running it to show statistics.
(If not available, simply skip it.  Showing these stats isn't mandatory to build.)

https://github.com/pytorch/pytorch/issues/26058
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26059

Differential Revision: D17364967

Pulled By: vincentqb

fbshipit-source-id: 0250c6ba5573bc0b292ae8e2188b3e1fa700409e
2019-09-26 15:45:14 -07:00
6f92aa2f82 Use intrinsics for trigonometric functions on CPU (#26431)
Summary:
A little benchmarking shows real improvements.

Benchmarking script:

```python
import timeit

for n, t in [(10_000, 8000),
             (100_000, 800)]:
    for dtype in ('torch.float', 'torch.double'):
        print(f'================ dtype {dtype}, {t} times ================================')
        for op in ('sin', 'sinh', 'cos', 'cosh', 'tan'):
            print(f'a.{op}() (a.numel() == {n}) for {t} times')
            print(timeit.timeit(f'a.{op}()',
                                setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
                                number=t))
```

RHEL 7.7, Debug build, gcc 8.3, turbo off:

Before this commit:

```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
2.690067914001702
a.sinh() (a.numel() == 10000) for 8000 times
7.025003784001456
a.cos() (a.numel() == 10000) for 8000 times
2.691191975001857
a.cosh() (a.numel() == 10000) for 8000 times
6.7473940790005145
a.tan() (a.numel() == 10000) for 8000 times
39.14060311800131
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
5.442704386001424
a.sinh() (a.numel() == 10000) for 8000 times
6.778444146999391
a.cos() (a.numel() == 10000) for 8000 times
5.429267812000035
a.cosh() (a.numel() == 10000) for 8000 times
6.625128638002934
a.tan() (a.numel() == 10000) for 8000 times
6.888564799002779
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
2.343601189000765
a.sinh() (a.numel() == 100000) for 800 times
6.4455943499997375
a.cos() (a.numel() == 100000) for 800 times
2.3377084899984766
a.cosh() (a.numel() == 100000) for 800 times
6.357531049001409
a.tan() (a.numel() == 100000) for 800 times
46.93665131099988
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
5.122997600999952
a.sinh() (a.numel() == 100000) for 800 times
6.233409892000054
a.cos() (a.numel() == 100000) for 800 times
5.071856587001093
a.cosh() (a.numel() == 100000) for 800 times
6.0974346790026175
a.tan() (a.numel() == 100000) for 800 times
6.5203832980005245
```

After this commit:

```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
1.5905082239987678
a.sinh() (a.numel() == 10000) for 8000 times
6.8216283560032025
a.cos() (a.numel() == 10000) for 8000 times
1.630263119997835
a.cosh() (a.numel() == 10000) for 8000 times
6.738510535000387
a.tan() (a.numel() == 10000) for 8000 times
1.7482984089983802
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
2.0000513029990543
a.sinh() (a.numel() == 10000) for 8000 times
6.876631892999285
a.cos() (a.numel() == 10000) for 8000 times
2.0672772910002095
a.cosh() (a.numel() == 10000) for 8000 times
6.678993797999283
a.tan() (a.numel() == 10000) for 8000 times
2.3625312719996145
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
1.2381345620015054
a.sinh() (a.numel() == 100000) for 800 times
6.400261008999223
a.cos() (a.numel() == 100000) for 800 times
1.284327255001699
a.cosh() (a.numel() == 100000) for 800 times
6.332740200999979
a.tan() (a.numel() == 100000) for 800 times
1.392364119998092
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
1.6348750549987017
a.sinh() (a.numel() == 100000) for 800 times
6.312609101998532
a.cos() (a.numel() == 100000) for 800 times
1.700102185997821
a.cosh() (a.numel() == 100000) for 800 times
6.141731683001126
a.tan() (a.numel() == 100000) for 800 times
1.9891383869980928
```

RHEL 7.7, Release build, gcc 8.3, turbo off:

Before this commit:

```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
1.0220722929989279
a.sinh() (a.numel() == 10000) for 8000 times
0.9413958889999776
a.cos() (a.numel() == 10000) for 8000 times
1.013564700999268
a.cosh() (a.numel() == 10000) for 8000 times
0.9127178879971325
a.tan() (a.numel() == 10000) for 8000 times
25.249723791999713
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
3.3466339340011473
a.sinh() (a.numel() == 10000) for 8000 times
0.909793314000126
a.cos() (a.numel() == 10000) for 8000 times
3.4019737700000405
a.cosh() (a.numel() == 10000) for 8000 times
0.918371007002861
a.tan() (a.numel() == 10000) for 8000 times
4.902741645997594
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.9870414770011848
a.sinh() (a.numel() == 100000) for 800 times
0.9038734009991458
a.cos() (a.numel() == 100000) for 800 times
0.9786967349973565
a.cosh() (a.numel() == 100000) for 800 times
0.8774048919985944
a.tan() (a.numel() == 100000) for 800 times
30.299459709000075
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
3.3855797659998643
a.sinh() (a.numel() == 100000) for 800 times
0.8303290260009817
a.cos() (a.numel() == 100000) for 800 times
3.3702223940017575
a.cosh() (a.numel() == 100000) for 800 times
0.822016927999357
a.tan() (a.numel() == 100000) for 800 times
4.889868417001708
```

After this commit:

```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
0.542676458000642
a.sinh() (a.numel() == 10000) for 8000 times
0.90598970100109
a.cos() (a.numel() == 10000) for 8000 times
0.6119738140005211
a.cosh() (a.numel() == 10000) for 8000 times
0.902145998999913
a.tan() (a.numel() == 10000) for 8000 times
0.7713400800021191
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
0.609621113002504
a.sinh() (a.numel() == 10000) for 8000 times
0.8993683010012319
a.cos() (a.numel() == 10000) for 8000 times
0.6876834479990066
a.cosh() (a.numel() == 10000) for 8000 times
0.8859291590015346
a.tan() (a.numel() == 10000) for 8000 times
0.9243346840012236
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.5219837559998268
a.sinh() (a.numel() == 100000) for 800 times
0.8755807839988847
a.cos() (a.numel() == 100000) for 800 times
0.5899826130007568
a.cosh() (a.numel() == 100000) for 800 times
0.8757360769996012
a.tan() (a.numel() == 100000) for 800 times
0.7496912290007458
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.578619064999657
a.sinh() (a.numel() == 100000) for 800 times
0.7951330530013365
a.cos() (a.numel() == 100000) for 800 times
0.6442456569966453
a.cosh() (a.numel() == 100000) for 800 times
0.7975544330001867
a.tan() (a.numel() == 100000) for 800 times
0.875703464000253
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26431

Differential Revision: D17470502

fbshipit-source-id: 82e930993c7b2827b04cbe5f9a962913a6069b62
2019-09-26 15:38:36 -07:00
5c67b01467 Switch internal CUDA build to C++14 (#26757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26757

This doesn't switch any open source builds or CI.
The internal fbcode build has been C++17 for quite some time, but in CUDA code we had it restricted to C++11.
This diff changes that to C++14.

Because this doesn't change anything open source, the risk of this is low.
ghstack-source-id: 90728524

Test Plan: waitforsandcastle

Differential Revision: D17558142

fbshipit-source-id: 9cfd47e38e71d5a2fdae2f535c01f281bf007d9a
2019-09-26 14:57:21 -07:00
bf1d957dc8 Fix the Bernoulli distribution sampler (#26864)
Summary:
The current Bernoulli distribution sampler is biased: it returns True slightly too often. This is most obvious at very low p values, like p = 0, although it theoretically occurs at every probability. See https://github.com/pytorch/pytorch/issues/26807.
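
A quick sanity check of the fixed behavior at the extreme (sketch):
```
import torch

# With p = 0, a correct sampler must never produce a 1.
samples = torch.bernoulli(torch.zeros(1000000))
assert samples.sum().item() == 0
```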
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26864

Differential Revision: D17610459

Pulled By: ezyang

fbshipit-source-id: 28215ff820a6046822513f284793e7b850d38438
2019-09-26 14:14:57 -07:00
e425bdb832 Choose num_threads in parallel_for based on GRAIN_SIZE (#26886)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24080

The OpenMP implementation of `parallel_for` now chooses the number of cores to use on a sliding scale between 1 and `OMP_NUM_THREADS`. This prevents wasteful core usage on many-core systems such as in https://github.com/pytorch/pytorch/issues/24080.

This is also consistent with the comment on GRAIN_SIZE:
e327df3965/aten/src/ATen/Parallel.h (L10-L11)
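
A rough model of the heuristic (names are hypothetical; 32768 is the GRAIN_SIZE value in ATen/Parallel.h at the time):
```
GRAIN_SIZE = 32768

def choose_num_threads(total_work, max_threads):
    # one thread per GRAIN_SIZE-sized chunk of work, clamped to [1, max_threads]
    return max(1, min(max_threads, total_work // GRAIN_SIZE))
```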
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26886

Differential Revision: D17610292

Pulled By: ezyang

fbshipit-source-id: 60b9fe4b0eecb41a28c1488e3a575674c8f7000c
2019-09-26 14:11:43 -07:00
9f0deb4725 Get rid of -u (expansion of undefined variable) setting (#26907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26907

Somehow CircleCI broke this on update to their OS X workers;
the error looks like

    /bin/bash: line 1: PROMPT_COMMAND: unbound variable

I'm not sure if I've killed all the occurrences that are necessary,
let's see!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17607486

Pulled By: ezyang

fbshipit-source-id: 5e9a7ff69d4b18e759965bf97c67d38404841187
2019-09-26 13:30:31 -07:00
b2f671a3fb fix typo in job name: nigthly->nightly
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26881

Differential Revision: D17607874

Pulled By: kostmo

fbshipit-source-id: 758a7c5135eb04ffca8231b5d907ababbe55e74b
2019-09-26 12:26:36 -07:00
43b07ff2c4 Fix nuclear norm with requires_grad=True (#26303)
Summary:
Changelog:
- Selectively assign compute_uv in the at::svd used internally in the implementation of at::nuclear_norm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26303

Test Plan:
- Add tests in common_method_invocations.py

Refixes: https://github.com/pytorch/pytorch/issues/18275

Differential Revision: D17605357

Pulled By: ezyang

fbshipit-source-id: d87d60afe678e2546dca6992ea66f2daeb6b0346
2019-09-26 12:08:25 -07:00
0e3389dced Fix circular deps in loading (#26758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26758

This PR changes the order in which we import classes and functions so
that it is no longer necessary for them to be defined in order in a file,
or for there to be proper import statements in the exported file.

Actually importing a function/class now is driven by the need to resolve
the entity during unpickling, type resolution, or value resolution.

While this should allow significant simplification to the code that
serializes classes, this work has not been done yet in order to avoid
inevitable forward compat issues in the transition period.

Notes:
* Individual functions have been replaced with a SourceImporter object
  that exposes a resolveType method. This method loads the type if
  it has not been loaded yet, potentially parsing  (but not loading)
  the file it exists in if that file hasn't been parsed yet.
* Some legacy functionality needed to be added as a method to this object
  since the old format still used some of this logic for class resolution.

Test Plan: Imported from OSS

Differential Revision: D17558989

Pulled By: zdevito

fbshipit-source-id: 7eae3470bcbd388c4de463e3462d527776ed46c6
2019-09-26 11:39:16 -07:00
78a52549e4 Refactor dispatch structure so fallback code lives inline. (#26367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26367

This is necessary for boxed fallback, as boxed fallback must
live inside the templated code.  Error reporting code never
has to be in templated code, so that stays in the C++ file.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17448556

Pulled By: ezyang

fbshipit-source-id: 8244589251e359886dbfcd1c306ae6c033c7a222
2019-09-26 10:59:55 -07:00
272d7c021f Change calling convention of ATenDispatch from getOp to callUnboxed. (#26857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26857

Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle).  Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code).  This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.

Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17602540

Pulled By: ezyang

fbshipit-source-id: 6927e66924405f5bf5cb67f1b57e49bc9a0f58ec
2019-09-26 10:59:50 -07:00
0bfe12d04c Updating submodules
Summary:
GitHub commits:

cfdf778eaf
7f55d6c14f

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 2523bce9933cb27b7a02da1650d7ad6f05b0ff30
2019-09-26 09:49:27 -07:00
d3cab6571e batch size 0 tests for Quantize/Dequantize DNNLOWP ops (#26873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26873

Add batch_size == 0 tests of Quantize and Dequantize DNNLOWP operators.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17595077

fbshipit-source-id: 4a4f60d471a1b1b5746131b08623aa8b1d0059f5
2019-09-26 08:28:03 -07:00
78b0c58a9d batch size 0 support in FC DNNLOWP operators (#26872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26872

Add batch_size == 0 handling in int8 FC operators. Added associated test cases.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17595385

fbshipit-source-id: d271b7bdbaf723fd6dee6f194da8c7fdfeef5fa2
2019-09-26 08:24:17 -07:00
41c1cc2f51 batch size 0 tests for element-wise DNNLOWP ops (#26870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26870

Add batch_size == 0 tests of element-wise DNNLOWP operators.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17595162

fbshipit-source-id: f358748b56b236cce8736bac16054ea84541bf7f
2019-09-26 08:22:08 -07:00
1aaf4810bb batch size 0 support in Conv DNNLOWP ops (#26871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26871

Add batch_size == 0 handling in int8 Conv operators. Added associated test cases.

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17594809

fbshipit-source-id: 54506afc7ef4bfbfed0272c52d2842f6e144f725
2019-09-26 08:18:56 -07:00
2991bfdbe0 Add bitwise distributed reduction ops (#26824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26824

These ops are named after the bitwise reduction ops in MPI.

This is based on the work done by knottb in #22449.

Closes #22449.

Test Plan: Imported from OSS

Differential Revision: D17600210

Pulled By: pietern

fbshipit-source-id: 44c7041ce01bc5de170a4591c5a696e4f24431ef
2019-09-26 08:09:49 -07:00
dec0b6b792 Add some missing constructors to IValue.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26806

Test Plan: Imported from OSS

Differential Revision: D17581325

Pulled By: ezyang

fbshipit-source-id: 1340ed949a649d11cc821775a33f84513e9a5944
2019-09-26 07:56:40 -07:00
60b57d960f Make resize_as_ generic, so XLA works. (#26809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26809

resize_as_ shouldn't do multiple dispatch on its second argument.  Because it
currently has per CPU/CUDA dispatch, however, it will do proper dispatch on all
arguments. Bad!

There is only a very minor downside to this patch which is we have an extra
dynamic dispatch now.

Thank you Ailing for reporting this problem.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17581324

Pulled By: ezyang

fbshipit-source-id: e62cbb6cf497a7d6e53c4a24b905fef7a29b0826
2019-09-26 07:56:36 -07:00
8fb756d3b2 batch size 0 support in ChannelShuffle DNNLOWP op (#26858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26858

Handle batch size = 0 in ChannelShuffle operator

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D17591041

fbshipit-source-id: 63373aa752406c1f38401c3e93d8e1954ce7281e
2019-09-26 00:40:07 -07:00
0a8a779abe Add more inplace arguments to quantization top level API (#26782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782

At least we should be consistent on top-level APIs and prepare/convert/etc.

Logic is inplace=False by default but top-level APIs take care of doing fewer copies.

Also renames always-inplace methods like add_observer to have a trailing underscore (add_observer_).

One fix for MinMaxObserver was triggered by deepcopy surfacing that we were accidentally keeping autograd around.
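
A sketch of the resulting convention:
```
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
model.qconfig = torch.quantization.default_qconfig
prepared = torch.quantization.prepare(model)     # inplace=False: returns a copy
torch.quantization.prepare(model, inplace=True)  # opt-in in-place mutation
```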

Test Plan: Imported from OSS

Differential Revision: D17595956

Pulled By: dzhulgakov

fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
2019-09-26 00:07:07 -07:00
5231699de2 Enable batch_size = 0 support in DNNLOWP Concat operator (#26849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26849

We were hitting division-by-zero errors when one of the input tensor dimensions is 0. Examples: P111481720 and P111481374
This diff adds unit tests for empty input tensors and fixes division-by-zero errors in the partition function.
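
The class of fix, as a hypothetical sketch (not the DNNLOWP code): guard the partitioning against empty inputs.
```
def partition(total, num_workers):
    # avoid division-by-zero / empty-range math when the tensor is empty
    if total == 0:
        return [(0, 0)] * num_workers
    step = (total + num_workers - 1) // num_workers
    return [(i * step, min(total, (i + 1) * step)) for i in range(num_workers)]
```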

Test Plan: buck test caffe2/caffe2/quantization/server:concat_dnnlowp_op_test -- --stress-runs=100

Reviewed By: jianyuh

Differential Revision: D17574566

fbshipit-source-id: 1d2c21308bde99b3c4f2da82f53201eec42b5d8b
2019-09-26 00:03:40 -07:00
60372dc713 remove backward functions from jit-op-registry for mobile build (#26851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851

Add a codegen option to remove backward ops from the jit-op-registry, as they are not
likely to be used for an inference-only mobile build.

Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.

Test Plan: - build and integrate with demo app;

Differential Revision: D17587422

Pulled By: ljk53

fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a
2019-09-25 23:17:25 -07:00
ed2607486f add mobile friendly at:parallel_for backend
Summary:
This diff implements at::parallel_for()/parallel_reduce() and other
ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.

caffe2::ThreadPool doesn't support submitting individual tasks
separately and running them in parallel - all tasks need to be submitted in
one batch, which will lock the thread pool until all of them finish - as a
result we didn't wrap caffe2::ThreadPool with TaskThreadPoolBase interface
and reuse at::parallel_for() implementation in ParallelNative.h. Because
of this constraint, intraop_launch() / intraop_launch_future() are not
supported yet.

This diff doesn't touch the inter-op pool - it's still the default native c10
thread pool. Will work on it when it's more widely used.
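
Since at::parallel_for() is a C++ API, the following is only a rough Python illustration of the chunking contract it gives callers (a simplified sketch, not the actual implementation):

```python
def parallel_for(begin, end, grain_size, fn):
    # split [begin, end) into chunks of roughly grain_size elements;
    # the real implementation dispatches the chunks to the thread pool
    chunk = max(grain_size, 1)
    for start in range(begin, end, chunk):
        fn(start, min(start + chunk, end))

out = []
parallel_for(0, 10, 4, lambda b, e: out.append((b, e)))
print(out)  # [(0, 4), (4, 8), (8, 10)]
```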

Test Plan: - This is early draft to receive feedback. Will do more thorough tests.

Differential Revision: D17543412

Pulled By: ljk53

fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30
2019-09-25 22:33:06 -07:00
14d7d5718e Improvements to GuardElimination and InsertBailouts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25430

Differential Revision: D17584722

Pulled By: Krovatkin

fbshipit-source-id: 9db099b904d71572c1bf3aef5419d38435cecbb5
2019-09-25 21:23:55 -07:00
20ed6ba077 Updating submodules
Summary:
GitHub commits:

f767351c4b

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: d0bfc9e5e62669ada8d56b853490a373eb8ba2f7
2019-09-25 21:14:38 -07:00
058ba0e761 Remove unnecessary functions and cleanup code in quantization.cpp.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26852

Test Plan: Imported from OSS

Differential Revision: D17587742

Pulled By: ZolotukhinM

fbshipit-source-id: f345ea4d524fde9741d6629dec1ea8ab870e49a5
2019-09-25 20:57:55 -07:00
8f359a48a6 Fix building with PARALLEL_BACKEND=NATIVE_TBB (#26742)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/26721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26742

Test Plan:
```
export USE_OPENMP=0
export USE_TBB=1
export BLAS=MKL
export MKL_THREADING=TBB
export MKLDNN_THREADING=TBB
export PARALLEL_BACKEND=NATIVE_TBB
export USE_CUDA=0
python setup.py build
```

Reviewed By: dskhudia

Differential Revision: D17586233

Pulled By: ilia-cher

fbshipit-source-id: 8e8befa6aa776b8c2b27bb4b79a3bff33dbcba7e
2019-09-25 20:37:25 -07:00
c25c507ffe Remove three unused declarations. (#26699)
Summary:
`frac()` in `Vec256<int{16,32,64}_t>` is not overridden.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26699

Differential Revision: D17549502

Pulled By: soumith

fbshipit-source-id: 87c65286032bfc88c447ec4eef1e3ebc73da5d27
2019-09-25 20:22:02 -07:00
f37aa2de12 Try to disable annoying hypothesis warnings again (#26853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26853

This is the same as https://github.com/pytorch/pytorch/pull/25188 but we add a version check for if the hypothesis version is too old

Test Plan: Imported from OSS

Differential Revision: D17589086

Pulled By: jamesr66a

fbshipit-source-id: b968965719593ff989d612384e00dfb823cf0a73
2019-09-25 20:21:58 -07:00
20ebd13f0a Re-write of tensor-scalar quantized add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26766

Test Plan: Imported from OSS

Differential Revision: D17587105

Pulled By: jamesr66a

fbshipit-source-id: 4da6ea98a4c5cc36fd191d9845c1ef409efce464
2019-09-25 20:19:28 -07:00
1d55616aa2 Fix broken failure messages for OverloadedMethodValue
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26846

Test Plan: Imported from OSS

Differential Revision: D17587050

Pulled By: jamesr66a

fbshipit-source-id: e5f3ea05b496afae15994b539f018ed0499ca62b
2019-09-25 20:16:46 -07:00
df16fb9ca1 Throw if someone tries to torch.save() quantized modules (#26828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26828

Pickle serialization for quantized modules is currently broken by https://github.com/pytorch/pytorch/issues/24045, so let's be loud and fail if the user tries to do it
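
A sketch of the now-guarded call (the exact error type raised is an assumption):

```python
import torch

m = torch.nn.quantized.Linear(4, 4)
try:
    torch.save(m, "qlinear.pt")
except Exception as e:  # saving quantized modules now fails loudly
    print(e)
```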

Test Plan: Imported from OSS

Differential Revision: D17579127

Pulled By: jamesr66a

fbshipit-source-id: 3deccac7e4590c6f648f22bb79c57badf3bf0487
2019-09-25 19:55:17 -07:00
d842435c01 Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26703

Test Plan: Imported from OSS

Differential Revision: D17543131

Pulled By: ZolotukhinM

fbshipit-source-id: c4a209c55ac76d8472e64af79f76e9a61fd2a941
2019-09-25 19:18:46 -07:00
9df887df02 Use optimized_graph in graph_executor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26705

Test Plan: Imported from OSS

Differential Revision: D17543281

Pulled By: ZolotukhinM

fbshipit-source-id: 91c40559aac6f2a1f77060fa28c33725a2b8e5f9
2019-09-25 19:18:42 -07:00
ed82a28cf0 QEngine::QNNPACK enabled, module.eval()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26855

Test Plan: Imported from OSS

Differential Revision: D17589837

Pulled By: IvanKobzarev

fbshipit-source-id: 0084538e9b9d760a8728cdcd5723fc7fae5838c7
2019-09-25 18:11:08 -07:00
2eb592324f Migrate multinomial from the TH to Aten (CUDA) (#26481)
Summary:
https://github.com/pytorch/pytorch/issues/24604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26481

Differential Revision: D17489859

Pulled By: ifedan

fbshipit-source-id: 0702044c7c0f78e5e30826e8a5a83da27156bdb3
2019-09-25 17:57:05 -07:00
90ffab6e37 enable double backward for non-cudnn LSTM and GRU (#26660)
Summary:
An attempt to enable double backward for non-cudnn LSTM and GRU (see https://github.com/pytorch/pytorch/issues/25315, https://github.com/pytorch/pytorch/issues/20449). RNN works already because it does not rely on fused kernels.
This does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be done automatically.
The good: it seems to work, with no effect on performance in the usual case without double backward, because the fused lstm backward is still used.
The bad: performance of backward and, especially, double backward is pretty bad. Scripting would still be the preferred way if we want a performant solution. Performance and/or memory use can be slightly improved if in-place variants can be used for sigmoid_backward and tanh_backward to avoid the cat at the end, but I'm not yet sure that's possible, and in any case it would be only a slight improvement.
The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the applied sigmoid and tanh operations, so that's probably another perf and memory hit.
cc soumith, albanD. If you think this approach is viable, I can extend to GRU and RNN.
Thanks to mcarilli whose approach to double backward in weight norm I copied.
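
A short sketch of the now-supported pattern on the non-fused path (shown on CPU with a toy LSTM):

```python
import torch

lstm = torch.nn.LSTM(input_size=3, hidden_size=4).double()
x = torch.randn(5, 2, 3, dtype=torch.double, requires_grad=True)

out, _ = lstm(x)
# build the first backward with create_graph=True so it is itself differentiable
grad_x, = torch.autograd.grad(out.sum(), x, create_graph=True)
grad_x.sum().backward()  # double backward through the differentiable backward
```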
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26660

Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled.

Differential Revision: D17581489

Pulled By: ngimel

fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa
2019-09-25 17:38:18 -07:00
91549ef6c8 Move the CUDA implementation of log to ATen. (#26494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26494

Close #24586

Test Plan: Imported from OSS

Differential Revision: D17572497

Pulled By: VitalyFedyunin

fbshipit-source-id: e1bcd33021464eaa4affd4c6d3283c8403069945
2019-09-25 17:04:08 -07:00
7fc06ea541 Bytecode export flow (#25187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187

The bytecode export flow: dump the bytecode format for the light weighted interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
    * The module object (in data.pkl) is the same as the original JIT model.
    * The serializer is dependent on pickle only (no protobuf or Json).
    * The major functionality is forked in ScriptModuleSerializer2::serialize().
    * The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).

The output layout looks like:

* folders of methods.
    * In each method folder (for example, forward/):
        * bytecode.pkl: instructions and operators
        * constants{.pkl,/}: the constant list in constants.pkl. If there are tensors in constants, the binary tensor files go in the constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript.
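
For context, a hypothetical sketch of the surrounding flow from the Python side; note that the `bytecode_format` flag itself is only exposed on the C++ `Module::save` API described above:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(M())
scripted.save("m.pt")  # regular TorchScript archive (data.pkl, code, tensors)
# bytecode.pkl for the light interpreter comes from the C++ overload:
#   module.save(filename, extra_files, /*bytecode_format=*/true)
```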

Test Plan: Imported from OSS

Differential Revision: D17076411

fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046
2019-09-25 16:35:45 -07:00
660d9e24dd Highlighting in the doc that square root comes before adding epsilon
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26735

Test Plan: Imported from OSS

Differential Revision: D17558505

Pulled By: vincentqb

fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4
2019-09-25 15:52:28 -07:00
08425d8c01 Fix CUDA named tensor copy_ (#26829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26829

The TensorIterator loop for `copy_` uses operations that are currently
unsupported by named tensors. The solution is to wrap `copy_` in a
function that does the name propagation and ignore names when running
the implementation of `copy_`. There is no test case because I'm not
sure how to trigger the incorrect behavior, but there is definitely code
in CUDA copy that doesn't support named tensors (expand_as isn't
supported):

aaf30cdf36/aten/src/ATen/native/cuda/Copy.cu (L141-L148)

Test Plan: - [namedtensor ci]

Differential Revision: D17577310

Pulled By: zou3519

fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2
2019-09-25 15:45:35 -07:00
d43480d6d1 support iterables, rangevalue in list comprehensions (#26768)
Summary:
Support IterableValue expressions and RangeValue in list comprehensions. Just as with list comprehensions where the expression changes the input list type, we need to correctly type the list we create, and then it works; see the small example below.

Fixes https://github.com/pytorch/pytorch/issues/26693
Fixes https://github.com/pytorch/pytorch/issues/22483
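
A small example of the newly supported pattern (the function is illustrative):

```python
import torch

@torch.jit.script
def squares(n: int):
    # a RangeValue used directly inside a list comprehension
    return [i * i for i in range(n)]

print(squares(5))  # [0, 1, 4, 9, 16]
```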
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26768

Differential Revision: D17562762

Pulled By: eellison

fbshipit-source-id: 7ce8bf8605758dfd99057bc0376b4b724c4f9251
2019-09-25 15:41:32 -07:00
a23109e12e Do not call cpuinfo_initialize() on other than x86 arch. (#26265)
Summary:
cpuinfo_initialize() is not implemented for the s390 arch.
The cpuinfo calls are x86-specific, used to determine vector extensions such as AVX and AVX512.
Without this patch, an unnecessary error log is printed on s390:
Error in cpuinfo: processor architecture is not supported in cpuinfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26265

Differential Revision: D17452301

Pulled By: izdeby

fbshipit-source-id: 9ca485550385c26dec18aac5953c887f1ffbfb7a
2019-09-25 15:08:45 -07:00
9383601523 fix to operate on cuda kernel with clang and libc++ (#25553)
Summary:
We found a bug involving `std::tuple` with nvcc.

In C++11, `std::tuple` constructor is constexpr in libstdc++, but is not constexpr in libc++.

c36b77fcda/aten/src/ATen/native/cuda/Loops.cuh (L109-L111)

These lines caused crashes in CUDA with the message `scan failed with synchronize`, which is a CUDA initialization error.

This PR fixes the loop for nvcc and libc++ by not using `std::tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25553

Differential Revision: D17582118

Pulled By: yf225

fbshipit-source-id: d6f62ed46c2415b48eb49f8a051cf3c0e7cb23ce
2019-09-25 15:03:28 -07:00
5fc52482cf torch.load default encoding change to 'utf-8' (#26421)
Summary:
Change the default encoding used by torch.load to 'utf-8'

This commit provides changes for cases where user tries to torch.load
a pickled module with non-ASCII characters in the docstring as
discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii'
to 'utf-8'. Documentation for `torch.load` was updated and two tests
(loading py2 unicode module with unicode in it; error throwing when
user explicitly sets wrong encoding) were written.

~~This commit provides changes for better error handling in cases
where user tries to `torch.load` a pickled module with non-ASCII
characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~

Ping ezyang
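
A minimal sketch of the new default and the explicit opt-out (the checkpoint path is hypothetical):

```python
import torch

obj = torch.load("py2_checkpoint.pth")                    # encoding='utf-8' by default
obj = torch.load("py2_checkpoint.pth", encoding="ascii")  # old behavior, now opt-in
```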
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421

Differential Revision: D17581633

Pulled By: yf225

fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620
2019-09-25 14:59:02 -07:00
92a2d4232a Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (#26815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26815

This PR adds named tensor support for:
- any, all, `bitwise_not(_)`, cumprod, cumsum, `logical_not`

In addition, it adds smoke tests for a variety of tensor attributes and
fns:
- is_shared, is_signed
- retain_grad, register_hook

Test Plan: - [namedtensor ci]

Differential Revision: D17575905

Pulled By: zou3519

fbshipit-source-id: 37bfa327e68112c5bf0f6bf1f467a527f50fa1c4
2019-09-25 14:56:28 -07:00
4bd1da1458 Revert D17473200: [pytorch][distributed] add function to get NCCL version for logging
Test Plan: revert-hammer

Differential Revision:
D17473200

Original commit changeset: 4881ed5221b3

fbshipit-source-id: c5635ce89de1644d2135b657427cbd0c3af83576
2019-09-25 14:53:59 -07:00
81bbb7ebab Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (#26592)
Summary:
function_ref is pulled over from LLVM.  It is to callables what StringRef is to strings.
This allows it to be substantially lighter weight, particularly in code size.  That comes
at the cost of not being usable in situations where the callable's lifetime is shorter
than the function_ref.  This means it is suitable for callback-like scenarios, but not
for situations where the callable needs to be stored.  In converting TensorIterator,
I only encountered one situation that required refactoring to comply with function_ref's
constraints.

In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26592

Differential Revision: D17516202

fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48
2019-09-25 14:48:50 -07:00
5379e87a32 Cuda101 upgrade (#26823)
Summary:
test run: https://github.com/pytorch/pytorch/issues/26732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26823

Reviewed By: soumith

Differential Revision: D17576095

Pulled By: mingbowan

fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b
2019-09-25 14:44:12 -07:00
b6a1d618b2 Revert D17565828: [pytorch][PR] [ONNX] Export baddbmm
Test Plan: revert-hammer

Differential Revision:
D17565828

Original commit changeset: 85f605a7b3fa

fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37
2019-09-25 14:24:18 -07:00
b5d15315d8 Improve C++ maxpool and avgpool (#26521)
Summary:
This PR makes the following improvements:
1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`)
2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically).
3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`.
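
For comparison, a minimal sketch of the Python module behavior these C++ additions mirror:

```python
import torch

pool = torch.nn.MaxPool2d(kernel_size=2, return_indices=True)
out, indices = pool(torch.randn(1, 1, 4, 4))  # indices of the max values
```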
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26521

Differential Revision: D17507358

Pulled By: yf225

fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9
2019-09-25 13:52:58 -07:00
167722d36e Typevar matching fix + implicit conversions from Scalar to int/float (#26453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26453

Previously, schema matching would incorrectly widen typevar bindings
when later occurrences were supertypes of earlier ones. This allowed
callsites like `floatlist.append(tensor.item())` to pass the typechecker,
causing a runtime assert (issue #24856).

An earlier, reverted fix (#25136) insisted on strict equality across all
occurrences of a typevar, necessitating explicit casts around Scalar-typed
arguments to int- or float-typed parameters, like `tensor.item()` above.
This was per the original type system design, but turned out to break
existing user code that relied on the de facto dynamic downcast. (The
error required a specialized list representation.)

The current fix includes the prevention of typevar widening, but
adds logic to insert implicit conversions from Scalar to float or int
as needed to satisfy a matched schema.
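
A sketch of the callsite pattern this fixes (names are illustrative):

```python
from typing import List

import torch

@torch.jit.script
def append_item(xs: List[float], t: torch.Tensor) -> List[float]:
    # t.item() produces a Scalar; schema matching now inserts an implicit
    # conversion to float instead of silently widening the typevar binding
    xs.append(t.item())
    return xs

print(append_item([1.0], torch.tensor(2.5)))  # [1.0, 2.5]
```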

Test Plan: Imported from OSS

Differential Revision: D17470598

Pulled By: bhosmer

fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659
2019-09-25 13:49:55 -07:00
03007b3dda Quantized Interpolate Kernel(upsample_bilinear2d) (#26631)
Summary:
We implement the quantized upsample_bilinear2d case of the interpolate kernel in this PR.

For nhwc performance improvement:
```python
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

```
===========without nhwc handling===========
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
1.999044418334961       2.5860953330993652      1.2936657681940702
GB/s float      GB/s quant
1.6192056416115257      0.3129103516188541
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.02730655670166        2.6061582565307617      1.2855274639721328
GB/s float      GB/s quant
1.596632728927902       0.3105014816242217
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0180463790893555      2.4047350883483887      1.1916153728010588
GB/s float      GB/s quant
1.603959172365819       1.3460376636426636

===========with nhwc handling===========

**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0913314819335938      0.09696483612060547     0.04636512047863123
GB/s float      GB/s quant
1.5477527249803915      8.345458337015
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.1065664291381836      0.09959936141967773     0.04728042754408879
GB/s float      GB/s quant
1.5365591871338384      8.124710725706763
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044203281402588       0.6003522872924805      0.29368521846837126
GB/s float      GB/s quant
1.5834354779917448      5.391607675216635
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26631

Differential Revision: D17521498

Pulled By: llyfacebook

fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103
2019-09-25 13:43:43 -07:00
334e78b1ce Fix Future default constructor missing for ParallelNative
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26739

Test Plan: Imported from OSS

Differential Revision: D17577908

Pulled By: bwasti

fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8
2019-09-25 12:52:11 -07:00
63fd10549a Export baddbmm (#25738)
Summary:
Added ONNX export for baddbmm in opset9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25738

Reviewed By: hl475

Differential Revision: D17565828

Pulled By: houseroad

fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5
2019-09-25 12:28:06 -07:00
3288da064f Fix CI docker builds (#26704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26704

nccl 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for cuda 9.1 :(

ghstack-source-id: 90714191

Test Plan: build docker images on Jenkins

Differential Revision: D17543120

fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25
2019-09-25 12:25:21 -07:00
ae2a8fea3d Validate Docker version in CI. (#26496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26496

It is a BAD BAD idea to deploy Docker versions which are not deployed
(per ossci-job-dsl) because those versions will get GC'ed after two
weeks.  At the moment, there is no verification that your Docker version
is deployed.  This adds an Azure job to check this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17575100

Pulled By: ezyang

fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a
2019-09-25 12:18:52 -07:00
f7742d2b21 Prepare for Cocoapods 1.3 Release (#26751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26751

### Summary

We're going to use the AWS s3 bucket - `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below:

1.  Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading.
2. Verify the binary locally  - Run tests on both arm64 and simulator
3. Publish the cocoapods officially

### Test plan

- podspec lint command succeeds
    - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS

Differential Revision: D17577131

Pulled By: xta0

fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac
2019-09-25 12:16:06 -07:00
f396b019b1 Remove one unnecessary copy of the output during the type promotion. (#26816)
Summary:
Output tensors don't need to be copied during type promotion, as we are not using any data from them. A simple allocation gives a steady 10% performance gain.

BEFORE

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

AFTER

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26816

Differential Revision: D17573455

Pulled By: VitalyFedyunin

fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2
2019-09-25 12:06:51 -07:00
d9055319d4 add function to get NCCL version for logging (#26583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26583

Adds a function that uses the NCCL API to get the version code and converts it to a
readable version. It will be used for logging the NCCL version in exception messages.
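
For reference, a hedged sketch of how the version surfaces on the Python side (assuming a CUDA build with NCCL available):

```python
import torch

if torch.cuda.is_available():
    print(torch.cuda.nccl.version())  # readable form of the NCCL version code
```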

Test Plan: See above

Differential Revision: D17473200

fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64
2019-09-25 11:56:31 -07:00
aaf30cdf36 Port CUDA implementation of expm1 to ATen (#26598)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24562
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26598

Differential Revision: D17531503

Pulled By: VitalyFedyunin

fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48
2019-09-25 11:11:58 -07:00
729f8425f7 Use Caffe2's implementation of grouped depthwise 3x3 convolutions (#26556)
Summary:
Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26556

Test Plan:
_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.

_Performance_ - All measurements below on Pixel 2

**Before**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25"
>
> Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155

Single-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25
>  --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671

**After**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25"
>
> Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042

Single-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25
>  --caffe2_threadpool_force_inline=true"
> Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425
>

Differential Revision: D17533311

Pulled By: AshkanAliabadi

fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d
2019-09-25 11:02:27 -07:00
1cae5195a6 Refactor checked_tensor_unwrap to take DeviceType instead of Backend (#26290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26290

Fixes #26206

Happily, I also can delete the dead Dense***Tensor cases, since they
are for the defunct THS backend.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17404368

Pulled By: ezyang

fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8
2019-09-25 10:59:07 -07:00
b0bb5e338e quantized_tensor tests (#26784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26784

Previously we were using empty to generate test tensors; this PR changes the test tensors to use
randint so that we can test things properly.
Also added a set_sizes_and_strides and removed .contiguous() in the int_repr function to preserve the
original sizes and strides.
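
A sketch of the new test-tensor generation (the values are illustrative):

```python
import torch

scale, zero_point = 0.5, 1
r = torch.randint(0, 100, (2, 3), dtype=torch.float)  # real data, not empty()
q = torch.quantize_per_tensor(r, scale, zero_point, torch.quint8)
assert q.int_repr().size() == q.size()  # sizes/strides preserved by int_repr
```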

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D17566575

fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990
2019-09-25 10:33:30 -07:00
25cd3c6b7d Lets generic tests use multiple devices (#26594)
Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable

TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality.

Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26594

Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.

Differential Revision: D17568910

Pulled By: mruberry

fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128
2019-09-25 10:16:22 -07:00
db5791d543 autodiff changes to enable profiling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25397

Differential Revision: D17565747

Pulled By: Krovatkin

fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed
2019-09-25 10:11:44 -07:00
0cb10d7ebf move more functions to InsertObserversHelper (#26773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26773

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17563673

fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08
2019-09-25 10:06:05 -07:00
f0e507cbaf Updating submodules
Summary:
GitHub commits:

5096b0ae1f
ecd6c10ea3
67abe5d0aa
90580f7e06
7f98961c7b
f8da6e6e36

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 60ce61531cf6d4ac8616b3986b40b423abc7de15
2019-09-25 09:52:18 -07:00
ee6cdb5726 Upgrade sleef to v3.4.0. (#26749)
Summary:
This resets the sleef submodule to upstream, since everything else except
a small build sanity fix
<191f655caa>
has been merged upstream. The new release includes an important fix
for trigonometric functions on macOS, which unblocks https://github.com/pytorch/pytorch/issues/26431.

This should supersede https://github.com/pytorch/pytorch/issues/20536.

Close https://github.com/pytorch/pytorch/issues/20536.

cc colesbury resistor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26749

Differential Revision: D17572783

Pulled By: ezyang

fbshipit-source-id: dd7827e8c8500a0050e3e318d184134c792d3ecc
2019-09-25 08:25:43 -07:00
0f1fbc0eb2 Hub improvements (#26723)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/25980.
Our old serialization was in tar (e.g. `resnet18-5c106cde.pth` was in this format), so let's only support automatic unzipping if checkpoints are zipfiles.
We can still manage to get it to work with tarfiles, but let's delay that until there's an ask.
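
A minimal sketch (the URL points at the tar-format checkpoint mentioned above, which is simply not auto-extracted):

```python
import torch

state = torch.hub.load_state_dict_from_url(
    "https://download.pytorch.org/models/resnet18-5c106cde.pth")
```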
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26723

Differential Revision: D17551795

Pulled By: ailzhang

fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e
2019-09-25 08:21:50 -07:00
61dd485b3a Revert D17549623: Add some missing constructors to IValue.
Test Plan: revert-hammer

Differential Revision:
D17549623

Original commit changeset: 8880c09d85a1

fbshipit-source-id: 002bb1173dbcf6a1d18e1c4b84b4365f145c38dd
2019-09-25 07:47:06 -07:00
f094afe4b9 Updating submodules
Summary:
GitHub commits:

6668c21398
189aebb344

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: f2037290b58ac295eeb94626e172491a8526875d
2019-09-25 07:27:36 -07:00
c5b57aa57d Add some missing constructors to IValue. (#26718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26718

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17549623

Pulled By: ezyang

fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb
2019-09-25 07:21:57 -07:00
037cfce745 Remove unnecessary include from TensorBody (#26360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26360

This is not just for aesthetics: this include blocks the inclusion
of headers like ivalue.h from ATenDispatch.h (as it causes an
include cycle.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17429163

Pulled By: ezyang

fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c
2019-09-25 07:21:52 -07:00
52614f5fd9 Implement multiple dispatch in boxed c10 dispatcher (#26118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26118

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17404367

Pulled By: ezyang

fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8
2019-09-25 07:21:48 -07:00
b56ad744a2 Delete backwards compatibility Backend overload for registerOp (#25914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17284083

Pulled By: ezyang

fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
2019-09-25 07:21:44 -07:00
3346759774 Named tensor support for logsumexp, mode, kthvalue, median, min, max (#26563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26563

This adds name inference rules for pre-existing logsumexp, mode,
kthvalue, and median ops. Also adds overloads so that they can take
`Dimname` dimensions.

There are a lot of min/max overloads. This PR adds name inference to
the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor)  (full reduction)

Test Plan: - new tests and [namedtensor ci]

Differential Revision: D17557050

Pulled By: zou3519

fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171
2019-09-25 07:04:31 -07:00
002c250139 Expose a torch.result_type and simplify tensor iterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26012

Test Plan: Imported from OSS

Differential Revision: D17556197

Pulled By: nairbv

fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c
2019-09-25 06:52:23 -07:00
5001ec4252 Support Negative Axis in Size in ONNX (#26436)
Summary:
Currently, we export invalid ONNX models when size() is used with a negative dim.
This PR fixes the issue and allows exporting these models to ONNX (ex: input.size(-1)).
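
A small repro of the now-working export (module and file name are illustrative):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x.view(x.size(-1), -1)  # size() with a negative dim

torch.onnx.export(M(), torch.randn(2, 3), "m.onnx")
```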
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26436

Reviewed By: hl475

Differential Revision: D17565905

Pulled By: houseroad

fbshipit-source-id: 036bc384b25de77506ef9fbe24ceec0f7e3cff8b
2019-09-25 06:08:16 -07:00
d396c7332a Update ONNX Export for Interpolate in Opset 11 (#26778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26778

- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Original PR resolved: https://github.com/pytorch/pytorch/pull/24805
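
A hedged sketch of exporting a linear-mode interpolate at opset 11 (module and file name are illustrative):

```python
import torch
import torch.nn.functional as F

class Up(torch.nn.Module):
    def forward(self, x):
        return F.interpolate(x, scale_factor=2.0, mode="linear",
                             align_corners=False)

torch.onnx.export(Up(), torch.randn(1, 2, 8), "up.onnx", opset_version=11)
```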

Reviewed By: hl475

Differential Revision: D17564911

Pulled By: houseroad

fbshipit-source-id: 591e1f5b361854ace322eca1590f8f84d29c1a5d
2019-09-25 05:43:20 -07:00
60343a82e9 Named tensor support for: atan2, output_nr, detach{_}, requires_grad_ (#26543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26543

Also adds a test for logical_xor (it already had named tensor support
but there was no test)

Test Plan: - [namedtensor ci]

Differential Revision: D17501403

Pulled By: zou3519

fbshipit-source-id: 49be15580be9fb520e25a8020164e5a599d22d40
2019-09-25 05:23:57 -07:00
be93d30e37 Revert D17458232: Fake quantization enhancements for QAT/PTQ support
Test Plan: revert-hammer

Differential Revision:
D17458232

Original commit changeset: f44380c60f1a

fbshipit-source-id: 64a244c720b61fa912bacbb23fcbf9faed0757c2
2019-09-25 04:56:30 -07:00
e2c3d7e52c Fake quantization enhancements for QAT/PTQ support (#26420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26420

Flags for enabling/disabling observer and fake quant independently. Improve repr for fake quant.
ghstack-source-id: 90704254

Test Plan:
buck test caffe2/test:fake_quant --  --print-passing-details
buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17458232

fbshipit-source-id: f44380c60f1a10a8ea09bca8ab79ba5d1867ed62
2019-09-25 02:02:00 -07:00
a395c31147 Add <cinttypes> include to resolve PRIu32 macro (#26745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26745

This file doesn't appear to be included by default on GCC 7.3 and
causes compilation to fail. Adding this include fixes compilation.

Test Plan: Imported from OSS

Differential Revision: D17566444

Pulled By: pietern

fbshipit-source-id: 9afb3d4596e424efc5a6ea6ab3b1cffdb2b41fbb
2019-09-25 00:57:28 -07:00
bc4519dc27 Handle DeQuantStub() for QAT (#26518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26518

Skip Dequantize() modules for QAT alone. For fake quant insertion, DeQuantize() is a no-op, so we should not be inserting fake-quant for it.
ghstack-source-id: 90704220
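
A minimal QAT sketch showing where the DeQuantStub sits (the fbgemm default qconfig is assumed):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = torch.nn.Linear(4, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().train()
m.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(m, inplace=True)  # no fake-quant on the DeQuantStub
```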

Test Plan:
buck test caffe2/test:quantization -- --print-passing-details

Tests in test_quantization pass with changes:
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/281475121296989
Summary (total time 73.03s):
  PASS: 28
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D17439333

fbshipit-source-id: f716c23500324ae08c8d104ee2c9587fa6926571
2019-09-25 00:35:34 -07:00
9949638818 Improve error message in IR parser when accessing undefined variable.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26771

Test Plan: Imported from OSS

Differential Revision: D17562853

Pulled By: ZolotukhinM

fbshipit-source-id: b4d4bc6001e3ea06f4d1b8691ad2a339a04c16ea
2019-09-25 00:23:13 -07:00
6d0b004574 rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool
Summary:
Rename the old mobile_threadpool() API and replace it with a new version that
returns caffe2::ThreadPool instead of pthreadpool_t.

Test Plan: - builds

Differential Revision: D17543413

Pulled By: ljk53

fbshipit-source-id: a3effd24e8ce9d677a2a04ebe6b6e1582e6f0a65
2019-09-24 22:27:35 -07:00
d4dc844ec3 Add comments for multidim tensor factory limitations, and rename ListInitTensor for better clarity (#26756)
Summary:
This PR includes the following improvements:
1. Add comments for limitations of the multidim tensor factory function `torch::tensor(...)`, noting the fact that `torch::tensor({})` and mixed data type such as `torch::tensor({{bool, 2.0}})` are not supported at the moment. (I will also update https://pytorch.org/cppdocs/notes/tensor_creation.html to include usage examples for the multidim tensor factory function `torch::tensor(...)`)
2. Rename `ListInitTensor` to `InitListTensor`, for better naming consistency.

This addresses reviews in https://github.com/pytorch/pytorch/pull/26210. I will work on a separate PR to move the factory function to `at::`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26756

Differential Revision: D17560136

Pulled By: yf225

fbshipit-source-id: eb8b45226e999784da48f75cc8953a998582df99
2019-09-24 19:21:23 -07:00
e54a9e1b5a use new fbgemm PackedDepthWiseConvMatrix without template parameter (#26760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26760

Follow-up of D17514003 . Change Caffe2 code to use the new PackedDepthWiseConvMatrix interface.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D17514350

fbshipit-source-id: 691d9f1fd35bdb7dd8ba152287f3a34359dc1f4c
2019-09-24 19:06:04 -07:00
0c0b4b6326 Revert D17559660: [fix] quantized_tensor tests
Test Plan: revert-hammer

Differential Revision:
D17559660

Original commit changeset: d4ce81d57729

fbshipit-source-id: b6c9dc31f08935d255fa9eb3a830bafc76a13799
2019-09-24 18:59:48 -07:00
1bb895e1c1 Revert D17330801: [pytorch][PR] Update ONNX Export for Interpolate in Opset 11
Test Plan: revert-hammer

Differential Revision:
D17330801

Original commit changeset: 1bdefff9e72f

fbshipit-source-id: dff07477403170c27260f736ab6e6010f0deca9f
2019-09-24 18:56:45 -07:00
ef8d1c50c4 Fix builtin lookup for Python functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26688

Pulled By: driazati

Differential Revision: D17560634

fbshipit-source-id: e1c50d1ca24e0313c2b7d704c488a29ef6a47cad
2019-09-24 18:02:36 -07:00
cc4219a799 Wrap dimensions during named inference (#26558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26558

Previously, name inference gets called after dimensions are wrapped.
This PR makes it so that name inference always wraps dimensions so that
it can be called anywhere. Ideally we would only wrap dimensions once,
but many of our operators wrap dimensions in weird places.

Wrapping dimensions in name inference is pretty inexpensive and only
happens for named tensors (name inference does not run on unnamed
tensors.)

Test Plan: - [namedtensor ci]

Differential Revision: D17557049

Pulled By: zou3519

fbshipit-source-id: 68c5636489e233dbf2588ab6ad4e379a6fe4c8ba
2019-09-24 17:47:55 -07:00
27ad34a703 Revert D17558701: [refactor] move more functions to InsertObserversHelper
Test Plan: revert-hammer

Differential Revision:
D17558701

Original commit changeset: 96ef87db74bd

fbshipit-source-id: fc398d3b8bb1cd0bae573e3fdac5cfb883b31373
2019-09-24 17:33:58 -07:00
89c5dc57d9 Add whitelist for backward compatible checks for function schemas (#26740)
Summary:
Now, we skip all function schema contains quantize key word
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26740

Reviewed By: hl475

Differential Revision: D17561753

Pulled By: houseroad

fbshipit-source-id: c5e47ada072e71bfa2341a0af8f1743e86ef733c
2019-09-24 17:31:04 -07:00
b93f0947a8 Automatic update of fbcode/onnx to ab6b94203c595f74b1f126eb118eef22e4c05a57 (#26736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26736

Previous import was 23bb6ea1a71f08e200114a153f48bd7adb66d486

Included changes:
- **[ab6b9420](https://github.com/onnx/onnx/commit/ab6b9420)**: Relax IF's shape inference rule (#2345) <Wei-Sheng Chin>
- **[c5af774a](https://github.com/onnx/onnx/commit/c5af774a)**: Clarify behavior in ConvTranspose (#2343) <Wei-Sheng Chin>
- **[a20ba2f1](https://github.com/onnx/onnx/commit/a20ba2f1)**: Fix node test case model for Gemm scalar bias case (#2342) <Hariharan Seshadri>
- **[1aa176e0](https://github.com/onnx/onnx/commit/1aa176e0)**: Update pybind (#2340) <Changming Sun>
- **[7840504d](https://github.com/onnx/onnx/commit/7840504d)**: Update gen_doc script to validate proto3 files (#2122) <Raymond Yang>
- **[bd35e623](https://github.com/onnx/onnx/commit/bd35e623)**: Fix some backend tests  (#2335) <Hariharan Seshadri>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17552449

fbshipit-source-id: 424acb261b54fc98485f782f6922b11b28c836eb
2019-09-24 17:14:13 -07:00
925e51ea7f Add a lot of dimname overloads (#26636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26636

This PR defines a lot of dimname overloads so that when named tensor
support is added for those operators, we will not have to modify the
autogenerated TensorMethods.h, thereby avoiding potential merge
conflicts in the future.

Overloads were added for the following:
- all
- any
- argmax
- argmin
- cumsum
- cumprod
- index_copy
- kthvalue
- mode
- permute
- squeeze
- index_add
- index_fill
- scatter
- scatter_add
- index_select
- gather
- sort
- argsort

Test Plan: - [namedtensor ci]

Differential Revision: D17522984

Pulled By: zou3519

fbshipit-source-id: eca6dea819ba4e4e43b71b700d5cf09176f00061
2019-09-24 17:03:36 -07:00
67bde6b724 quantized_tensor tests (#25429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25429

Previously we were using empty to generate test tensors; this PR changes the test tensors to use
randint so that we can test things properly.
Also added a set_sizes_and_strides and removed .contiguous() in the int_repr function to preserve the
original sizes and strides.

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D17559660

fbshipit-source-id: d4ce81d577296c1137270fdaa6b1359fb703896f
2019-09-24 17:00:24 -07:00
4820ff3adc Switch our Android CI to Clang (#26656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26656

Updating the NDK to r18 or newer triggers a path in our CI scripts so that we now build with clang instead of gcc.
Google discontinued the gcc support for android quite a while ago, clang is the only way forward.
ghstack-source-id: 90698985

Test Plan: CI

Reviewed By: dreiss

Differential Revision: D17533570

fbshipit-source-id: 5eef4d5a539d8bb1a6682f000d0b5d33b3752819
2019-09-24 16:42:27 -07:00
5de5f793ed Added test case for reinit (#26506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26506

[pytorch] [distributed] Made test forgiving to allow rpc agent to return one of the two errors.
ghstack-source-id: 90667534

Test Plan: Made sure pg based UT works.

Differential Revision: D17488899

fbshipit-source-id: 41f76cf4b4a0ca5e651a5403d6e67b639f0b9c4f
2019-09-24 16:39:33 -07:00
7516156a35 move more functions to InsertObserversHelper (#26696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26696

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17558701

fbshipit-source-id: 96ef87db74bd1a5d4ddc69867ae71d78c0df83fd
2019-09-24 16:30:13 -07:00
d21232055e Address review comments in https://github.com/pytorch/pytorch/pull/26272 (#26587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26587

-
ghstack-source-id: 90557226

Test Plan: unit tests

Differential Revision: D17515048

fbshipit-source-id: 3459ee80efec29080060ec29d67642d789dd8749
2019-09-24 16:30:11 -07:00
e95f3125fd Make ONNX_ATEN_FALLBACK also works for _export (#26738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26738

Someone may use torch._export directly, so here we change onnx_export_type's default value to None;
if it's the PyTorch-ONNX-Caffe2 bundle, we set it to ONNX_ATEN_FALLBACK, otherwise it's ONNX.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17546452

fbshipit-source-id: 38e53926e2b101484bbbce7b58ebcd6af8c42438
2019-09-24 16:30:09 -07:00
f43b7c4435 Revert D17513451: Register values listed in __constants__ as attributes of the Module.
Test Plan: revert-hammer

Differential Revision:
D17513451

Original commit changeset: cf8f9b450e71

fbshipit-source-id: 319ec9399173eb06556969dc6be365b319c1ab6c
2019-09-24 16:30:06 -07:00
1058373205 Revert D17514653: [quant] Un-hardcode epsilon constant in FoldConvBatchNorm2d.
Test Plan: revert-hammer

Differential Revision:
D17514653

Original commit changeset: 7d9cc8f619b7

fbshipit-source-id: 2cf32082a46fe169a1db4926df78a9f3256616ad
2019-09-24 16:30:04 -07:00
9dd9a7ef5c Simplify operator sign using the helper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25592

Test Plan: Imported from OSS

Differential Revision: D17552470

Pulled By: VitalyFedyunin

fbshipit-source-id: 6c8cc4f46dd390c231b2d0aac664ad2a6ac8876e
2019-09-24 16:30:02 -07:00
c8109058c4 Refactor android torchvision: not hardcoded mean/std (#26690)
Summary:
- Normalization mean and std are specified as parameters instead of being hardcoded
 - imageYUV420CenterCropToFloat32Tensor before this change worked only with square tensors (width == height); added a generalization to support width != height with all rotations and scalings
 - javadocs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26690

Differential Revision: D17556006

Pulled By: IvanKobzarev

fbshipit-source-id: 63f3321ea2e6b46ba5c34f9e92c48d116f7dc5ce
2019-09-24 16:29:59 -07:00
de3d4686ca Update ONNX Export for Interpolate in Opset 11 (#24805)
Summary:
- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24805

Reviewed By: hl475

Differential Revision: D17330801

Pulled By: houseroad

fbshipit-source-id: 1bdefff9e72f5e70c51f4721e1d7347478b7505b
2019-09-24 16:29:57 -07:00
d959b4e2d2 Add threadpool in qlinear and qconv for mobile (#26728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26728

Use Caffe2::mobile_threadpool() in linear and conv operators

Perf
Without threadpool - 76ms
With threadpool - 41 ms

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D17553510

fbshipit-source-id: dd5b06f526f65d87727ec7e3dad0a5fa74cba9f9
2019-09-24 16:29:55 -07:00
f57ecd5f29 add timeout parameter to connect function in TCPStore (#26554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26554

Previously, in `TCPStore`'s constructor we did not pass in a timeout to
the `connect` function, which thus used the default timeout (-1, so infinite).
But the timeout variable in `TCPStore.cpp` is configurable by the user and set to
be 300 seconds by default, so we should be passing it into the connect function.
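
A hedged sketch of the constructor whose timeout now also bounds the initial connect (host and port are placeholders):

```python
from datetime import timedelta

import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 29500, 1, True,
                      timedelta(seconds=300))  # timeout now applies to connect() too
```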

Test Plan: see above.

Differential Revision: D17486779

fbshipit-source-id: 42d38a3b8d492d9e9ff09110990a8e4a3a1292b2
2019-09-24 16:29:52 -07:00
5e5b9a9321 Add C++ nn::Identity (#26713)
Summary:
**Summary**:
Adds `torch::nn::Identity` module support for the C++ API.

**Issue**: https://github.com/pytorch/pytorch/issues/25883

**Reviewer**: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26713

Differential Revision: D17550982

Pulled By: yf225

fbshipit-source-id: f24483846e82d5d276d77a1a0c50884f3bc05112
2019-09-24 16:29:49 -07:00
c0c2921a06 fix annotation regex for flake8 (#26694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26694

Previously we would not properly populate `errorDesc` for:
```
./torch/jit/__init__.py:13:1: F401 'torch.nn.ModuleList' imported but unused
```

because we wanted only letters and spaces. Be more permissive

Test Plan: Imported from OSS

Differential Revision: D17551999

Pulled By: suo

fbshipit-source-id: b82567df1fa3c9729e7427dc3461bedfb40933dc
2019-09-24 16:29:47 -07:00
3f72bcfcaa Remove _dequantize_per_tensor (#26681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26681

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17542833

fbshipit-source-id: 653e906b0e146763609c69ef0de7f9cf38621586
2019-09-24 10:54:56 -07:00
d0fff0ebc8 Make is_optional check more robust (#26312)
Summary:
If the `Union` contains a non-class type, `issubclass` would fail; this
adds a check for that case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26312

Pulled By: driazati

Differential Revision: D17505206

fbshipit-source-id: 1331e412f938e2f08ecb079972147f11e3ec77cd
2019-09-24 10:44:40 -07:00
5cc353482d Add doc building instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26553

Differential Revision: D17551426

Pulled By: driazati

fbshipit-source-id: 53ce05882091aca4617586bc53944ee4c8b3a622
2019-09-24 10:38:23 -07:00
eddda3afdc Un-hardcode epsilon constant in FoldConvBatchNorm2d.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26584

Test Plan: Imported from OSS

Differential Revision: D17514653

Pulled By: ZolotukhinM

fbshipit-source-id: 7d9cc8f619b7dbe26fa58eac37cc131929c004d4
2019-09-24 10:30:35 -07:00
6c758ff244 Register values listed in __constants__ as attributes of the Module. (#26581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26581

We're currently inlining immediate values of the constants directly into
IR when we generate it providing no way to access these values by their
names later. This change registers such values as atrtibutes of the
module so that they are not lost after IR generation.
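
A sketch of the behavior this enables (the module is illustrative):

```python
import torch

class M(torch.nn.Module):
    __constants__ = ["scale"]

    def __init__(self):
        super().__init__()
        self.scale = 2.0

    def forward(self, x):
        return x * self.scale

m = torch.jit.script(M())
print(m.scale)  # the constant stays accessible by name after IR generation
```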

Differential Revision: D17513451

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: cf8f9b450e7178692211abd905ffd2d7ce5a6ce1
2019-09-24 10:30:31 -07:00
52b69fbcd4 Remove _dequantize_per_channel in the pattern (#26680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26680

This was introduced earlier under the assumption that we'd have a qconv_per_tensor_affine
and a qconv_per_channel_affine, but it turns out we don't have these, so we'll remove
these functions.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17542607

fbshipit-source-id: b90ce5738170f0922bdc2eb1c4dbecd930f68a48
2019-09-24 10:27:52 -07:00
cf272d43ab Trivial quantized torch.mean implementation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26253

Test Plan: Imported from OSS

Differential Revision: D17529994

Pulled By: jamesr66a

fbshipit-source-id: e3aff71da35b05ed61710cdb88d72b51c944168b
2019-09-24 10:18:15 -07:00
9f1da984ef Enable hub tests on MacOS (#26697)
Summary:
fix https://github.com/pytorch/pytorch/issues/26032.
This was broken by a bad openssl release in conda. Should be fixed now. Testing...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26697

Differential Revision: D17542095

Pulled By: ailzhang

fbshipit-source-id: ba99f9b36ef2a7c793842cf91bd46fb2634ac1aa
2019-09-24 10:11:00 -07:00
af3b15b74c Setting automatic default selection for ONNX IR v4 semantics in ONNX export API (#26146)
Summary:
This is a follow-up PR for https://github.com/pytorch/pytorch/pull/23284. In that PR we had removed changing the default behavior for `keep_initializers_as_input` argument to the export API. With this PR we are enabling that change in that if `keep_initializers_as_input` is not specified then value/behavior for this argument is chosen automatically depending on whether the export type is ONNX or not.

This was part of the earlier PR was removed for further review. The test points have also been updated.

This change may fail some internal tests which may require explicitly setting `keep_initializers_as_input=True` to preserve old behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26146

Reviewed By: hl475

Differential Revision: D17369677

Pulled By: houseroad

fbshipit-source-id: 2aec2cff50d215714ee8769505ef24d2b7865a11
2019-09-24 10:02:31 -07:00
8b12602264 Add traces to specialize_autograd and lower_grad_of (2nd try)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22752

Differential Revision: D17543836

Pulled By: Krovatkin

fbshipit-source-id: 5cbca220943a580169bf60ac09780b6e67075d2b
2019-09-24 09:58:43 -07:00
a172fbf972 Expands TestAutogradDeviceType (#26708)
Summary:
- Ports all CUDA tests to TestAutogradDeviceType except those using multiple devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26708

Differential Revision: D17549435

Pulled By: mruberry

fbshipit-source-id: b564186444201d1351934b6a7d21f67bdfca6e3b
2019-09-24 09:52:53 -07:00
fa7b621afd Remove duplicate calculation of output shape (#26684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26684

Output heights and widths are already calculated by conv_p. Remove the duplicate calculation.
ghstack-source-id: 90633432

Test Plan:
buck test mode/dev caffe2/test:quantized
```
Summary (total time 18.69s):
  PASS: 45
  FAIL: 0
  SKIP: 10
    caffe2/test:quantized - test_qadd_scalar_relu (test_quantized.TestQuantizedOps)
    caffe2/test:quantized - test_equal (test_quantized.TestQuantizedOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
    caffe2/test:quantized - test_qconv_qnnpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qlinear_qnnpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
More details at https://our.intern.facebook.com/intern/buck/build/3b394f1e-ab99-4e59-bdf5-2766f46e9869
```

Differential Revision: D17538375

fbshipit-source-id: b4b60e93fdec4cc7bbf6aee7182381221dfac243
2019-09-24 09:49:24 -07:00
128a65e2e0 Use noop observer to pass dtype for dynamic quantization (#26709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709

Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow.

One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig.
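
A minimal sketch of the unchanged top-level entry point with the float16 dtype:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16)
```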

Test Plan: Imported from OSS

Differential Revision: D17544103

Pulled By: dzhulgakov

fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f
2019-09-24 09:24:39 -07:00
ae0732cde3 Speed up an integer to the power of a positive integer on CPU (#26020)
Summary:
Currently, integer scalar exponents are always cast to double. This commit avoids the cast when the tensor is also
integral and the scalar is positive, to speed things up.
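
For reference, the user-visible result dtype is unchanged; the win is purely skipping the internal round-trip through double:

```python
import torch

a = torch.arange(5, dtype=torch.int64)
print(a.pow(3))        # tensor([ 0,  1,  8, 27, 64])
print(a.pow(3).dtype)  # torch.int64 -- no round-trip through double internally
```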

Benchmark (Debian Buster, g++ 8, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, Debug build, Turbo turned off):

```python
import timeit

for n, t in [(1000, 13000),
             (10_000, 1300)]:
    for e in (2, 3, 4):
        for dtype in ('torch.int16', 'torch.int32', 'torch.int64'):
            print(f'a.pow({e}) (a.numel() == {n}) for {t} times')
            print(f'dtype {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a.pow({e})',
                                setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
                                number=t))
```

Before:

```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.6958350749996498
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		0.7989626339999631
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		0.7973162800003593
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.8660746679997828
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		0.8101709959996697
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		0.8135280149999744
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		5.010833072999958
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		4.801007671999741
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		3.963344578000033
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		1.6216251330001796
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		0.5672429639998882
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.5544572270000572
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		1.656308512999658
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		1.502670819999821
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.5757876879997639
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		4.775718216999849
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		4.754745475000163
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		3.737249878000057
```

After:

```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.1006453190002503
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		1.0849009019998448
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		1.093259106000005
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.0859826279997833
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		1.1076840900000207
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		1.0755480369998622
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.918211066999902
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		1.9183043200000611
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		1.930021430999659
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		0.7271483560002707
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		0.7289002070001516
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.7267536800000016
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		0.7301799359997858
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		0.7289195180001116
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.7270008230002531
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		1.5354506029998447
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		1.528263066999898
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		1.5369428439998956
```

 ---

Best viewed with whitespace changes turned off
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26020

Differential Revision: D17485400

Pulled By: VitalyFedyunin

fbshipit-source-id: 3a16b074825a5aab0f7e7af3d8100f9e4b7011a3
2019-09-24 09:17:09 -07:00
66d27504e3 allow building docker without torchvision (#26168)
Summary:
There is an issue with the torchvision version not matching the pytorch version if one builds the docker from a tag; see issue https://github.com/pytorch/pytorch/issues/25917.  The current solution requires one to re-init the submodules or manually change the version of torchvision.  This PR allows one to build the docker image without torchvision, which not only fixes the above-mentioned bug but also frees non-image pytorch users from the tyranny of torchvision 😆.

In all seriousness, torchvision isn't a necessity for pytorch, especially for NLP researchers, and non-essential items shouldn't be in the docker.  This option removes one extra thing that can go wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26168

Differential Revision: D17550001

Pulled By: soumith

fbshipit-source-id: 48b8b9e22b75eef3afb392c618742215d3920e9d
2019-09-24 09:12:57 -07:00
3cae3021e5 Add tests for C++ functional cosine_similarity and pairwise_distance, and clean up functional test code (#26559)
Summary:
This ensures that `F::cosine_similarity` and `F::pairwise_distance` can be used simply by including `torch/torch.h` and setting `namespace F = torch::nn::functional`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26559

Differential Revision: D17507421

Pulled By: yf225

fbshipit-source-id: f895dde3634d5c8ca66ee036903e327e5cdab6b1
2019-09-24 09:10:42 -07:00
714b05e499 Updating submodules
Summary:
GitHub commits:

ff4a61094e
ad81c3823e
518d8a1832

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 2a9a47805569a43e05d044c5494b57f6a7996bc4
2019-09-24 08:56:02 -07:00
ff78d743b4 Don't generate named tensor functions to RegistrationFunctions.h (#26685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26685

This prevents XLA from picking up on named tensor APIs. I ran into some
problems while attempting to support dimname overloads in XLA; since we
don't need the first iteration of named tensors to work with XLA this is
OK.

Test Plan: - run CI.

Differential Revision: D17538893

Pulled By: zou3519

fbshipit-source-id: 93d579c93f5b1dc68541c07c4a3d61792859507d
2019-09-24 08:52:20 -07:00
05f708187c Typo fix (#26417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26417

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17548776

Pulled By: ezyang

fbshipit-source-id: 8c79893ee4216780edb838671e701de5518c4cd0
2019-09-24 08:41:54 -07:00
efaa65dd60 resolve ignored module method type annotations (#26683)
Summary:
Previously we weren't passing an rcb around, causing NamedTuples with unused methods to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26683

Differential Revision: D17539656

Pulled By: eellison

fbshipit-source-id: 50091e78eea5fa3a22b4655b65384eee47a1c9d6
2019-09-24 08:16:08 -07:00
5e5cbceeba remove tools/setup_helpers/cudnn.py (#25876)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.

Previously, in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations where we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876

Differential Revision: D17346270

Pulled By: ezyang

fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
2019-09-24 07:44:33 -07:00
9f3351de81 Add warning to anomaly_mode doc fix #26408 (#26615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26615

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#26615 Add warning to anomaly_mode doc fix #26408**

Test Plan: Imported from OSS

Differential Revision: D17527854

Pulled By: albanD

fbshipit-source-id: d925dae049e64d88a50d08c46db33e3aabc1b849
2019-09-24 07:27:39 -07:00
cf1dbc79db Vectorize unary operator erfinv (#26629)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/19088 for erfinv.

erfinv speedup (MKL, AMD Ryzen Threadripper 2970WX 24-Core Processor): 22x
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26629

Differential Revision: D17527230

Pulled By: ezyang

fbshipit-source-id: 0a5a53a88f7eb219617120383a454a01ad78279a
2019-09-24 07:24:50 -07:00
c643290982 Add derivative for cholesky_inverse (#26451)
Summary:
Changelog:

- Add derivative of cholesky_inverse. The equations are derived akin to the derivative of solve methods using the technique detailed [here](https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf)
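
A small autograd sketch of the new derivative (shapes and the positive-definite construction are illustrative):

```python
import torch

a = torch.randn(3, 3, dtype=torch.double)
a = a @ a.t() + 3 * torch.eye(3, dtype=torch.double)  # make it positive definite
u = torch.cholesky(a).requires_grad_()
inv = torch.cholesky_inverse(u)  # computes (u @ u.T)^{-1} from the factor u
inv.sum().backward()             # works now that the derivative is defined
print(u.grad.shape)              # torch.Size([3, 3])
```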
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26451

Test Plan:
- Added tests for cholesky_inverse in test_autograd.py

Closes https://github.com/pytorch/pytorch/issues/4669.

Differential Revision: D17548526

Pulled By: ezyang

fbshipit-source-id: 51aa8b900a8dc4012b01a73d432606f216f62c9d
2019-09-24 07:12:41 -07:00
7bdc0c138a Move the CUDA implementation of trunc to ATen. (#25423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25423

Fix #24650

Test Plan: Imported from OSS

Differential Revision: D17397489

Pulled By: VitalyFedyunin

fbshipit-source-id: 933f915a44ff9b7803ddb2708bf0e723433ee0b6
2019-09-24 07:08:55 -07:00
d6ee58494f Automatic update of fbcode/onnx to 23bb6ea1a71f08e200114a153f48bd7adb66d486 (#26441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26441

Previous import was 1316afc9f972f81340faa05763e2898f38bcc3b0

Included changes:
- **[23bb6ea1](https://github.com/onnx/onnx/commit/23bb6ea1)**: Gemm optional bias (#2330) <James Allingham>
- **[1ac1f219](https://github.com/onnx/onnx/commit/1ac1f219)**: Changes for AIX platform (#1913) <kavanabhat>
- **[13b026f5](https://github.com/onnx/onnx/commit/13b026f5)**: Updated test cases for reshape (#2127) <James Allingham>
- **[97fcfe30](https://github.com/onnx/onnx/commit/97fcfe30)**: Replace is by == (#2326) <G. Ramalingam>
- **[3b5601e6](https://github.com/onnx/onnx/commit/3b5601e6)**: Updated docs for strides and dilations attributes  (#2291) <James Allingham>
- **[d0c697b1](https://github.com/onnx/onnx/commit/d0c697b1)**: Revamped test cases for Gemm (#2060) <James Allingham>
- **[a3955c3c](https://github.com/onnx/onnx/commit/a3955c3c)**: Add more shape inference tests for Logical operators to improve coverage (#2133) <Hariharan Seshadri>
- **[e2e12d97](https://github.com/onnx/onnx/commit/e2e12d97)**: Change incorrect use of ValueError to TypeError (#2304) <prcvih>
- **[1f4b5f8c](https://github.com/onnx/onnx/commit/1f4b5f8c)**: Support dynamic 'pads' and 'value' in Pad operator (#2031) <Hariharan Seshadri>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17466717

fbshipit-source-id: 0f89a7a5a821d2c693492c99b4bebd5966e21d9f
2019-09-24 05:38:52 -07:00
450504cd95 C++ API parity: at::Tensor::set_data
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26647

Test Plan: Imported from OSS

Differential Revision: D17542604

Pulled By: pbelevich

fbshipit-source-id: 37d5d67ebdb9348b5561d983f9bd26d310210983
2019-09-24 04:51:22 -07:00
2cf1183ec1 Use optimized graph in Inline (essentially, making Inline recursive now). (#26489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26489

This basically fixes Inline(recurse=true) and makes it the default. One
reservation against running inlining recursively in the original
implementation was that we might hit quadratic behavior, but in this
implementation that's not an issue: we inline only already-inlined
graphs, and as we recursively descend the call tree we cache the graphs
we've already optimized.
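
A sketch using the Python-exposed hook for this pass (`torch._C._jit_pass_inline`; graph contents are illustrative):

```python
import torch

@torch.jit.script
def inner(x):
    return x * 2

@torch.jit.script
def outer(x):
    return inner(x) + 1

g = outer.graph.copy()
torch._C._jit_pass_inline(g)  # inlining now recurses into callees as well
print(g)                      # no remaining prim::CallFunction nodes
```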

Test Plan: Imported from OSS

Differential Revision: D17485744

Pulled By: ZolotukhinM

fbshipit-source-id: 2ed7bdc69863b90a8c10a385d63f8e7c9e7b05f5
2019-09-24 00:22:29 -07:00
c522b6356c Add 'optimized_graph' to Function. (#26488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26488

Currently the main use case for this graph is inlining and that's the
only optimization we perform. We probably should run more cleanups on
this graph in future.

Test Plan: Imported from OSS

Differential Revision: D17485745

Pulled By: ZolotukhinM

fbshipit-source-id: 7b30c9ba47b4e5fff3591a0063560bfeb68f2164
2019-09-24 00:22:26 -07:00
c034f9796f Use std::mutex instead of std::call_once in Function when we initialize GraphExecutor. (#26571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26571

We will need a mutex for computing the optimized graph too, which will be
implemented in subsequent commits.

Test Plan: Imported from OSS

Differential Revision: D17510883

Pulled By: ZolotukhinM

fbshipit-source-id: 273b25426785e50f67a103204de98f6ed14182db
2019-09-24 00:22:22 -07:00
76e2ffc877 Remove 'recurse' parameter from Inline. (#26487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26487

The way it is implemented currently is bad because while we're inlining
to a graph G, we are also mutating all the graphs that are being
inlined. The problem is that the graphs we're inlining are usually the
original graphs of functions, so we're silently changing them behind the
scenes, and we don't have a way to recover 'unoptimized' graphs
afterwards.

Test Plan: Imported from OSS

Differential Revision: D17485748

Pulled By: ZolotukhinM

fbshipit-source-id: 6094ef56077240e9379d4c53680867df1b6e79ef
2019-09-24 00:22:18 -07:00
a65db650a8 Enable registering stackbased kernels with lambdas (#26658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26658

By SFINAE'ing the lambda registration to only kernels that aren't stack-based kernels,
an attempt to register a stack-based lambda kernel will correctly fall back to the stack-based registration function and work as expected.
ghstack-source-id: 90610843

Test Plan: unit tests

Differential Revision: D17533871

fbshipit-source-id: 1bfe3106b0576d46798a51bdaa5b7b5508164766
2019-09-24 00:18:36 -07:00
839e636fa1 Revert D17495679: [pytorch][PR] A few hub improvements
Test Plan: revert-hammer

Differential Revision:
D17495679

Original commit changeset: 695df3e803ad

fbshipit-source-id: 6c85bc980991971b08714f05155dd23147eed233
2019-09-23 23:38:19 -07:00
98bbb7788c Updates and extends TestNNDeviceType (#26638)
Summary:
- Moves several tests to TestNNDeviceType
- Merges helper base with TestNNDeviceType
<s>- Enables non-default stream for TestNN (like recent updates to TestTorch and TestCUDA)</s>

Reverted non-default stream due to failure of test_variable_sequence_cuda (main.TestNN).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26638

Differential Revision: D17543899

Pulled By: mruberry

fbshipit-source-id: 001fa191f5fe424f2e7adc378b8fb5ee7f264f16
2019-09-23 22:48:21 -07:00
ade60f8a8d Allow per-channel QTensor accept any floating type for scales (#26676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26676

Just makes it more user-friendly to be able to pass any floating-point or integer values to scales or zero_points for per-channel quantization. It matches the behavior of the per-tensor quantizer, where those arguments are scalars (not tensors) and thus automatic casting is applied.
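
A small sketch of the relaxed casting (values are illustrative):

```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2], dtype=torch.float64)  # float64 now accepted
zero_points = torch.tensor([0, 0], dtype=torch.int32)   # int32 now accepted
qx = torch.quantize_per_channel(x, scales, zero_points, axis=0,
                                dtype=torch.quint8)
```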

Test Plan: Imported from OSS

Differential Revision: D17537051

Pulled By: dzhulgakov

fbshipit-source-id: e955ccdb5b4691828a559dc8f1ed7de54b6d12c4
2019-09-23 22:29:05 -07:00
b93823cb65 Per-channel quantized tensor to have only a single axis (#26675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26675

Based on an offline poll, we're very unlikely to have multi-axis quantized tensors in the foreseeable future. Let's simplify the API and just return an int instead of a list. It also matches the singular `axis` name.
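
A sketch of the simplified accessor (values are illustrative):

```python
import torch

qx = torch.quantize_per_channel(torch.randn(4, 2),
                                torch.tensor([0.1, 0.2]),
                                torch.tensor([0, 0]),
                                1, torch.quint8)
print(qx.q_per_channel_axis())  # 1 -- a plain int, not a list
```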

Test Plan: Imported from OSS

Differential Revision: D17537052

Pulled By: dzhulgakov

fbshipit-source-id: 676abc3b251d288468aaed467b5e5ca4063b98b0
2019-09-23 22:29:01 -07:00
9aad4d7b5f Fix _empty_per_channel_affine_quantized to be less hacky (#26243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26243

This is an attempt to fix _empty_per_channel_affine_quantized to be more sane. It's a factory function that nevertheless receives a Tensor argument, which throws the codegen off course.

Before, people did a hacky workaround of appending _like to the function name to trick the codegen; it also required an unnatural argument order.

This PR explicitly allows overriding the 'category' of the function to make the codegen do the right thing. Now the name and the argument order (in C++) make more sense.

Test Plan: Imported from OSS

Differential Revision: D17443221

Pulled By: dzhulgakov

fbshipit-source-id: c98c1c74473d8cbf637f511d26ceb949d8ae2a1a
2019-09-23 22:28:58 -07:00
fbc3c14830 adding OpProfile proto into ProfDAGProtos to support storing operation cost (#26677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26677

This diff adds an OpProfile proto into ProfDAGProtos to support storing operation cost. During performance estimation, idx, net_name, type, and exec_time will be stored in this proto.

Test Plan:
```
buck test caffe2/caffe2/fb/net_transforms/tests/:stats_collector_test
buck test caffe2/caffe2/fb/net_transforms/tests/:perf_estimator_test
buck run caffe2/caffe2/fb/distribute/snntest/cogwheel/:cogwheel_snntest_offline_training_simple_online_training
```

Reviewed By: heslami

Differential Revision: D17533791

fbshipit-source-id: a339c8eadcac891aa631daaf64522b69876b5045
2019-09-23 20:44:15 -07:00
ba8002ec13 Quantized Interpolate Kernel(upsample_nearest2d) (#26617)
Summary:
In this PR, we implemented support for quantized interpolate in the upsample_nearest2d case.

```python
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        # float_out = torch.nn.functional.avg_pool2d(x, kernel_size=5, stride=None, padding=0)
        # float_out = torch.nn.functional.adaptive_avg_pool2d(x, output_size=5)
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        # quant_out = torch.nn.quantized.functional.avg_pool2d(q_x, kernel_size=5, stride=None, padding=0)
        # quant_out = torch.nn.quantized.functional.adaptive_avg_pool2d(q_x, output_size=5)
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

```
=========without special handling of NHWC layout=============
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.08712100982666        2.1624231338500977      1.0360794240817361
GB/s float      GB/s quant
1.5508750976872339      0.37421723220248165
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.056601047515869       2.184889316558838       1.0623787823107091
GB/s float      GB/s quant
1.573890086222483       0.3703693335250963
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0152783393859863      2.067704200744629       1.0260142037623525
GB/s float      GB/s quant
1.6061622539873104      1.5654386148823074

=========with special handling of NHWC layout=============
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044649124145508       0.009250640869140625    0.004524317038018256
GB/s float      GB/s quant
1.5830902044636819      87.47675014597938
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.049403190612793       0.009107589721679688    0.004444020465761265
GB/s float      GB/s quant
1.579417859221808       88.8507305147644
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0601415634155273      0.01062631607055664     0.0051580513976618066
GB/s float      GB/s quant
1.5711852318699757      304.6082930818039
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26617

Differential Revision: D17519146

Pulled By: llyfacebook

fbshipit-source-id: 126876e550ef7009fd75f5ccc033599f1f37456d
2019-09-23 20:32:19 -07:00
aa95c7951e _per_channel_affine_qtensor -> _make_per_channel_quantized_tensor (#26679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26679

making it more explicit that it's a factory function.

Test Plan:
ci

Imported from OSS

Differential Revision: D17540861

fbshipit-source-id: bf66c87d6afad411afd5620cf2143a8f5596ad6b
2019-09-23 19:01:27 -07:00
8a919f4f3d Skip observing bias across function call hierarchy (#26642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26642

att

Test Plan:
python test/test_jit.py 'TestJit.test_insert_observers'

Imported from OSS

Differential Revision: D17538667

fbshipit-source-id: ac8f561160eed0803f6ac48cf0fed253adb58bb5
2019-09-23 18:49:40 -07:00
af96e0cb5b Whitelist ATen/core sources and headers for Caffe2 (#26609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26609

Previously, we were globbing all of ATen/core and excluding specific files.
However, this frequently resulted in new files being missed, and PyTorch
diffs triggering Caffe2 builds.  Now, instead, we will list the ATen/core
files that are required for Caffe2.

Test Plan: Ran internal Caffe2Go unit test.

Reviewed By: smessmer

Differential Revision: D17504740

fbshipit-source-id: 5b9bf7a6e8fa7848b2dfd375246d32630ca40cd5
2019-09-23 18:31:06 -07:00
d63143dc5b _per_tensor_affine_qtensor -> _make_per_tensor_quantized_tensor (#26678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26678

making it more explicit that it's a factory function.

Test Plan:
ci

Imported from OSS

Differential Revision: D17540862

fbshipit-source-id: 14c5a4dcc7bb85ae849c9e4e0882601005e2ed3a
2019-09-23 18:27:53 -07:00
7d612066ce Add ObserveHelper and remove some common function parameters (#26641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26641

att

Test Plan:
python test/test_jit.py 'TestJit.test_insert_observers*'

Imported from OSS

Differential Revision: D17538668

fbshipit-source-id: 42d0b251b245337227f877e57611b50f392a6d7e
2019-09-23 18:24:44 -07:00
7f89464b2d fix github actions for forked PRs (#26562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26562

I was trying to be too clever with GITHUB_HEAD_REF...

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D17538517

Pulled By: suo

fbshipit-source-id: 82c71ee3c6edb299ac8eb73675d96967e00a29f1
2019-09-23 17:59:37 -07:00
45391ccecb Update qengine flag in python to string (#26620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620

This change updates torch.backends.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now).
Internally, set_qengine and get_qengine return an int which represents the at::QEngine enum.
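
A minimal sketch (assumes a build where FBGEMM is available):

```python
import torch

torch.backends.quantized.engine = 'fbgemm'  # set the engine by string name
print(torch.backends.quantized.engine)      # 'fbgemm'
```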

Test Plan:
python test/test_torch.py

Imported from OSS

Differential Revision: D17533582

fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
2019-09-23 17:56:50 -07:00
5d82cefa55 remove unneeded code (#26640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26640

Remove some code that we forgot to remove before

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D17538669

fbshipit-source-id: 9614e45f6e5ad6f2fe2b4936deb23d0ffdfcd86a
2019-09-23 17:39:27 -07:00
1eaaf8b68b A few hub improvements (#25980)
Summary:
This PR does a few small improvements to hub:
- add support for a `verbose` option in `torch.hub.load`. Note that this mutes the hitting-cache message but keeps the first-download message, as suggested. fixes https://github.com/pytorch/pytorch/issues/24791
- add support for loading a state dict from a tar file or zip file in `torch.hub.load_state_dict_from_url`.
- add `torch.hub.download_url_to_file` as a public API, and add a BC bit for `_download_url_to_file`.
- make the hash check in the filename optional through `check_hash`; many users don't have control over the naming, and relaxing this constraint could potentially avoid duplicating download code on the user's end (see the sketch after this list).
- move pytorch CI off `pytorch/vision` and use `ailzhang/torchhub_example` as a dedicated test repo. fixes https://github.com/pytorch/pytorch/issues/25865
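
A minimal sketch of the new `check_hash` option; the URL here is hypothetical:

```python
import torch

# Hypothetical URL; check_hash=False skips the hash-in-filename requirement.
state_dict = torch.hub.load_state_dict_from_url(
    'https://example.com/my_model_weights.pth',
    progress=True,
    check_hash=False,
)
```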
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25980

Differential Revision: D17495679

Pulled By: ailzhang

fbshipit-source-id: 695df3e803ad5f9ca33cfbcf62f1a4f8cde0dbbe
2019-09-23 17:24:19 -07:00
c79d116a7d Update ONNX Export for Gather and Scatter for Opset 11
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24790

Reviewed By: hl475

Differential Revision: D17159723

Pulled By: houseroad

fbshipit-source-id: a63bb7c681120de85588dafecd03f04742dde8b7
2019-09-23 17:13:25 -07:00
3569a1c6dd Fix Exporting RNN/LSTM's Initial State (h0/c0) to ONNX
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22813

Reviewed By: hl475

Differential Revision: D16275791

Pulled By: houseroad

fbshipit-source-id: 6e2259e84e1f5a674daabcbe0df99b1360ed2b35
2019-09-23 17:08:24 -07:00
cb9fd0ce58 quantized torch.topk (#26486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26486

This PR adds a quantized version of torch.topk, supporting all the same options

Benchmark script
```
import torch
import time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    X = torch.rand(6, 5, 1024)
    qX = torch.quantize_linear(X, 0.01, 0, dtype)
    X = qX.dequantize()

    NITER = 10000

    s = time.time()
    for i in range(NITER):
        float_out = torch.topk(X, 50)
    float_time_per_iter = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.topk(qX, 50)
    quant_time_per_iter = (time.time() - s) / NITER

    print(dtype)
    print('float ms', 'quant ms', 'float gB/s', 'quant gB/s', sep='\t')
    nbytes_float = (X.numel() + float_out[0].numel()) * X.element_size()
    nbytes_quant = (qX.numel() + quant_out[0].numel()) * qX.element_size()
    print(float_time_per_iter * 1000,
          quant_time_per_iter * 1000,
          nbytes_float / float_time_per_iter / 1e9,
          nbytes_quant / quant_time_per_iter / 1e9, sep='\t')
```

Results

```
torch.qint8
float ms	quant ms	float gB/s	quant gB/s
0.3706729888916016	0.3370296716690064	0.34769191136743244	0.09559989136992947
torch.quint8
float ms	quant ms	float gB/s	quant gB/s
0.38260042667388916	0.34079675674438475	0.3368527346412275	0.09454315325003715
torch.qint32
float ms	quant ms	float gB/s	quant gB/s
0.38033516407012935	0.3364055633544922	0.3388590174539739	0.38310900305828427

```

Test Plan: Imported from OSS

Differential Revision: D17529988

Pulled By: jamesr66a

fbshipit-source-id: b5edfe90c592b6c84459d1c0c77e4c18f5b04417
2019-09-23 16:47:47 -07:00
64d58c2f41 Allow batch size of 0 in Conv
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26214

Test Plan: Imported from OSS

Differential Revision: D17377035

Pulled By: jamesr66a

fbshipit-source-id: feb2ce195742e7102df0497e6c345e7173a10e19
2019-09-23 14:47:29 -07:00
fcd13549f9 add CondValue to unify refinements and code emission (#26145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26145

This is a step towards isinstance type refinement.
It primarily does yak shaving in compiler.cpp to unify the handling
of special case behavior that occurs in conditional expressions:

* Handling type refinement as part of emission.
* Handling `is None` static-if specialization.

It introduces a CondValue object that is a Value that also has
additional type refinements that are true when that Value is true,
and potentially a static-true/false value that, if set, will cause if
statements to be handled statically, omitting typechecking of the other side.

This ends up expanding some behavior, for instance `is None` specialization
used to happen only for single expressions, but now works through
boolean logic.
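
A TorchScript sketch of refinement flowing through boolean logic, as enabled by this change:

```python
import torch
from typing import Optional

@torch.jit.script
def f(x: Optional[torch.Tensor], flag: bool) -> torch.Tensor:
    # x is refined to Tensor through the boolean expression, not just
    # through a bare `if x is not None:` check.
    if x is not None and flag:
        return x + 1
    return torch.zeros(1)
```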

Test Plan: Imported from OSS

Differential Revision: D17359500

Pulled By: zdevito

fbshipit-source-id: ce93804496c8b4c3197a5966bc28c608465fda64
2019-09-23 14:24:18 -07:00
cbdbdd3c8c Fix the flaky test_qlinear test caused by hypothesis deadline (#26663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26663

As Title says.

Example error:
https://circleci.com/gh/pytorch/pytorch/2894108?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link%2Fconsole

```
Sep 23 19:08:00 Unreliable test timings! On an initial run, this test took 453.13ms, which exceeded the deadline of 200.00ms, but on a subsequent run it took 23.01 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.
```
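
The fix pattern that hypothesis itself suggests, as a sketch (test name and strategy are illustrative):

```python
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # opt out of the flaky per-example deadline
@given(st.integers(2, 64))
def test_qlinear(batch_size):
    ...
```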
ghstack-source-id: 90613535

Test Plan: CI

Differential Revision: D17534476

fbshipit-source-id: d3ab91c8b290a0433eab4af3fc73ecbf728ec5bf
2019-09-23 14:19:39 -07:00
21314cfdde "fixing" gcc bug introduced with cuda 10.1 (#26445)
Summary:
> Cuda 10.1: Nvidia, you're now "fixing" gcc bugs that gcc doesn't even have

see the [discussion](https://devtalk.nvidia.com/default/topic/1048037/linux/cuda-10-1-nvidia-you-re-now-quot-fixing-quot-gcc-bugs-that-gcc-doesn-t-even-have/) for details
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26445

Reviewed By: soumith

Differential Revision: D17533850

Pulled By: mingbowan

fbshipit-source-id: d668b0c4a3c71d58b4a0fa8e00d873708add3dea
2019-09-23 14:08:16 -07:00
ebc2365fd3 Serialization for per channel qtensor (#26339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26339

Serializes per-channel tensors in both torch.serialization and JIT. Since we haven't bound Quantizer properly yet, I chose to save a tuple representing the quantizer settings. To avoid recursive tensor serialization calls, I'm using a tuple instead of a tensor to store scales and zero points.
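
A round-trip sketch of the Python-side serialization (values are illustrative):

```python
import io
import torch

x = torch.randn(2, 3)
qx = torch.quantize_per_channel(x, torch.tensor([0.1, 0.2]),
                                torch.tensor([0, 0]), 0, torch.quint8)
buf = io.BytesIO()
torch.save(qx, buf)   # per-channel quantizer settings are saved as a tuple
buf.seek(0)
qx2 = torch.load(buf)
assert torch.equal(qx2.dequantize(), qx.dequantize())
```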

driazati - please check the serialization logic. Is there a good test that compares that JIT serialization and python serialization are equivalent? (I haven't tested it yet)

Test Plan: Imported from OSS

Differential Revision: D17443222

Pulled By: dzhulgakov

fbshipit-source-id: a34758de1ffd2ec1cdc5355f5baf95284a4ccf4b
2019-09-23 13:28:11 -07:00
c0aa6a01ce NHWC specialization for quantized::cat (#26524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26524

This creates an NHWC specialization for `quantized::cat` that kicks in when all inputs are `NHWC`. It ensures the correct layout is propagated downstream and provides an optimized implementation specifically for this data layout.

Benchmark script based on Squeezenet shapes:
```
import torch, time

torch.manual_seed(0)

# NHWC
sizes = [
    (1, 54, 54, 64),
    (1, 54, 54, 128),
    (1, 26, 26, 128),
    (1, 26, 26, 256),
    (1, 12, 12, 256)
]

for size in sizes:
    x = torch.rand(*size)
    y = torch.rand(*size)
    qX = torch.quantize_linear(x, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.01, 3, torch.qint8).permute([0, 3, 1, 2])

    ref = torch.cat([qX.dequantize(), qY.dequantize()], dim=1)

    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.cat([qX, qY], dim=1, scale=0.01, zero_point=3)
    time_per_iter = (time.time() - s) / NITER

    print('time per iter ms', time_per_iter * 1000)
    print('gb/s', (qX.numel() + qY.numel() + out.numel()) * qX.element_size() / time_per_iter / 1e9)

    torch.testing.assert_allclose(out.dequantize(), ref)
```

Before this change

```
time per iter ms 0.6898486614227295
gb/s 1.0821156026605054
time per iter ms 1.5480577945709229
gb/s 0.9644291093239284
time per iter ms 0.3180875778198242
gb/s 1.0881028500775023
time per iter ms 0.6702737808227539
gb/s 1.032748139350315
time per iter ms 0.13010454177856445
gb/s 1.1333655073392244
```
After this change
```
time per iter ms 0.11604785919189453
gb/s 6.432656364350577
time per iter ms 0.15956878662109375
gb/s 9.356416324360508
time per iter ms 0.040181636810302734
gb/s 8.613685939027139
time per iter ms 0.06564664840698242
gb/s 10.544696748392909
time per iter ms 0.018549680709838867
gb/s 7.949247337814738
```

Test Plan: Imported from OSS

Differential Revision: D17503593

Pulled By: jamesr66a

fbshipit-source-id: ec5d57ad8fbcb3fd9379e8bd370abd29d386f953
2019-09-23 13:19:29 -07:00
69631c3ee3 nightly prefix for android nightly jobs (#26652)
Summary:
At the moment we have the same names for PR jobs and nightly jobs, so on https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master we see:
pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build-1
pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build-2

 => adding a `nightly_` prefix for nightly jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26652

Differential Revision: D17533456

Pulled By: IvanKobzarev

fbshipit-source-id: 586f48dc361c9143d8223e6742bbe78ef96b64fe
2019-09-23 13:04:09 -07:00
aeb6532e7f BlobReference __getattr__ can only throw AttributeError (#26654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26654

As per the Python contract, __getattr__ can only throw AttributeError. Throwing something else breaks hasattr() and causes upstream issues.

A similar bug existed in pytorch earlier.
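
An illustrative standalone sketch of the contract being fixed (class name is hypothetical):

```python
class BlobReferenceLike:
    def __getattr__(self, name):
        # Raising anything other than AttributeError here would break
        # hasattr(), which only swallows AttributeError.
        raise AttributeError('no attribute {!r}'.format(name))

print(hasattr(BlobReferenceLike(), 'foo'))  # False, as expected
```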

Test Plan: builds

Differential Revision: D17529471

fbshipit-source-id: bb6ac6c9e3be8b80fa2967e6a2e293afd1594cf9
2019-09-23 13:01:00 -07:00
8fc8652598 Import torch.quantization when one imports torch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26649

Test Plan: Imported from OSS

Differential Revision: D17529567

Pulled By: dzhulgakov

fbshipit-source-id: 8bf814c69ceb5e13891b57659cc729cccbfbc853
2019-09-23 12:58:17 -07:00
567a1981a7 Fix ellipsis behavior for Tensor.align_to to glob all missing dims (#26648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26648

Previously:
- `Tensor.align_to(*names)` only works on fully named tensors. In addition, the
desired ordering `names` must not have any None-names.
- `Tensor.align_to(*names)` accepted `...`, but expanded it based on
position. i.e., in `tensor.align_to('N', ..., 'C', 'H')`, `...` expand
to `*tensor.names[1:-2]`. This is wildly incorrect: see the following
concrete example.

```
tensor = tensor.refine_names('N', 'C', 'H', 'W')
tensor.align_to('W', ...) # ... expands to 'C', 'H', 'W'
```

This PR changes it so that `...` in `tensor.align_to` grabs all
unmentioned dimensions from `tensor`, in the order that they appear.
`align_to` is the only function that takes ellipsis that requires this
change. This is because all other functions (`refine_names`) require their
list of names to work in a positional manner, but `align_to` lets the
user reorder dimensions.

This does not add very much overhead to `align_to`, as shown in the
following benchmark. However, in the future, we should resolve to make
these operations faster; align_to should be as fast as view but isn't
most likely due to Python overhead.

```
[ins] In [2]: import torch
         ...: named = torch.randn(3, 3, 3, 3, names=('N', 'C', 'H', 'W'))
         ...: unnamed = torch.randn(3, 3, 3, 3)
         ...: %timeit unnamed[:]
         ...: %timeit unnamed.view(-1)
         ...: %timeit named.align_to(...)
         ...: %timeit named.align_to('N', 'C', 'H', 'W')

31 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
43.8 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
69.6 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66.1 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Test Plan:
- new tests [namedtensor ci]

Differential Revision: D17528207

Pulled By: zou3519

fbshipit-source-id: 4efc70329f84058c245202d0b267d0bc5ce42069
2019-09-23 12:16:46 -07:00
fdf2bdef0c Revert D17450502: [pytorch][PR] [WIP] Enabled bfloat16 dtype on CUDA
Test Plan: revert-hammer

Differential Revision:
D17450502

Original commit changeset: 0a5acc5fe1b1

fbshipit-source-id: 6360e750e9805dc9c7c6ca8a9c16256ecd749416
2019-09-23 12:11:52 -07:00
e0f86f7aba Add namedtensor build & tests to default sets (#26633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26633

This will enable named tensor CI on all pull requests. Previously, named
tensor CI only ran on master.

This is essential for the experimental release because we would like to
prevent failures in the named tensor tests. In addition, when
cherry-picking changes to the release branch, the first signals appear
on the pull requests and it is good to be able to tell that something is
wrong before the cherry-pick is merged.

Test Plan:
- run CI
- check that the named tensor build / tests are indeed running on this
PR.

Differential Revision: D17523064

Pulled By: zou3519

fbshipit-source-id: d8d09bf584f1293bd3cfd43bf710d84f87d766ae
2019-09-23 12:08:52 -07:00
a9a9d362e2 Makes test_indexing.py device generic (#26634)
Summary:
- Makes test_indexing.py device generic
- Removes test_indexing_cuda.py

Note: a couple tests in test_indexing.py were already CPU and CUDA tests, meaning these tests were run multiple times when CUDA was available. Genericizing test_indexing.py corrects this and lets these tests be run on other device types, like XLA, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26634

Differential Revision: D17529001

Pulled By: mruberry

fbshipit-source-id: e71ba28d947749255a0aceeb7b77a42c4811439d
2019-09-23 11:52:48 -07:00
2a574e49b0 Sync docker images
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26651

Differential Revision: D17531136

Pulled By: IvanKobzarev

fbshipit-source-id: 550757ac409f59b2a3783455a5a0144724078598
2019-09-23 11:25:25 -07:00
781f861847 Add testing script for iOS x86 build (#26632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26632

### Summary

This script builds the TestApp (located in ios folder) to generate an iOS x86 executable via the `xcodebuild` toolchain on macOS. The goal is to provide a quick way to test the generated static libraries to see if there are any linking errors. The script can also be used by the iOS CI jobs. To run the script, simply see description below:

```shell
$ ruby scripts/xcode_ios_x86_build.rb --help

-i, --install                    path to the cmake install folder
-x, --xcodeproj                  path to the XCode project file
```

### Note

The script mainly deals with the iOS simulator build. For the arm64 build, I haven't found a way to disable code signing using the `xcodebuild` toolchain (XCode 10). If anyone knows how to do that, please feel free to leave a comment below.

### Test Plan

- The script can build the TestApp and link the generated static libraries successfully
- Don't break any CI job

Test Plan: Imported from OSS

Differential Revision: D17530990

Pulled By: xta0

fbshipit-source-id: f50bef7127ff8c11e884c99889cecff82617212b
2019-09-23 11:21:21 -07:00
fc926d9242 fix operator level benchmark to have NHWC layout (#26577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26577

Use the NHWC layout expected by the qconv kernel,
for resnext101-32x4d shapes.

Before :
```
Forward Execution Time (us) : 4787.046
Forward Execution Time (us) : 1320.065
Forward Execution Time (us) : 2611.631
Forward Execution Time (us) : 2562.389
Forward Execution Time (us) : 1072.342
Forward Execution Time (us) : 2330.658
Forward Execution Time (us) : 1894.549
Forward Execution Time (us) : 3446.532
Forward Execution Time (us) : 2381.251
Forward Execution Time (us) : 1157.339
Forward Execution Time (us) : 2712.621
Forward Execution Time (us) : 3789.905
Forward Execution Time (us) : 4057.886
Forward Execution Time (us) : 6104.570
Forward Execution Time (us) : 11328.552
Forward Execution Time (us) : 3707.519
Forward Execution Time (us) : 4681.272
Forward Execution Time (us) : 2459.266
Forward Execution Time (us) : 849.564
Forward Execution Time (us) : 3000.764
Forward Execution Time (us) : 3019.704
Forward Execution Time (us) : 5216.046
Forward Execution Time (us) : 3403.549
Forward Execution Time (us) : 1291.878
Forward Execution Time (us) : 2057.147
```

After
```
Forward Execution Time (us) : 4398.649
Forward Execution Time (us) : 993.619
Forward Execution Time (us) : 2252.265
Forward Execution Time (us) : 2230.500
Forward Execution Time (us) : 977.389
Forward Execution Time (us) : 2233.356
Forward Execution Time (us) : 1223.085
Forward Execution Time (us) : 2758.765
Forward Execution Time (us) : 2208.028
Forward Execution Time (us) : 821.816
Forward Execution Time (us) : 2396.748
Forward Execution Time (us) : 2505.803
Forward Execution Time (us) : 2771.251
Forward Execution Time (us) : 4816.474
Forward Execution Time (us) : 10065.299
Forward Execution Time (us) : 2424.949
Forward Execution Time (us) : 3854.800
Forward Execution Time (us) : 2297.426
Forward Execution Time (us) : 682.403
Forward Execution Time (us) : 2297.541
Forward Execution Time (us) : 2317.828
Forward Execution Time (us) : 4517.372
Forward Execution Time (us) : 2716.691
Forward Execution Time (us) : 942.385
Forward Execution Time (us) : 1717.172
```
ghstack-source-id: 90536232

Test Plan: buck build mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test --show-output

Differential Revision: D17512291

fbshipit-source-id: 7764b2ab38e0e8e0aab982006915176638004df6
2019-09-23 11:12:51 -07:00
a79b3685db Simplify observers declaration with functools.partial (#26492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492

Previous definition of observers was quite clumsy - with things like `default_observer()()`. This PR strips away a lot of cruft and allows passing class names directly. In order to override default arguments, either `functools.partial` can be used or the convenient wrapper `MyObserver.with_args(x=1)` is provided.

Also rename `QConfig_dynamic` to `QConfigDynamic` because it violates the naming convention.
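
A sketch of both spellings (observer and dtype choices are illustrative):

```python
import torch
from functools import partial
from torch.quantization import QConfig, MinMaxObserver

# Override default observer arguments with functools.partial ...
qconfig = QConfig(activation=partial(MinMaxObserver, dtype=torch.quint8),
                  weight=partial(MinMaxObserver, dtype=torch.qint8))
# ... or with the convenience wrapper added in this PR:
qconfig = QConfig(activation=MinMaxObserver.with_args(dtype=torch.quint8),
                  weight=MinMaxObserver.with_args(dtype=torch.qint8))
```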

Test Plan: Imported from OSS

Differential Revision: D17521265

Pulled By: dzhulgakov

fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e
2019-09-23 10:15:59 -07:00
76697a3bfc Enabled bfloat16 dtype on CUDA
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26407

Differential Revision: D17450502

Pulled By: izdeby

fbshipit-source-id: 0a5acc5fe1b1555c61ebe038aee9eaaae9dac228
2019-09-23 09:19:04 -07:00
e4821012ad prevent generating caffe2::mkldnn for multiple times (#25257)
Summary:
This is a similar problem to https://github.com/pytorch/pytorch/issues/25004. After the merge of https://github.com/pytorch/pytorch/issues/25167, I recompiled torch and discovered another similar bug.

ezyang please take a look
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25257

Differential Revision: D17528116

Pulled By: ezyang

fbshipit-source-id: 1657d9ee6dced3548f246010b05e2b3c25c37dee
2019-09-23 08:53:02 -07:00
786d225968 ATen port of lgamma (cuda) (#26600)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/24585 .

Btw, there are two ways to define unary operator support:
1. Use `IMPLEMENT_UNARY_OP_VEC_CUDA(someunaryop)` in `aten/src/ATen/UnaryOps.cpp` and in `native_functions.yaml` have:
```
- func: someunaryop(Tensor self) -> Tensor
  use_c10_dispatcher: full
  supports_named_tensor: True
  variants: method, function
  dispatch:
    CPU: someunaryop
    CUDA: someunaryop
```

2. Or, in `aten/src/ATen/UnaryOps.cpp` have
```
Tensor& someunaryop_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, someunaryop_stub); }
Tensor someunaryop(const Tensor& self) { return unary_op_impl(self, someunaryop_out); }
Tensor& someunaryop_(Tensor& self) { return unary_op_impl_(self, someunaryop_out); }
```
   and in `native_functions.yaml` (note that `dispatch` section is removed):
```
- func: someunaryop(Tensor self) -> Tensor
  use_c10_dispatcher: full
  supports_named_tensor: True
  variants: method, function
```

It turns out that way 1 is about 3% more performant than way 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26600

Differential Revision: D17527166

Pulled By: ezyang

fbshipit-source-id: 112ba298ad3f67d04078b921859e73dcd184852b
2019-09-23 08:46:44 -07:00
15b506068b Remove deprecated torch.gels (#26480)
Summary:
Changelog:
- Remove `torch.gels` which was deprecated in v1.2.0
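
A migration sketch (`torch.lstsq` keeps the same `(B, A)` argument order):

```python
import torch

A = torch.randn(5, 3)
B = torch.randn(5, 2)
# torch.gels is gone; torch.lstsq takes the same (B, A) argument order
# and returns a (solution, QR) namedtuple.
solution, qr = torch.lstsq(B, A)
```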
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26480

Test Plan: - No tests were changed, and all call sites for `torch.gels` were modified to `torch.lstsq` when `torch.lstsq` was introduced

Differential Revision: D17527207

Pulled By: zou3519

fbshipit-source-id: 28e2fa3a3bf30eb6b9029bb5aab198c4d570a950
2019-09-23 07:15:39 -07:00
557246b77d Fixing the calling parameters of write_gif function of the moviepy.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21218

Differential Revision: D17509260

Pulled By: ezyang

fbshipit-source-id: 51e392cbcc20ade4c38c4edb75919f9bb314a830
2019-09-23 06:53:24 -07:00
808f4a4d61 Revert D17521607: Name inference for min(Tensor, dim?) / max(Tensor, dim?)
Test Plan: revert-hammer

Differential Revision:
D17521607

Original commit changeset: 303e3cef2291

fbshipit-source-id: a27b99c2c1c8b2e389d34395ba28a74d2946bb5a
2019-09-23 05:43:40 -07:00
4fada96218 Renames tensor.renamed -> rename, tensor.names_ -> rename_ (#26548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26548

This makes the naming more consistent with PyTorch's API. The original
concern was that `tensor.rename` might make the operation seem like it
is in-place. However, we have many "verb" APIs: `tensor.add(other)`, for
example, doesn't add other to tensor in-place, but `tensor.add_(other)`
does.

`tensor.rename_` does exactly the same thing as `tensor.rename`, but
in-place.
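
A small sketch of the renamed pair (names are illustrative):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
t2 = t.rename(N='batch')  # out-of-place, like tensor.add
t.rename_(N='batch')      # in-place variant, like tensor.add_
print(t.names)            # ('batch', 'C')
```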

Test Plan: - [namedtensor ci]

Differential Revision: D17502021

Pulled By: zou3519

fbshipit-source-id: 6a5b93136a820075013cd1e30fb8fc6b9d77d7d9
2019-09-22 15:38:26 -07:00
d3e90bc47d Name inference for min(Tensor, dim?) / max(Tensor, dim?) (#25582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25582

There are a lot of min/max overloads. This PR adds name inference to
the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor)  (full reduction)

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17521607

Pulled By: zou3519

fbshipit-source-id: 303e3cef22916dbc9da6a092d4f23e39e74c39e4
2019-09-22 12:20:51 -07:00
6b25562489 C++ API parity: at::Tensor::detach
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26251

Test Plan: Imported from OSS

Differential Revision: D17427578

Pulled By: pbelevich

fbshipit-source-id: c3d23a8c2da4148b86e7760ba5023eb38f7835af
2019-09-22 06:10:48 -07:00
bdf10380d6 Whenever possible, use function pointers rather than std::function to represent Operations. (#26560)
Summary:
This takes a lot of pressure off of the C++ typechecker and generates much more
efficient and smaller code.  In my not-super-rigorous testing, compile time for
register_prim_ops.cpp went from 68s to 35s, and the size of libtorch went from 72MB to 70MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26560

Differential Revision: D17507305

fbshipit-source-id: 8bbd2c08304739432efda96da71f0fa80eb7668b
2019-09-21 20:51:24 -07:00
99226cd51e Unify Quantization APIs for add, pool and relu (#26586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26586

Use the backend engine flag to call QNNPACK for quantized ops.

Test Plan: python test/test_quantized.py TestQNNPACKOps

Differential Revision: D17515129

Pulled By: supriyar

fbshipit-source-id: 951e90205aa19581ea006a91d9514fc7a94409ef
2019-09-21 13:41:16 -07:00
7e619650c9 Move unpickler related codes from pickler.h/cpp to unpickler.h/cpp (#26432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26432

Move unpickler-related code from pickler.h/cpp to unpickler.h/cpp. In the import flow we link against the unpickler only.

Test Plan: Imported from OSS

Differential Revision: D17465410

fbshipit-source-id: 9d34629aa05bc0b45383e8f809c87baa186c9804
2019-09-21 11:56:48 -07:00
95cb22f21f _dequantize_linear -> _dequantize_per_tensor (#26576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26576

to match `quantize_per_tensor`

Test Plan:
ci

Imported from OSS

Differential Revision: D17517439

fbshipit-source-id: 8c20f9b5d2a50d0e42e4444994b0987e6204ac56
2019-09-21 11:52:19 -07:00
eca01eb0a6 quantized average_pool2d and adaptive_avg_pool2d implementation(Revert d17437015) (#26580)
Summary:
In this PR, we try to fix the Windows build issue of D17437015.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26580

Differential Revision: D17517341

Pulled By: llyfacebook

fbshipit-source-id: db726596aa8f7c992c5a7ddc2781dc3aa0312284
2019-09-21 11:10:26 -07:00
fcfca9ad62 Skip some fragile tests (#26599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26599

These fail due to tolerance in equality comparison. Disable them for now.
ghstack-source-id: 90553855

Test Plan: unit tests

Differential Revision: D17517085

fbshipit-source-id: a4d9278e356318719ccd84047404915a97944f52
2019-09-21 11:06:42 -07:00
2e82ee0335 quantize_linear_per_channel -> quantize_per_channel (#26575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26575

To stay consistent with `quantize_per_tensor`, we also
rename `quantize_linear_per_channel` to `quantize_per_channel`

Test Plan:
ci

Imported from OSS

Differential Revision: D17517360

fbshipit-source-id: 3af7d8f0fbe99148b79fcb1ad2fe811f776590cd
2019-09-21 11:02:17 -07:00
2667493f4c Expose supportedQEngines to python (#26474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26474

att
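
A minimal sketch of the exposed attribute (the exact engine list depends on how the build was configured):

```python
import torch

print(torch.backends.quantized.supported_engines)
# e.g. ['none', 'fbgemm'], depending on the build
```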

Test Plan:
python test/test_torch.py

Imported from OSS

Differential Revision: D17517373

fbshipit-source-id: af931761d6ee31a88808d05f686002a83b6b25af
2019-09-21 10:36:13 -07:00
d117842e56 C++ API parity: at::Tensor::version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26561

Test Plan: Imported from OSS

Differential Revision: D17507167

Pulled By: pbelevich

fbshipit-source-id: 167890c7b745acc9cb9ce4185f1d8c1745aaecc2
2019-09-21 08:37:46 -07:00
aa78523467 Fix CI (#26593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26593

This broke due to a merge conflict between my diffs and ezyang's multi dispatch diff being reverted and then relanded.
ghstack-source-id: 90549856

Test Plan: unit tests

Differential Revision: D17515837

fbshipit-source-id: c0bfd5f159ee4de80035079a1a2f39d5beafec41
2019-09-21 01:06:11 -07:00
d09d1d9aac Add inplace argument to InsertObservers and InsertQuantDeQuant (#26389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26389

att

Test Plan:
.

Imported from OSS

Differential Revision: D17504458

fbshipit-source-id: a1a5c908eabf270c1e8d2098532ffc46978a240c
2019-09-20 22:43:29 -07:00
1bec8d7a15 Get scalar type from observer module (#26425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26425

Currently the scalar type is hardcoded for weights and normal tensors,
but what we want is to get it from the corresponding observer module.

Test Plan:
there are some known issues right now,
will test e2e later when all the issues are fixed

Imported from OSS

Differential Revision: D17504459

fbshipit-source-id: f5a21789c2ebeb60bff4acc777db80170063c9f8
2019-09-20 22:19:18 -07:00
254122dd4e quantize_linear -> quantize_per_tensor (#26574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26574

Since we also have `quantized::linear`, `quantize_linear` sounds
confusing, so we plan to rename it before the branch cut
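
A minimal sketch of the renamed op (values are illustrative):

```python
import torch

x = torch.randn(2, 3)
# formerly torch.quantize_linear:
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
print(qx.dequantize())
```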

Test Plan:
ci

Imported from OSS

Differential Revision: D17514876

fbshipit-source-id: 01d9005e6ec8cb9950b9d8bba122109c389641d3
2019-09-20 21:58:48 -07:00
7e6a55e417 Add DimType info in dumped debug nets (#26589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26589

Just for better debugging purposes.

Test Plan: Dump the net and check the dim type info is in the pb_txt.

Reviewed By: dreamingleo

Differential Revision: D17505931

fbshipit-source-id: ceba4c3849eb271c22227fa07a05d5bcb07344a5
2019-09-20 21:46:19 -07:00
1d2fb8d1a6 Compiler warnings cleanup for quantization.cpp.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26585

Test Plan: Imported from OSS

Differential Revision: D17514739

Pulled By: ZolotukhinM

fbshipit-source-id: a666564aad9ca8837b592d285da22701a4bf76df
2019-09-20 21:06:42 -07:00
5016796089 Enable creation of boxing wrappers for some aten operators (#26273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26273

[namedtensor ci]
ghstack-source-id: 90459908

Test Plan: unit tests

Differential Revision: D17393318

fbshipit-source-id: 1831da121ca7c64d1148b34c88a57bdc16c9fddf
2019-09-20 20:44:45 -07:00
9ed6074827 Correct the test of a big number (2 ^ 31) (#26491)
Summary:
2 ^ 31 is 29, which is not a big number. Corrected to 2 ** 31.
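
For reference, in plain Python:

```python
print(2 ^ 31)   # 29 -- '^' is bitwise XOR in Python
print(2 ** 31)  # 2147483648 -- '**' is exponentiation
```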
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26491

Differential Revision: D17494296

fbshipit-source-id: 83d320e8fb6d1b7df41e4474933a98107c8e4129
2019-09-20 19:14:55 -07:00
8f68a7f241 Add two levels to use_c10_dispatcher (#26272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26272

```
use_c10_dispatcher: 'unboxed_only'
```
This is the previous implementation. The operator is registered with c10, but only in its unboxed form. No boxing wrapper is generated.

```
use_c10_dispatcher: 'full'
```
This does everything done by 'unboxed_only', but additionally creates a boxing wrapper so the op can be called through the c10 dispatcher using a boxed operator call.

This only changes registration, not the calling path. These operators are still called through the unboxed function pointer.

The final goal is to have 'full' for all operators, but this isn't immediately going to work for all ops.

[namedtensor ci]
ghstack-source-id: 90459907

Test Plan: unit tests

Differential Revision: D17393317

fbshipit-source-id: d629edfb3baede8c4ac869aa1886e512782ed2aa
2019-09-20 18:55:29 -07:00
ed207b53ab c10::KernelFunction (#26337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

ghstack-source-id: 90459911

Test Plan: unit tests

Differential Revision: D17416607

fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
2019-09-20 18:55:25 -07:00
8f54d0d6b6 update android/iOS build library packing (#26565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26565

For the OSS mobile build we should keep QNNPACK off and PYTORCH_QNNPACK on,
as we don't include the caffe2 ops that use third_party/QNNPACK.

Update android/iOS build script to include new libraries accordingly.

Test Plan: - CI build

Differential Revision: D17508918

Pulled By: ljk53

fbshipit-source-id: 0483d45646d4d503b4e5c1d483e4df72cffc6c68
2019-09-20 17:48:15 -07:00
f7ba68e1f7 Support IValue string type (#26517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26517

Support IValue string kind

Added 2 instrumented tests and regenerated test.pt.

# Test plan
Start android emulator
```
cd ./android/
gradle pytorch_android:cAT
```
tests passed

# Nits
Moved the method IValue#getBool() to keep the order: bool, long, double, string.

Test Plan: Imported from OSS

Differential Revision: D17513683

Pulled By: IvanKobzarev

fbshipit-source-id: d328f25772b61f54fb6fd3b2afacde3d7372f25c
2019-09-20 17:29:42 -07:00
b401e9d8e0 Corrected variable name and added test (#26503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26503

[pytorch] [distributed] Corrected variable name and added test
ghstack-source-id: 90454793

Test Plan: Made sure pg based UT works.

Differential Revision: D17488846

fbshipit-source-id: 6e6cba110a6f61ee1af3d37c5a41c69701de1a8b
2019-09-20 17:18:17 -07:00
516cf051ee Revert D17504331: Unify Quantization APIs for add, pool and relu
Test Plan: revert-hammer

Differential Revision:
D17504331

Original commit changeset: 35cb2189067a

fbshipit-source-id: d433288f1dbb430d647c6694b3e3ad4276787c3b
2019-09-20 17:13:01 -07:00
d6e3aed032 add eigen blas for mobile build (#26508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26508

Enable BLAS for pytorch mobile build using Eigen BLAS.
It's not the juiciest optimization for typical mobile CV models, as we are
already using NNPACK/QNNPACK for most ops there, but it's nice to have a good
fallback implementation for other ops.

Test Plan:
- Create a simple matrix multiplication script model:
```
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.weights = torch.ones(1000, 1000)

    def forward(self, x):
        return torch.mm(x, self.weights)

n = Net()
module = torch.jit.trace_module(n, {'forward': torch.ones(1000, 1000)})
module.save('mm.pk')
```

- Before integrating with Eigen BLAS:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 2218.52.
```

- After integrating with Eigen BLAS:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mm.pk \
--input_dims="1000,1000" \
--input_type=float \
--warmup=5 \
--iter=5'

Milliseconds per iter: 314.535.
```

- Improves MobileNetV2 single-thread perf by ~5%:
```
adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 367.055.

adb shell 'cd /data/local/tmp; \
./speed_benchmark_torch_eigen \
--model=mobilenetv2.pk \
--input_dims="1,3,224,224" \
--input_type=float \
--warmup=5 \
--iter=20 \
--print_output=false \
--caffe2_threadpool_force_inline=true'

Milliseconds per iter: 348.77.
```

Differential Revision: D17489587

fbshipit-source-id: efe542db810a900f680da7ec7e60f215f58db66e
2019-09-20 15:45:11 -07:00
6fcbc37753 improve how pytorch_android cmake imports static lib (#26525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26525

Create a util function to avoid boilerplate code as we are adding more
libraries.

Test Plan: - build CI;

Differential Revision: D17495394

Pulled By: ljk53

fbshipit-source-id: 9e19f96ede4867bdff5157424fa68b71e6cff8bf
2019-09-20 15:45:06 -07:00
f0b7132b87 Revert D17437015: [pytorch][PR] Add the quantized average_pool2d support and adaptive_avg_pool2d support
Test Plan: revert-hammer

Differential Revision:
D17437015

Original commit changeset: 496aed1e4171

fbshipit-source-id: 53e22a85e06bd9d7827579b124b7f136230b6c1d
2019-09-20 15:01:49 -07:00
f337459619 Unify Quantization APIs for add, pool and relu (#26335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26335

Use the backend engine flag to call QNNPACK for quantized ops.
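
A sketch of how the engine flag is meant to be selected from Python (the exact attribute path is an assumption based on the backend-engine mechanism this PR wires up):

```
import torch

# hypothetical: route quantized ops to the QNNPACK backend
torch.backends.quantized.engine = 'qnnpack'
```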

Test Plan:
python test/test_quantized.py TestQNNPACKOps

Imported from OSS

Differential Revision: D17504331

fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
2019-09-20 14:58:35 -07:00
11f9fe2433 Fix the API for record observer (#26413)
Summary:
Mainly want to resolve comments from https://github.com/pytorch/pytorch/pull/25830.

Overall, we want to provide a recording observer for recording the runtime tensor values along the activation path, in order to debug numerical accuracy loss offline.

According to the feedback from https://github.com/pytorch/pytorch/issues/25830, it might be better to record all the observers in a dict and query the dict to get the corresponding tensor values. hx89 is working on how to insert the recording observers into the model under debug.
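
A minimal sketch of the recording idea described above (the class name and attribute are illustrative, not the actual observer API):

```
import torch

class RecordingObserver(torch.nn.Module):
    # Records every tensor it observes so activations can be inspected offline.
    def __init__(self):
        super(RecordingObserver, self).__init__()
        self.tensor_vals = []

    def forward(self, x):
        self.tensor_vals.append(x.detach().clone())
        return x
```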
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26413

Differential Revision: D17506502

Pulled By: llyfacebook

fbshipit-source-id: 3ab90dc78920e7ec3fa572c2a07327a9991c530a
2019-09-20 14:27:56 -07:00
6411b92d6e Add the quantized average_pool2d support and adaptive_avg_pool2d support (#25899)
Summary:
//copied from PR https://github.com/pytorch/pytorch/issues/25676

===============For avg_pool2d==============

import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.avg_pool2d(x, kernel_size=3, stride=None, padding=0)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.avg_pool2d(q_x, kernel_size=3, stride=None, padding=0)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_linear(float_out, 0.5, 1, dtype)
    torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')

Before the vectorization:
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.67439603805542        7.126874923706055       2.6648539791017924
GB/s float      GB/s quant
1.2470733401269298      0.11699265230915809
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.587001323699951       7.011299133300781       2.7102031487456535
GB/s float      GB/s quant
1.2892022781148076      0.11892118481150399
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.6659250259399414      7.03080415725708        2.637285028215745
GB/s float      GB/s quant
1.2510359321992184      0.4743650833393638

After the vectorization
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.6113319396972656      0.5631613731384277      0.2156605847679846
GB/s float      GB/s quant
1.2771903676047593      1.48055608884072
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.5221967697143555      0.5518221855163574      0.21878633425529784
GB/s float      GB/s quant
1.322326647963202       1.5109794819499591
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.5173258781433105      4.0132904052734375      1.5942673295177407
GB/s float      GB/s quant
1.324885279636461       0.8310308159154421

===============For adaptive_avg_pool2d==============
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.adaptive_avg_pool2d(x, output_size=5)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.adaptive_avg_pool2d(q_x, output_size=5)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_linear(float_out, 0.5, 1, dtype)
    torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
//Before the vectorization
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.286238670349121       4.600362777709961       2.0121970804594342
GB/s float      GB/s quant
1.4158031888707898      0.17590264922602994
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.2867274284362793      4.474163055419922       1.9565790831832832
GB/s float      GB/s quant
1.4155005794518536      0.180864217503144
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.3176145553588867      4.264359474182129       1.8399778618588218
GB/s float      GB/s quant
1.3966360335956578      0.7590504551966285

//After the vectorization:
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.3224568367004395      0.23195743560791016     0.09987588657942796
GB/s float      GB/s quant
1.3937240722194333      3.4886400510473843
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.255082130432129       0.2124309539794922      0.09420098324258604
GB/s float      GB/s quant
1.435364129899667       3.8093130254365883
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.266514301300049       1.6029787063598633      0.7072440290539581
GB/s float      GB/s quant
1.4281242338260862      2.0192807222938463
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25899

Differential Revision: D17437015

Pulled By: llyfacebook

fbshipit-source-id: 496aed1e41711048d0853254d6819d3fb141a0c0
2019-09-20 14:20:16 -07:00
87f80ff8ea Support torch.pow with named tensors (#26541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26541

`torch.pow` already supports named tensors; every one of its constituent
codepaths propagates names:
- TensorIterator propagates names
- resize_as_ and fill_ propagate names (exponent == 0 or base == 1)
- resize_as_ and copy_ propagate names (exponent == 1)

This PR adds `supports_named_tensor = True` to the pow overloads,
enabling `pow` to take named tensors.
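
A small example of the behavior this enables (named tensors were experimental at this point):

```
import torch

x = torch.rand(2, 3, names=('N', 'C'))
y = torch.pow(x, 2)
print(y.names)  # ('N', 'C') -- names propagate through TensorIterator
```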

Test Plan: - [namedtensor ci]

Differential Revision: D17501402

Pulled By: zou3519

fbshipit-source-id: 07ee91d685e55dd58bbbb3a3fc9e185de8bb7515
2019-09-20 14:15:03 -07:00
98b5b6fc13 Implement resize_, resize_as_ for named tensors (#26493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26493

resize_ and resize_as_ are low-level functions that are not meant to be
used as part of a regular PyTorch user's routine. However, they are
used to implement a lot of our operations: `out=` functionality is
implemented by resizing an output to be the correct size.

To keep in line with already implemented `out=` functionality, we do the
following:
- resize_as_(self, other) propagates names according to `out=` functionality.
This means that if self doesn't have names, then we propagate
other.names. If self does have names, they must be equal to other.names.

In addition, resize_ cannot resize a named tensor to anything but the same size.
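
A sketch of the propagation rule described above:

```
import torch

src = torch.rand(2, 3, names=('N', 'C'))
out = torch.empty(0)          # unnamed
out.resize_as_(src)           # out adopts src's names, per out= semantics
print(out.names)              # ('N', 'C')
```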

Test Plan: - [namedtensor ci]

Differential Revision: D17501404

Pulled By: zou3519

fbshipit-source-id: e396e7fba55e1419355933925226d02dccb03012
2019-09-20 14:14:59 -07:00
916eee182c Fix for Conv shape check prints overflowed ints (#25827)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/19947
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25827

Differential Revision: D17508653

Pulled By: soumith

fbshipit-source-id: 1afec60b9b39de5f2d0be44a170650aa4c1879cf
2019-09-20 14:11:47 -07:00
9f4174c496 expose USE_STATIC_DISPATCH macro to public headers
Summary:
USE_STATIC_DISPATCH needs to be exposed as we don't hide header files
containing it for iOS (yet). Otherwise it's error-prone to request all
external projects to set the macro correctly on their own.
Also remove redundant USE_STATIC_DISPATCH definition from other places.

Test Plan:
- build android gradle to confirm linker can still strip out dead code;
- integrate with demo app to confirm inference can run without problem;

Differential Revision: D17484260

Pulled By: ljk53

fbshipit-source-id: 653f597acb2583761b723eff8026d77518007533
2019-09-20 14:01:49 -07:00
73ae23a4ea add support for real4bits quant (#25426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25426

Add embedding table 4bit quantization support.

* add the conversion from fp32 to int4.
* using brew to pass the context so that the 4bit operators are added when generating the predictor net.

Reviewed By: kennyhorror, chocjy

Differential Revision: D16859892

fbshipit-source-id: a06c3f0b56a7eabf9ca4a2b2cb6c63735030d70b
2019-09-20 13:45:23 -07:00
1a114948ce Fix jit/pass/peephole.cpp fuse addmm (#26357)
Summary:
Fix https://github.com/pytorch/pytorch/issues/26328 by reversing the order of inserting nodes. Previously the IR graph looked like

```
graph(%0 : Float(3, 3)):
  %5 : Float(3, 3) = aten::addmm(%0, %0, %0, %6, %6)
  %6 : int = prim::Constant[value=1]()
  return (%5)
```
where %6 is used before it is created. Now
```
graph(%0 : Float(3, 3)):
  %5 : int = prim::Constant[value=1]()
  %6 : Float(3, 3) = aten::addmm(%0, %0, %0, %5, %5)
  return (%6)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26357

Reviewed By: hl475

Differential Revision: D17463945

Pulled By: houseroad

fbshipit-source-id: 4f483c2bc004a4a88f0976a7b37d7994d97ba41a
2019-09-20 13:32:03 -07:00
8c4b7a1b4b Changes to support int8 weight and fp32 bias in QNNPACK (#26307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26307

Add support for an FP32 bias. We re-quantize the bias at run time based on the input scale.
If the value of the input scale changes in the packed struct, we requantize the bias with the updated input scale.
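
A rough sketch of the requantization rule implied above (the formula is assumed from standard affine quantization, not taken from this PR's code):

```
import torch

def requantize_bias(bias_fp32, input_scale, weight_scale):
    # the quantized bias uses scale = input_scale * weight_scale, so it must
    # be recomputed whenever the input scale changes
    bias_scale = input_scale * weight_scale
    return torch.round(bias_fp32 / bias_scale).to(torch.int32)
```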

Test Plan: python test/test_quantized.py TestQNNPackOps

Differential Revision: D17504253

Pulled By: supriyar

fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
2019-09-20 13:17:56 -07:00
f55a9da00e Move the CUDA implementation of floor to ATen. (#25372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25372

Close #24617

Test Plan: Imported from OSS

Differential Revision: D17397478

fbshipit-source-id: 11a515235391ae796e2f84cde1913e56561c41bc
2019-09-20 13:15:29 -07:00
71ec9a0035 Clarify and correct the doc of atan2.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26180

Reviewed By: ezyang

Differential Revision: D17500224

Pulled By: albanD

fbshipit-source-id: 98b9f32aa443963fe1e89b83e15bed9ff83a2694
2019-09-20 12:58:12 -07:00
da8fbe5bf0 Minor improvement to C++ nn::Distance tests (#26539)
Summary:
C++ `nn::Distance` tests can take advantage of the newly released multi-dimensional tensor constructor https://github.com/pytorch/pytorch/pull/26210 to simplify the tensor constructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26539

Differential Revision: D17501041

Pulled By: yf225

fbshipit-source-id: 21d5f95ab3ec02227115c823c581218cee2ce458
2019-09-20 12:40:52 -07:00
a5bcde97af Revert D17427577: C++ API parity: at::Tensor::version
Test Plan: revert-hammer

Differential Revision:
D17427577

Original commit changeset: e9b3e76ca44d

fbshipit-source-id: a5bbae208ba33a31f90ab5c9b199f232de0c6d1b
2019-09-20 11:19:43 -07:00
b59e856517 Revert D17486465: [jit] Make is_optional check more robust
Test Plan: revert-hammer

Differential Revision:
D17486465

Original commit changeset: c513cef3bbc0

fbshipit-source-id: 567311c001d7dd0b7ab9ffe8bb894954bea583c9
2019-09-20 11:06:19 -07:00
198521978b C++ API parity: at::Tensor::version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26217

Test Plan: Imported from OSS

Differential Revision: D17427577

Pulled By: pbelevich

fbshipit-source-id: e9b3e76ca44df883e3038b688dd7b930752d93a2
2019-09-20 11:02:41 -07:00
30fc011b9e Refactor Dimname.h API to be nicer (#26366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26366

Changes:
- `NameType::NORMAL` -> `NameType::BASIC`
- `Dimname::is_wildcard` -> `Dimname::isWildcard()`
- `Dimname::is_normal` -> `Dimname::isBasic()`.
- `at::is_valid_identifier` -> `Dimname::isValidName(string)`
- `at::match`, `at::unify` are now methods on `Dimname`.

I am adopting CamelCase for struct members of a named tensor related
struct.

Test Plan: - [namedtensor ci]

Differential Revision: D17484757

Pulled By: zou3519

fbshipit-source-id: 21c128e5025e81513e14d34506a7d7744caefdc2
2019-09-20 10:59:49 -07:00
6703587156 Delete tagged names
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26365

Test Plan: - [namedtensor ci]

Differential Revision: D17484759

Pulled By: zou3519

fbshipit-source-id: 44068c1e9d84adf36c5ab5e7006a153b948914d6
2019-09-20 10:59:45 -07:00
858cf76ef7 Disable tagged names (#26479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26479

This PR doesn't delete the code for them yet because it takes some effort to
determine what to delete. I will send a followup PR fully deleting
tagged names, but this PR disables their creation.

Test Plan: - [namedtensor ci]

Differential Revision: D17484758

Pulled By: zou3519

fbshipit-source-id: 451409e36eac98ffee1b98884d0f675bb5d46c9d
2019-09-20 10:59:41 -07:00
49777e6730 Fix options usage in C++ module / optimizer constructors (#26483)
Summary:
With this PR, we establish the following conventions:
1. Options in C++ module / optimizer constructors should always be `const SomeOptions&` type, not `SomeOptions` type.
2. The options constructor arg should always be named `options_`, not `options`, to not be confused with the module / optimizer's internal field `options`.
3. We never use `std::move` to assign `options_` to the module / optimizer's internal field `options` in the constructor definition. Instead, we simply use `options(options_)`.

Here is the reasoning:
We might be tempted to declare the constructor as `SomeModule(SomeOptions options_)` and have `options(std::move(options_))` in the member initialization list. However, this can be a dangerous design because the constructor might use `options_` to set values for other member fields in the member initialization list (e.g. 8317f75b79/torch/csrc/api/include/torch/optim/lbfgs.h (L30-L34)), and use-after-move can cause hard-to-debug problems.
Instead, we choose to explicitly use `const SomeOptions&` type for `options_`, and never use `std::move` to assign it to the internal `options` field. This way we have stronger guarantee on the validity of `options_` at any point in the constructor.

Notable exceptions to the above conventions:
1. C++ Embedding module doesn't adhere to the conventions now, which will be fixed after https://github.com/pytorch/pytorch/pull/26358 is landed.
2. C++ dataloader and dataset classes likely need similar changes. We will do it when we start to work on dataloader/dataset parity.

Thanks ShahriarSS for discovering the options usage inconsistency! 🚀
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26483

Differential Revision: D17500451

Pulled By: yf225

fbshipit-source-id: 49361a3519e4ede933789db75731d40144f0b617
2019-09-20 10:56:19 -07:00
4c40dbcb75 Resolve NamedTuple types in Python (#26443)
Summary:
When used as annotations on Python functions, `NamedTuple`s go through our Python annotation -> type mapping, which previously had no way of looking up `NamedTuple`s (they are created lazily by checking whether the type has certain properties, so the lookup builds the `TupleType` from scratch). This PR threads through the data necessary to make them work.
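
A minimal example of the kind of annotation this makes work:

```
from typing import NamedTuple
import torch

class Point(NamedTuple):
    x: torch.Tensor
    y: torch.Tensor

@torch.jit.script
def total(p: Point) -> torch.Tensor:
    return p.x + p.y
```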

Fixes #26437
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26443

Pulled By: driazati

Differential Revision: D17486441

fbshipit-source-id: a6bbb543ff05a5abe61f1a7f68db9ecdb652b358
2019-09-20 10:53:25 -07:00
9a5b784eac Make is_optional check more robust (#26312)
Summary:
If the `Union` contains a non-class type, `issubclass` would fail; this
adds a check for that case.
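
For context, a sketch of the failure mode (not the actual compiler code):

```
from typing import List, Optional

ann = Optional[List[int]]
args = ann.__args__   # (List[int], NoneType)
# issubclass(List[int], SomeClass) raises TypeError because List[int] is
# not a class; the fix guards the check before calling issubclass.
```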
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26312

Pulled By: driazati

Differential Revision: D17486465

fbshipit-source-id: c513cef3bbc038f15c021eb0c1bf36be0df1eb90
2019-09-20 10:50:00 -07:00
4444b91141 Fix quantized::conv2d patterns in QuantFusion (#26515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26515

Fix patterns of `prepack` and `permute` after recent changes
to `quantized::conv2d` and `quantized::conv2d_prepack`

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17502573

fbshipit-source-id: 1a719fd610e8ea9dc16075abaa042556e1edbceb
2019-09-20 10:40:44 -07:00
efd933dd01 use timeout in connect function to prevent against (#26364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26364

Per https://github.com/pytorch/pytorch/issues/25769, we sometimes get
an infinite loop when `TCPStore` calls `tcputil::connect`, and the server
continually returns `ECONNRESET` or `ECONNREFUSED`. If a proper timeout is passed
in, we guard against this by throwing an exception once the timeout has passed.
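
A sketch of the Python-side construction this protects (the constructor signature here is an assumption about the c10d Python bindings):

```
import datetime
import torch.distributed as dist

# With a finite timeout, a server that keeps refusing connections now
# raises instead of looping forever.
store = dist.TCPStore("127.0.0.1", 29500, 1, True,
                      timeout=datetime.timedelta(seconds=3))
```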

Testing: Tested with modifying `TCPStore` to connect to an invalid port, thus getting
`ECONNREFUSED`. If a valid timeout is passed in, the function correctly throws an
exception. Steps below:
1) in TCPStore.cpp's constructor, replace the `connect` call with this line:
 `storeSocket_ = tcputil::connect(tcpStoreAddr_, 1, true, std::chrono::milliseconds(3000));`
2) Build the `TCPStoreTest` binary.
3) Run the binary. Expected output:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  Connecting to TCP store timed out.
Aborted (core dumped)
```
ghstack-source-id: 90480086

Test Plan: See above.

Differential Revision: D17430164

fbshipit-source-id: 1482aca72fcc3ddb95ea25649ec057edda5d1934
2019-09-20 10:28:30 -07:00
9b7011c5c2 Implement multiple dispatch (#26468) (#26501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26501

Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.

XLA companion patch at https://github.com/pytorch/xla/pull/1031

Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core.  There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'.  I think this may be duplicated with some logic somewhere else but I have to double check.

The new generated code looks like this:

```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
    static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
    return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```

The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.

After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.

* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.

Benchmark:

Apply the following patch to the base commit and this commit:

```
 diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
 --- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+  return self;
+}
+
+}} // namespace at::native
 diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
 --- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
   dispatch:
     CPU: im2col_backward_cpu
     CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _const5
```

Comparisons with timeit:

One-argument, representative case:

Before:

```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):

Before:

```
In [1]: import torch

In [2]: x = torch.zeros(1)

In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D17499154

Pulled By: ezyang

fbshipit-source-id: 8ea237c2e935134b0f4f8d6cfd89c6a93037c02c
2019-09-20 10:12:04 -07:00
74710f9b9f Implement more size-oriented opcodes in the depickler. (#26454)
Summary:
These are intentionally not yet used by the encoder to avoid backcompat issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26454

Differential Revision: D17480844

fbshipit-source-id: e88ae7f5b94e32c7f12341a750aa4b9f7374bfb7
2019-09-20 09:42:17 -07:00
60dd203a1d Fixes test_wrapped_number (#26523)
Summary:
test_wrapped_number was calling torch.set_default_tensor_type('torch.FloatTensor'), which set the default tensor type for all following tests until a class boundary (with unittest) or until the end of the file (with pytest). Tests that don't expect the default tensor type to be set this way would then fail if run afterwards.

This fixes the issue by copying the default_tensor_type decorator from test_nn and using it with test_wrapped_number instead. The decorator correctly resets the default tensor type after the test has run.
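
A sketch of what such a decorator looks like (illustrative, not the exact test_nn helper):

```
import functools
import torch

def default_tensor_type(type_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            old_type = torch.tensor([]).type()
            torch.set_default_tensor_type(type_name)
            try:
                return fn(*args, **kwargs)
            finally:
                torch.set_default_tensor_type(old_type)
        return wrapper
    return decorator
```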

This fixes the many errors encountered when running pytest test_jit.py.

Note: test_wrapped_number was introduced in https://github.com/pytorch/pytorch/issues/22273.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26523

Differential Revision: D17495283

Pulled By: mruberry

fbshipit-source-id: ab518c78b7706af7cb1c2d1c17823d311178996d
2019-09-20 09:39:00 -07:00
9ca901895f Make destructor virtual for class with virtual function (#26504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26504

[pytorch] [distributed] Make the destructor virtual for a class with virtual functions.
Not having a virtual destructor may lead to a memory leak.
ghstack-source-id: 90454880

Test Plan: Made sure pg based UT works.

Differential Revision: D17488876

fbshipit-source-id: 5fdc55e175fd2b22e931b740c36cb1feed454066
2019-09-20 09:36:29 -07:00
e2515a4d6d Allocate empty tensor instead of empty_like in binary ops, fix pow (#26498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26498

We should allocate an empty tensor as a result tensor when performing
binary ops. Currently some ops use `empty_like(self)` as the initial
result tensor before passing it into TensorIterator. This is not very
efficient because TensorIterator may resize the tensor due to
broadcasting, causing more memory allocation. By using an empty tensor
as the result tensor, we only need to allocate/resize memory once as
opposed to twice.

Also fixes https://github.com/pytorch/pytorch/issues/26495. The bug
there is that the implementation of `pow` is missing a resize in one
case.

Test Plan:
- new test
- run tests

Differential Revision: D17500025

Pulled By: zou3519

fbshipit-source-id: bff4949af5e75541c04669b961bcf2e1ec456faf
2019-09-20 07:38:08 -07:00
872ca919a9 Distance module (#26424)
Summary:
Adds `Distance` module parity.
https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26424

Differential Revision: D17487314

Pulled By: yf225

fbshipit-source-id: c7d124cb4afb08a4733e7212af0bb276bf32d172
2019-09-20 07:28:49 -07:00
f433ee1499 Add the FP16 weight support for LSTM in dynamic_quantize (#25975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25975

We would like to add the FP16 weight support for the dynamic quantized LSTM.
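
A sketch of the intended usage (the `dtype` keyword shown here follows the eventual public `quantize_dynamic` API and is an assumption for this commit):

```
import torch

model = torch.nn.LSTM(input_size=32, hidden_size=64)
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.LSTM}, dtype=torch.float16)
```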

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details

```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization
-- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
  Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep  9
22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
      ✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D17299116

fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
2019-09-19 22:19:22 -07:00
956b708437 turn off autograd mode in android JNI wrapper (#26477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26477

- At inference time we need to turn off autograd mode and turn on no-variable
  mode, since we strip out these modules for inference-only mobile builds.
- Both flags are stored in thread-local variables, so we cannot simply
  set them to false globally.
- Add the "autograd/grad_mode.h" header to the all-in-one header 'torch/script.h'
  to reduce friction for iOS engineers who might need to do this manually in
  their projects.

P.S. I tried to hide AutoNonVariableTypeMode in codegen but figured it's not
trivial (e.g. there are manually written parts not covered by codegen).
Might try it again later.

Test Plan: - Integrate with Android demo app to confirm inference runs correctly.

Differential Revision: D17484259

Pulled By: ljk53

fbshipit-source-id: 06887c8b527124aa0cc1530e8e14bb2361acef31
2019-09-19 21:25:39 -07:00
afa5d0823b Fixes big endian arch bugs. (#26383)
Summary:
Serialization.cpp fails on big endian machines.
This patch fixes the endian bugs and also makes PyTorch
model files portable across architectures with different endianness:
an x86-generated model file can be read on the s390 arch.

The first problem is that serialization.cpp forgets to convert the "size" value
of the storage elements to the native byte order, so torch.load throws an
assertion as a result (see the first stack trace below).

The second problem is that when it reads the model from storage (doRead),
it decodes values to little endian, which is the wrong order on a big endian
machine. The decode should be to THP_nativeByteOrder() instead
(see the model dump below).
```
loaded_model = torch.load( opt.model_file, map_location=torch.device("cpu"))
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 422, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 616, in _load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 2305843009213693952 got 32
	(the very long number is actually 32 in the wrong endianness)
```

Model file load on x86 (correct output)
```
>>> import torch
>>> torch.load('400f2k_best.model', map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 2.4608e-01, -1.1174e-01, -1.0854e-01,  4.0124e-01, -1.5261e-02,
         -1.2206e-01,  1.3229e-01, -1.2615e-01, -5.2773e-01,  2.6333e-01,
         -3.1462e-03, -1.4902e-01,  9.8545e-02, -1.5789e-01, -2.2625e-01,
         -1.0776e-01, -9.0895e-02, -3.8530e-01,  9.1152e-01, -3.9720e-01,
         -8.5848e-01, -4.7837e-02, -1.5178e-01,  8.5023e-02,  1.5013e-01,
         -9.9294e-02, -2.7422e-01, -4.3986e-01, -4.4297e-01, -3.9570e-01,
```

Model file load on s390x (wrong endianness; notice the exponents)
```
>>> import torch
>>> torch.load( "400f2k_best.model", map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 9.2780e+21, -9.7722e-11,  4.1350e+33,  7.782e+34,  4.2056e-31,
          9.0784e+18,  1.1846e-32,  3.3320e-32, -4.8288e-28, -7.2679e+12,
          1.5379e-16, -5.2604e+12, -4.7240e+17,  4.6092e-21, -1.8360e-20,
         -2.7712e-31,  1.4548e-16, -2.5089e-27,  7.9094e-10,  7.1977e+34,
          1.1930e+26,  8.4536e+15,  2.7757e+23, -5.8455e-10, -1.5611e+09,
         -1.1311e-23,  6.6451e+19, -2.0970e+20,  3.4878e-19, -1.0857e-12,
          7.8098e+22,  5.3998e-35],
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26383

Differential Revision: D17480891

fbshipit-source-id: f40569c7b9c4a1935dceb41f1a2508ce21ea3491
2019-09-19 19:58:02 -07:00
8f50ea0f5c Add NoQEngine to QEngine and refactor the name of set/get qengine (#26471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26471

att

Test Plan:
.

Imported from OSS

Differential Revision: D17491215

fbshipit-source-id: 5790aa0113bfdbeeb838f3d1530397606ccaa1e9
2019-09-19 17:42:09 -07:00
aad0263a6b Support multidimensional inputs to torch::tensor (#26210)
Summary:
This PR adds support for multidimensional inputs to `torch::tensor`, to match the Python `torch.tensor` API.

Closes https://github.com/pytorch/pytorch/issues/16099.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26210

Differential Revision: D17456761

Pulled By: yf225

fbshipit-source-id: a53ce74c535c13c5dcb833f19e9b6b79d12376b5
2019-09-19 17:37:55 -07:00
436c60a854 javadocs for Tensor, IValue, Module (#26149)
Summary:
At the moment it includes the https://github.com/pytorch/pytorch/pull/26219 changes. That PR is landing right now; afterwards this PR will contain only javadocs.

Applied all of dreiss's comments from the previous version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26149

Differential Revision: D17490720

Pulled By: IvanKobzarev

fbshipit-source-id: f340dee660d5ffe40c96b43af9312c09f85a000b
2019-09-19 16:50:43 -07:00
0f42881269 fix schema matching of tuples to vartype lists (#25944)
Summary:
In schema matching we allow a homogeneous tuple to be matched to list arguments. This logic wasn't yet extended to vartype lists, causing things like `len((1, 2, 3))` to fail.
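
A minimal case that compiles after the fix:

```
import torch

@torch.jit.script
def tuple_len() -> int:
    # the homogeneous tuple matches the List[int] argument of aten::len
    return len((1, 2, 3))
```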

Fix for https://github.com/pytorch/pytorch/issues/20500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25944

Differential Revision: D17482510

Pulled By: eellison

fbshipit-source-id: aa63318c27a01d965a7a7b68ce8bec638168dc26
2019-09-19 15:46:27 -07:00
5f2c320840 Disable bitcode for iOS CI jobs (#26478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26478

### Summary

Since QNNPACK [doesn't support bitcode](7d2a4e9931/scripts/build-ios-arm64.sh (L40)), I'm going to disable it in our CMake scripts. This won't hurt any existing functionality, and will only affect the build size. Any application that wants to integrate our framework should turn off bitcode as well.

### Test plan

- CI job works
- LibTorch.a can be compiled and run on iOS devices

Test Plan: Imported from OSS

Differential Revision: D17489020

Pulled By: xta0

fbshipit-source-id: 950619b9317036cad0505d8a531fb8f5331dc81f
2019-09-19 15:38:57 -07:00
e72b0be2e1 fix cdist gradient computation if first arg is 1xn (#26254)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26076. mruberry, if https://github.com/pytorch/pytorch/issues/26248 goes in soon, I'll rebase after it; otherwise this should go in because it's a bug fix.
Side note: cdist backward testing is very light and I suspect it is not testing all the code paths, but that's a separate issue.
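
A minimal reproducer shape for the fixed case (per issue 26076):

```
import torch

x = torch.rand(1, 4, requires_grad=True)   # first argument is 1 x n
y = torch.rand(3, 4)
torch.cdist(x, y).sum().backward()         # gradient was wrong before this fix
```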
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26254

Test Plan: added test for the affected size to test_autograd.py. Streams are tested by existing tests.

Differential Revision: D17480945

Pulled By: ngimel

fbshipit-source-id: 0f18c9fd05e462d22c410a2ebddc2bcc9580582d
2019-09-19 15:28:49 -07:00
1f2fa8d4d8 Make jit dicts ordered (#26465)
Summary:
Makes c10::Dict ordered and binds the OrderedDict() and dict() constructors into TorchScript. For the case of the empty constructor dict(), I typed it as Dict[str, Tensor] because:
• we're almost dropping support for Python 2, at which point all dicts are ordered
• it's more conventional to write x : Dict[int, int] = {}, which is already supported
• it is possible to construct an arbitrarily typed empty OrderedDict through
OrderedDict(torch.jit.annotate(List[Tuple[key, value]], []))

We could consider dropping the no-inputs aten::dict constructor, since then the types would be more explicit.
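
A small example of what this enables in TorchScript:

```
import torch
from typing import Dict

@torch.jit.script
def make_dict() -> Dict[str, int]:
    d = {"a": 1}   # dicts now preserve insertion order
    d["b"] = 2
    return d
```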

This replaces https://github.com/pytorch/pytorch/issues/26170 and https://github.com/pytorch/pytorch/pull/26372 because ghstack was poisoned and I had to resubmit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26465

Differential Revision: D17481604

Pulled By: eellison

fbshipit-source-id: d2d49795a518c3489881afac45d070e5262c5849
2019-09-19 15:09:02 -07:00
4f7848e520 Make c10::Scalar::to<T>() const (#26406)
Summary:
Since `c10::Scalar::to<T>()` is not an in-place operation, we should be able to make it const. This removes the need of using `const_cast` at https://github.com/pytorch/pytorch/pull/26210#discussion_r324880325.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26406

Differential Revision: D17452258

Pulled By: yf225

fbshipit-source-id: 26881e2861f0f1f46cc2d92cc02a467e1f7eaa64
2019-09-19 15:06:14 -07:00
30e7665f55 Add a CI Job to Check BC Changes in Function Schemas (#26329)
Summary:
Ready for review. For results, please check https://circleci.com/gh/pytorch/pytorch/2827354?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Also, an experimental PR stacked on top of this one has caught bc-breaking changes introduced by it:
https://github.com/pytorch/pytorch/pull/26398
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26329

Reviewed By: hl475

Differential Revision: D17485668

Pulled By: houseroad

fbshipit-source-id: b10682f1785a20ea04521992e0973b1380b4dd3b
2019-09-19 14:56:21 -07:00
5304358859 Revert D17481256: Implement multiple dispatch
Test Plan: revert-hammer

Differential Revision:
D17481256

Original commit changeset: b3206936b4ca

fbshipit-source-id: a162c42168c17e24b5eaff83a7aae48beef3d2c2
2019-09-19 14:53:40 -07:00
ce3d024727 Make options.name_ private, and change all callsites to use options.name() (#26419)
Summary:
The implementation of several modules in C++ frontend currently has calls to `options.name_`,  which is bad practice because `options.name_` should be a private options field and we should use `options.name()` to access its value. This PR makes `options.name_` actually private and changes all callsites of `options.name_` to `options.name()`.

After this change, we can change all module options to have a map as the underlying data structure, and require that all options must be able to be stored in `c10::IValue`. These changes together would make serializing module options much easier.

Note that this PR is BC-breaking in the following way:

Previously, calling `options.name_` in C++ module implementation works because `options.name_`  was a public field. After this PR, `options.name_` becomes private, and to get the value of `options.name_` we should call `options.name()`, and to set the value of `options.name_` we should call `options.name(new_value)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26419

Differential Revision: D17481507

Pulled By: yf225

fbshipit-source-id: 93e4ed0e1d79ef57104ad748809d03e25da61ed3
2019-09-19 14:48:22 -07:00
587128e3dc Use github actions for flake8 (#25824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25824

Use github actions for flake8. This is nice because it makes it easier
to create inline annotations for lint violations.

It ends up looking like this:
https://github.com/suo/pytorch/pull/21/files

Test Plan: Imported from OSS

Differential Revision: D17487007

Pulled By: suo

fbshipit-source-id: 663094ea2bbbdb1da5b7e5d294c70735a319d5e5
2019-09-19 14:37:37 -07:00
454bf21b36 port lgamma from TH to Aten (#25138)
Summary:
https://github.com/pytorch/pytorch/issues/24722
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25138

Differential Revision: D17171782

Pulled By: VitalyFedyunin

fbshipit-source-id: b0026f0ce5306debf19036f97b8624bf0a56f349
2019-09-19 14:33:03 -07:00
0705f759a3 Implement multiple dispatch (#26468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26468

Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.

XLA companion patch at https://github.com/pytorch/xla/pull/1031

Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core.  There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'.  I think this may be duplicated with some logic somewhere else but I have to double check.

The new generated code looks like this:

```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
    static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
    return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```

The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.

After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.

* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.

Benchmark:

Apply the following patch to the base commit and this commit:

```
 diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
 --- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+  return self;
+}
+
+}} // namespace at::native
 diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
 --- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
   dispatch:
     CPU: im2col_backward_cpu
     CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _const5
```

Comparisons with timeit:

One-argument, representative case:

Before:

```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):

Before:

```
In [1]: import torch

In [2]: x = torch.zeros(1)

In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bddppq

Differential Revision: D17481256

Pulled By: ezyang

fbshipit-source-id: b3206936b4ca8938d45ea90fd71422e0d80b5f96
2019-09-19 14:29:38 -07:00
af64789cfa Fold activation permutation inside quantized conv operator (#26242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26242

According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do the handling inside the kernels. This PR fixes it for the activations of qconv by using the MemoryLayout mechanism - activations stay logically NCHW but are strided as NHWC.

Note that this version is more aggressive than the eventual MemoryLayout mechanism - the QConv's output is always NHWC regardless of the input striding. I think it's ok as we don't have NCHW quantized kernels anyway - so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior - the same happens today in master because of the explicit permute() call.
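
For illustration, a minimal sketch of "logically NCHW, strided as NHWC", written against the later `torch.channels_last` memory-format API (an assumption; this commit predates that public API):

```
import torch

x = torch.randn(1, 3, 4, 4)  # logically NCHW
# The tensor stays logically NCHW, but its strides follow NHWC memory order.
y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)    # torch.Size([1, 3, 4, 4]) -- logical shape unchanged
print(y.stride()) # (48, 1, 12, 3) -- channels have stride 1, i.e. NHWC layout
```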

Test Plan: Imported from OSS

Differential Revision: D17443218

Pulled By: dzhulgakov

fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
2019-09-19 13:39:26 -07:00
d5daac7223 Fold weight permutation inside quantized conv operator (#26241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26241

According to https://github.com/pytorch/pytorch/issues/19092 we always keep NCHW order and do the handling inside the kernels. This PR fixes it for the weights of qconv by using the MemoryLayout mechanism.

Test Plan: Imported from OSS

Differential Revision: D17443219

Pulled By: dzhulgakov

fbshipit-source-id: ce0eb92034a9977b3303dafab8b0414575171062
2019-09-19 13:39:22 -07:00
8c1354c31b Implement more support for per-channel quantization (#26240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26240

In particular adds support for empty/empty_like which is needed for memory layouts to work.

Test Plan: Imported from OSS

Differential Revision: D17443220

Pulled By: dzhulgakov

fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
2019-09-19 13:39:17 -07:00
8317f75b79 Use gradle 4.10.3 for build and publish
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26473

Differential Revision: D17484698

Pulled By: IvanKobzarev

fbshipit-source-id: 8bd888f51054a5f02291938f1469ef0d0fa02cb2
2019-09-19 12:47:15 -07:00
f673def92d Enabled where for bool tensor on CUDA (#26430)
Summary:
Enabled "where_cuda" for bool tensors on CUDA
Fixing https://github.com/pytorch/pytorch/issues/26247
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26430

Differential Revision: D17464181

Pulled By: izdeby

fbshipit-source-id: cbb09925753b2e6f35e7400da3243d4d3fc86b69
2019-09-19 12:29:31 -07:00
aad8738681 Remove quantization for bias in pattern (#26415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26415

We do dynamic quantization for bias right now, remove this in pattern

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17465555

fbshipit-source-id: 5e229cbc6ae85ea4ce727b3479993d79747d7792
2019-09-19 11:57:11 -07:00
d799726474 ensure c10/macros included before using (#26439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26439

C10_MOBILE / FEATURE_TORCH_MOBILE are checked in EnableNamedTensor.h, which
NamedTensor.h includes at the very beginning. For the internal build this is
fine, as C10_MOBILE / FEATURE_TORCH_MOBILE are set as compiler flags, but
the cmake build relies on the c10/macros/Macros.h header to derive these
macros from others like __ANDROID__, so it won't work as expected.

Test Plan:
- build locally;
- will check CI;

Differential Revision: D17466581

Pulled By: ljk53

fbshipit-source-id: 317510bcc077782ec2d22e23b1aaa0cb77cb73a9
2019-09-19 11:53:33 -07:00
68895eb9f4 fix flaky test (#26395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26395

This diff makes each SummaryWriter write into its own unique directory.

Reviewed By: orionr

Differential Revision: D17441500

fbshipit-source-id: d284fcf0e7e7a7214e644349e345f1de0e1a1aba
2019-09-19 11:13:31 -07:00
fe9dbbdba3 Emergency Docker upgrade to version 347. (#26466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26466

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17480533

Pulled By: ezyang

fbshipit-source-id: 5532bd50aaea284ebb208feb949b5a6aca6be458
2019-09-19 10:11:25 -07:00
4c1a2c2033 add setitem to class types (#25750)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/25664: add `class_type[ind] = val`. Like `__getitem__`, `__setitem__` has a custom compilation path, so it wasn't added with the rest of the magic methods.
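
A minimal sketch of the newly supported pattern (the class and method bodies are invented for illustration):

```
import torch

@torch.jit.script
class IntBuffer(object):
    def __init__(self):
        self.data = [0, 0, 0]

    def __getitem__(self, idx: int) -> int:
        return self.data[idx]

    def __setitem__(self, idx: int, val: int) -> None:
        self.data[idx] = val

@torch.jit.script
def fn() -> int:
    buf = IntBuffer()
    buf[1] = 42    # compiles via the new __setitem__ path
    return buf[1]  # compiles via __getitem__

print(fn())  # 42
```
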
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25750

Differential Revision: D17428725

Pulled By: eellison

fbshipit-source-id: ff3767ef41515baf04b0c0f5c896dbd3f1d20cd3
2019-09-19 10:01:39 -07:00
07bd76988e Revert D17265918: Implement multiple dispatch
Test Plan: revert-hammer

Differential Revision:
D17265918

Original commit changeset: 221efe4e86a4

fbshipit-source-id: f0ab90fa1201080e0d62fd140faf0fcdfd56601b
2019-09-19 09:50:17 -07:00
ece14ff473 Implement multiple dispatch (#25653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25653

Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
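
A toy Python model of that key computation (the names and the priority table below are invented for illustration; the real logic is the C++ helper `at::detail::multi_dispatch_tensor_type_set` described below):

```
PRIORITY = {"CPU": 0, "CUDA": 1, "SparseCPU": 2, "SparseCUDA": 3}

def multi_dispatch_key(args):
    combined = set()
    for arg in args:
        tensors = arg if isinstance(arg, list) else [arg]
        for t in tensors:               # union the type sets of all tensor args
            combined.add(t["type_id"])
    return max(combined, key=PRIORITY.get)  # highest-priority key wins

dense = {"type_id": "CPU"}
sparse = {"type_id": "SparseCPU"}
print(multi_dispatch_key([dense, sparse]))  # SparseCPU
```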

Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core.  There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'.  I think this may be duplicated with some logic somewhere else but I have to double check.

After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.

* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++. (The mixed dense/sparse case is sketched below, after this list.)
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is even a bit more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between the CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
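
A minimal sketch of the mixed dense/sparse case from the first bullet above:

```
import torch

dense = torch.ones(3, 3)
sparse = torch.eye(3).to_sparse()

# Dispatch now considers both arguments, so add(dense, sparse) routes to the
# sparse kernel instead of relying on sparse handling inside the dense kernel.
out = dense + sparse
print(out)  # dense 3x3 result: ones, with 2.0 on the diagonal
```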

Benchmark:

Apply the following patch to the base commit and this commit:

```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+  return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
   dispatch:
     CPU: im2col_backward_cpu
     CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _const5
```

Comparisons with timeit:

One-argument, representative case:

Before:

```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Five-argument, synthetic case (with enough Tensor arguments we expect a slowdown, since the new dispatcher scales `O(n)` in the number of arguments while the old dispatcher is `O(1)`):

Before:

```
In [1]: import torch

In [2]: x = torch.zeros(1)

In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17265918

Pulled By: ezyang

fbshipit-source-id: 221efe4e86a40f36abc81e2ebceaa7e251c90b3d
2019-09-19 09:30:40 -07:00
fc3e1a22da C++ API parity: at::Tensor::output_nr
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26216

Test Plan: Imported from OSS

Differential Revision: D17427576

Pulled By: pbelevich

fbshipit-source-id: 351c834c6c44a2a2f915e48a1e8aa8ad7f4274b3
2019-09-19 09:11:40 -07:00
97c8c18a21 tag files should not be deleted by "python setup.py clean".
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26416

Differential Revision: D17477606

Pulled By: ezyang

fbshipit-source-id: de36e8556981af0b2a71f17ee8e61b9deb5da024
2019-09-19 07:23:15 -07:00
d9ab78b3f0 Moves more tests to TestTorchDeviceType (#26435)
Summary:
- Moves all ROCm-requiring test_torch tests to TestTorchDeviceType
- Moves test_stft and test_lu from test_cuda
- Moves many CUDA-only test_torch tests to TestTorchDeviceType
- Combines several test_torch CPU tests with their CUDA variants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26435

Differential Revision: D17470469

Pulled By: mruberry

fbshipit-source-id: 90bb7fc09465c53eb2ab8da52eb2c2509775c16f
2019-09-19 01:49:34 -07:00
6b4bbdda37 fix JNI wrapper for IValue interface change (#26448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26448

Seems CI was broken by PR #25439 - fix based on interface change.

Test Plan: - build locally

Differential Revision: D17468987

Pulled By: ljk53

fbshipit-source-id: 3c1cb582c8d05357a94295b670b2ce61a7a5a4cd
2019-09-18 23:54:03 -07:00
8d9364ef32 Refactor emitIsInstance (#26061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26061

This is in preparation for actually emitting a dynamic isinstance check instruction.
It re-arranges the logic so that all the types and properties to check
against are in a flat list. In the future this flat list will be encoded
into an actual instruction if we determine that we cannot perform
the check statically.

Test Plan: Imported from OSS

Differential Revision: D17332062

Pulled By: zdevito

fbshipit-source-id: 4c0b65436f8e030170d469fe747e79de24bb24eb
2019-09-18 23:27:13 -07:00
d46b982db3 Add support to call unpack for pytorch mobile quantized FC and Conv (#26211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26211

Currently QNNPACK does not have an unpack function like FBGEMM does.
In order to be able to script quantized models for mobile, we need to save unpacked weights.

This change stores the original weights and bias in the opaque struct and simply returns them when unpack is called
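
A toy Python stand-in for that opaque struct (illustrative only; the real type is a C++ struct that also holds the packed QNNPACK buffer):

```
class PackedLinearWeight(object):
    """Toy model of the opaque packed-weight struct described above."""

    def __init__(self, weight, bias):
        self.orig_weight = weight  # original tensors kept solely so that
        self.orig_bias = bias      # unpack() can return them unchanged

    def unpack(self):
        return self.orig_weight, self.orig_bias
```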

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_unpack
python test/test_quantized.py TestQNNPackOps.test_qlinear_unpack

Imported from OSS

Differential Revision: D17464430

fbshipit-source-id: 83ad5a2556dcf13245a1047feef6cfb489c9ef69
2019-09-18 23:05:18 -07:00
921079c5c2 flat hash map that preserves insertion and deletion order (#25675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25675

This will be used to support OrderedDict in python. Modifies the existing `flat_hash_map` to preserve insertion and deletion order.
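
The target semantics match Python's own ordered dicts; in particular, deleting and re-inserting a key moves it to the end:

```
from collections import OrderedDict

d = OrderedDict()
d["a"] = 1
d["b"] = 2
d["c"] = 3
del d["b"]      # deletion removes "b" from the order...
d["b"] = 4      # ...and re-insertion puts it at the end
print(list(d))  # ['a', 'c', 'b']
```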

Test Plan: Imported from OSS

Differential Revision: D17440131

Pulled By: eellison

fbshipit-source-id: c7a6a290c8471627f5a061c0cca8e98ff131c9b4
2019-09-18 22:36:31 -07:00
43b30cd5d9 make copy (#26371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26371

This is just so the PR making the flat hash map preserve order is easier to review

Replaces https://github.com/pytorch/pytorch/pull/25674 bc ghstack was poisoned and i had to resubmit

Test Plan: Imported from OSS

Differential Revision: D17440132

Pulled By: eellison

fbshipit-source-id: 8a4f640d070d85795261cb3a129518c72096e9ef
2019-09-18 22:36:27 -07:00
dcbfc3bdbf Add per channel observer (#25887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25887

ghstack-source-id: 90383258

Add per channel observer to compute the qparams for each channel.
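
For intuition, a minimal sketch of the per-channel min/max to qparams arithmetic for affine uint8 quantization (the function name and details are invented here, not the actual observer implementation):

```
import torch

def per_channel_minmax_qparams(w, axis=0, qmin=0, qmax=255):
    # Flatten every dimension except `axis`, then take per-channel min/max.
    wc = w.transpose(0, axis).reshape(w.size(axis), -1)
    mins, _ = wc.min(dim=1)
    maxs, _ = wc.max(dim=1)
    scales = (maxs - mins).clamp(min=1e-8) / float(qmax - qmin)
    zero_points = torch.clamp(qmin - torch.round(mins / scales), qmin, qmax)
    return scales, zero_points

w = torch.randn(8, 3, 3, 3)     # e.g. a conv weight: one qparam pair per output channel
scales, zps = per_channel_minmax_qparams(w)
print(scales.shape, zps.shape)  # torch.Size([8]) torch.Size([8])
```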

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer'

buck test mode/dev caffe2/test:quantization -- 'test_per_channel_minmax_observer_scriptable'

Differential Revision: D17137226

fbshipit-source-id: 0b1c93e3cbcda86f5c4e30f7cd94c670f2665063
2019-09-18 22:16:45 -07:00
7042bfea1d Revert D17374409: [pytorch][PR] Implement more size-oriented opcodes in the depickler.
Test Plan: revert-hammer

Differential Revision:
D17374409

Original commit changeset: 17971b26e484

fbshipit-source-id: 527d220cd814d2228bd9439d60bf19a9ec42ed40
2019-09-18 21:58:09 -07:00
293d73fc92 Export gelu (#24475)
Summary:
Added support for gelu in symbolic opset9 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24475

Reviewed By: hl475

Differential Revision: D17088708

Pulled By: houseroad

fbshipit-source-id: 9d2f9d7d91481c57829708793d88f786d6c3956f
2019-09-18 21:18:07 -07:00
5127599152 Implement more size-oriented opcodes in the depickler. (#25786)
Summary:
These are intentionally not yet used by the encoder to
avoid backcompat issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25786

Differential Revision: D17374409

fbshipit-source-id: 17971b26e48429c68b7fa8126d7ed56ff80b5d68
2019-09-18 21:05:25 -07:00
595c1dfa74 Export clamp for opset 11 (#25797)
Summary:
- Export clamp for opset 11, which enables dynamic min/max inputs.
- Bump ONNX Runtime version in CI to enable opset 11 onnx::clip tests.
~~- Re-enable some disabled tests, now that backend impl & fixes are in.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25797

Reviewed By: hl475

Differential Revision: D17399112

Pulled By: houseroad

fbshipit-source-id: 9b8bfa86b2bddfb5e15d6812f04b31db6e701d26
2019-09-18 20:40:23 -07:00
b1ecf4bc82 Revert D17464904: Add NoQEngine to QEngine and refactor the name of set/get qengine
Test Plan: revert-hammer

Differential Revision:
D17464904

Original commit changeset: d8f2cebb978f

fbshipit-source-id: 8feb86f7347f455eb51538ce7893d4a096ba0ba4
2019-09-18 20:04:58 -07:00
cbc7172a02 Fix quantized::linear QuantFusion patterns (#26414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26414

Fix the patterns after changes to prepack functions(https://github.com/pytorch/pytorch/pull/25626)

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17465553

fbshipit-source-id: 7df6a6aa8389bb4a7a370c65ade4c2585b45b882
2019-09-18 19:59:07 -07:00
4f7292f7ee Add NoQEngine to QEngine and refactor the name of set/get qengine (#26330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26330

att

Test Plan:
.

Imported from OSS

Differential Revision: D17464904

fbshipit-source-id: d8f2cebb978fcbc478bc7e111ba24bc71a6f8915
2019-09-18 19:38:59 -07:00
c3f881cdbc add script to build mobile library with host toolchain (#26440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26440

As we are optimizing build size for Android/iOS, it starts diverging
from default build on several build options, e.g.:
- USE_STATIC_DISPATCH=ON;
- disable autograd;
- disable protobuf;
- no caffe2 ops;
- no torch/csrc/api;
...

Create this build_mobile.sh script to 'simulate' mobile build mode
with host toolchain so that people who don't work on mobile regularly
can debug Android/iOS CI error more easily. It might also be used to
build libtorch on devices like raspberry pi natively.

Test Plan:
- run scripts/build_mobile.sh -DBUILD_BINARY=ON
- run build_mobile/bin/speed_benchmark_torch on host machine

Differential Revision: D17466580

Pulled By: ljk53

fbshipit-source-id: 7abb6b50335af5b71e58fb6d6f9c38eb74bd5781
2019-09-18 19:34:09 -07:00
495dbacfd1 Back out "[pytorch][PR] Fix many type mismatches in the CUDA version of calc_digamma and calc_trigamma" (#26444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26444

Original commit changeset: 6276a011a373

Test Plan: Revert of recent change that was breaking a test. Test plan is that build no longer breaks verified manually.

Differential Revision: D17467067

fbshipit-source-id: bf866f4dc0f08af249d92cebc9846623d44224f6
2019-09-18 19:12:29 -07:00
fb28014af0 Remove quantizeBias (#26388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26388

We don't need to quantize bias after https://github.com/pytorch/pytorch/pull/26057

Test Plan:
.

Imported from OSS

Differential Revision: D17465554

fbshipit-source-id: 6b3992aa3ff1c17ccef11850c2b0a008b225bf30
2019-09-18 19:07:27 -07:00
6387ffab65 Exclude libfbjni.so from pytorch_android not to have its duplicating (#26382)
Summary:
fbjni is used when linking `libpytorch.so` and is specified in `pytorch_android/CMakeLists.txt`; as a result it is included as a separate `libfbjni.so` in `pytorch_android.aar`.

We also have the java part of fbjni, which is connected to pytorch_android as a gradle dependency and also contains `libfbjni.so`.

As a result, when we specify the gradle dep `'org.pytorch:pytorch_android'` (which has `libfbjni.so`) together with its transitive dep `'org.pytorch:pytorch_android_fbjni'` (which also has `libfbjni.so`), gradle reports an ambiguity error about the duplicate.

Fix - exclude libfbjni.so from pytorch_android.aar packaging and use the `libfbjni.so` from the gradle dep `'org.pytorch:pytorch_android_fbjni'`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26382

Differential Revision: D17468723

Pulled By: IvanKobzarev

fbshipit-source-id: fcad648cce283b0ee7e8b2bab0041a2e079002c6
2019-09-18 18:40:48 -07:00
e44ea6cd5e tvm operator dynolog (#26295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26295

Log the following in scuba caffe2_tvm_operator_stats:
1. everything in caffe2_operator_stats
2. fallback netdef
3. tvm module graph_json
4. whether compilation triggered this round
5. number of compilations stored in tvm_runtime_map
6. (not yet logged) last compilation time if any
7. (not yet logged) total bytes occupied by compilation
8. whether this compilation is fallback
9. batch size as observed by tvm op

Test Plan:
```
buck run mode/dbg //tvm/sparse:tvm_bbpredictor_benchmark -- --init_net ~/tmp/ads/84480054_204/init_net.pb --input_init_net ~/tmp/ads/84480054_204/input_init_net.pb --pred_net ~/tmp/ads/84480054_204/pred_net.pb --warmup 1000 --iter 1000 --num_cycles 5 --caffe2_logging_operator_dyno_sampling_rate=1 --vmodule=Logger=2
```

Logs show up in the scuba:
https://our.intern.facebook.com/intern/scuba/query/?dataset=caffe2_tvm_operator_stats

https://fburl.com/scuba/lq2h22e4

Auto submitted adindexer canary:
https://our.intern.facebook.com/intern/ads/canary/421064436039494716
Additional adindexer canary:
https://our.intern.facebook.com/intern/ads/canary/421082681202831286/
Additional adfinder canary:
https://our.intern.facebook.com/intern/ads/canary/421082685084831037/

Reviewed By: yinghai

Differential Revision: D17358412

fbshipit-source-id: d2119c12ddeaa86217c163e32fb1e211952139f5
2019-09-18 18:37:17 -07:00
36ade9aa23 Move the CUDA implementation of rsqrt to ATen. (#25285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25285

Fix #24620

Test Plan: Imported from OSS

Differential Revision: D17397459

fbshipit-source-id: 024dc0da8085df85513fde5f1d1e0141f734b284
2019-09-18 18:17:52 -07:00
44ffbc43de C++ API parity: at::Tensor::is_leaf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26186

Test Plan: Imported from OSS

Differential Revision: D17427580

Pulled By: pbelevich

fbshipit-source-id: c01362a3b1fdb0bd1dfc158dbf6fe1cf1d928761
2019-09-18 17:56:13 -07:00
a8386d2a7d fix composite learning rate (#26227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26227

In the previous implementation of composite lr, the lr_scale for each sub policy was overwritten by the last lr_scale.

Due to another bug in the unittest (where policy_lr_scale was the same for all sub policies), this bug was not detected by the unittest...

Fix: add an additional field in CompositeLearningRateItem so that we store the lr_scale values for all sub policies

If fix unittest, the error in previous implementation:
https://fburl.com/testinfra/ikdbnmey

With the fix,
https://fburl.com/testinfra/m694ehl1

Test Plan:
unittest

buck test  caffe2/caffe2/python/operator_test:learning_rate_op_test -- test_composite_learning_rate_op

Reviewed By: chocjy, alex1o1o7cloud

Differential Revision: D17380363

fbshipit-source-id: 161e9cb71bb2ea7f0734a3361e270616057a08e4
2019-09-18 17:34:17 -07:00
f75c1e4939 Add extra filtering for scale/zero_point/dtype in FoldQuantizeCallIntoBuffer (#26224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26224

We need to make sure they are Constant before we can do folding

Test Plan:
python test/test_jit.py 'TestJit.test_fold_quantize'

Imported from OSS

Differential Revision: D17462530

fbshipit-source-id: 2e02f980e0e7f28014d2f813035975dfc69cacd9
2019-09-18 17:03:56 -07:00
b23be95558 Adding quantized::conv2d function for pytorch mobile in c10 (#26152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26152

This change adds the support to call QNNPACK using the refactored API for Conv2d operators

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qconv_qnnpack

Imported from OSS

Differential Revision: D17459892

fbshipit-source-id: d20b3e8b81dd403541cb2b9164731448ca229695
2019-09-18 16:48:42 -07:00
1f51051287 remove extra get_worker_id call in distributed rpc init (#26381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26381

Was looking through this definition and saw that it has 2 identical
definitions of get_worker_id. Tested by ensuring that all tests in
`test/test_rpc.py` still pass.
ghstack-source-id: 90347452

Test Plan: See above

Differential Revision: D17439495

fbshipit-source-id: 9a78340f7aefa5797e0ae837fbcfe24ebe3a775d
2019-09-18 16:34:54 -07:00
f29e0d70cb Add filter function to subgraph rewriter runGraph (#26223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26223

add filter function to runGraph, if the function returns false for given `Match`,
the we'll skip the rewrite.

Test Plan:
will test in later PR that adds extra filtering on Constant nodes

Imported from OSS

Differential Revision: D17462529

fbshipit-source-id: 52abe52cb3e729a3871f7a60eddd5275060af36a
2019-09-18 16:34:50 -07:00
12762cd586 Use static type information to restore type tags (#25447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25447

When we unpickle IValues, we lose type information for List[T]
and Dict[K, V]. We can restore this information using the static
type information contained in the top-level Module/Class type.

This ensures that even after serialization we can always get the
dynamic type of an ivalue using its type() method.
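
As a loose Python analogy (not the actual C++ mechanism), statically declared attribute types can be used to recover the element types that serialized container values have lost:

```
from typing import Dict, List, get_type_hints

class MyModule(object):
    counts: Dict[str, int]
    items: List[float]

# Deserialized container values lose their element-type tags, but the
# statically declared attribute types are enough to restore them:
hints = get_type_hints(MyModule)
print(hints["counts"])  # typing.Dict[str, int]
print(hints["items"])   # typing.List[float]
```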

Test Plan: Imported from OSS

Differential Revision: D17127872

Pulled By: zdevito

fbshipit-source-id: 1ffb5e37a7c35c71ac9d3fb7b2edbc7ce3fbec72
2019-09-18 16:07:01 -07:00
ad0af1127b Add ivalue::type(), part 1 (#25439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25439

This introduces a type() method on IValue that returns the tagged type
of the IValue. The intention is that this value is always present/accurate,
making it possible for clients to recover the Type from an IValue.
Currently our APIs here are incomplete: they can sometimes recover a type but not always.

This PR adds the function, and cleans up remaining cases where Lists/Dicts are not
tagged. However, this information does not survive serialization unchanged.

A second PR will use the type information in the ClassType being serialized
to fixup the serialized ivalues to have the correct types again.
After this patch it will be save to remove our incomplete APIs for recovering types.

Test Plan: Imported from OSS

Differential Revision: D17125595

Pulled By: zdevito

fbshipit-source-id: 71c8c1a0e44762647e8f15f45d8ed73af8e6cb92
2019-09-18 16:06:58 -07:00
d02369dac2 add pass for onnx scalar type conversion (#24378)
Summary:
This pass tries to resolve scalar type mismatch issues between input tensors introduced by the implicit type conversions on scalars.

e.g. https://github.com/pytorch/pytorch/issues/23724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24378

Reviewed By: hl475

Differential Revision: D17088682

Pulled By: houseroad

fbshipit-source-id: 3de710f70c3b70b9f76fd36a7c4c76e168dbc756
2019-09-18 15:55:54 -07:00
248d5857ae Adds dtypes decorators to and allows helper methods in device generic test classes (#26375)
Summary:
- Adds dtypes, dtypesIfCPU, and dtypesIfCUDA decorators (a toy model of the mechanism is sketched after this list).
- Eliminates the need for nontest members to be defined in an inherited base.
- Updates one test to use the decorators and updates TestTorchDeviceType with helpers.
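
A toy model of the decorator mechanism (everything below is invented for illustration; the real framework wires the dtype list into device-generic test instantiation):

```
import torch

def dtypes(*dts):
    # Toy version: attach the dtype list to the test function; the device-generic
    # framework then instantiates one test per (device, dtype) combination.
    def wrap(fn):
        fn._dtypes = dts
        return fn
    return wrap

@dtypes(torch.float, torch.double)
def test_add(device, dtype):
    x = torch.ones(4, device=device, dtype=dtype)
    assert (x + x).sum().item() == 8

for dt in test_add._dtypes:
    test_add("cpu", dt)
```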

This PR appears to be hanging the ROCm build, which is not entirely surprising. See https://github.com/pytorch/pytorch/issues/26394, which demonstrates that the ROCm build can be hung by commenting out a Python test that was never run on ROCm.

gchanan - what type list, if any, do you want to expose? I imagine most test suites will define their own lists like today. SCALAR_TYPES, QUANTIZED_TYPES, and ALL_TYPES seem reasonable to me. DOCUMENTED_TENSOR_TYPES will be removed, of course.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26375

Test Plan: Edit is to tests themselves.

Differential Revision: D17462294

Pulled By: mruberry

fbshipit-source-id: f8259ec66709749b1bf8077efc737676af901436
2019-09-18 15:35:52 -07:00
52d999e173 Disable QNNPACK tests if pytorch is not built with it. (#26427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26427

Use the new macro USE_PYTORCH_QNNPACK to enable testing with qnnpack

Test Plan:
test caffe2/test:quantized -- TestQNNPackOps
Summary (total time 4.96s):
  PASS: 0
  FAIL: 0
  SKIP: 4
    caffe2/test:quantized - test_qlinear_qnnpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Reviewed By: ljk53

Differential Revision: D17459791

fbshipit-source-id: 3798fc270d22123b8807c9c63f12b9940981b115
2019-09-18 14:51:29 -07:00
a561660241 Puts ROCm tests on default stream (#26394)
Summary:
This PR has been updated; the ORIGINAL PR comment is preserved below.

ROCm CI builds have been hanging as we've been refactoring tests, even when these refactors seem entirely innocuous. For example, this PR started by commenting out test_stft, a Python test never run on ROCm, and that alone was sufficient to reliably hang the ROCm build in CI.

Putting ROCm tests back on the default stream appears to remove this hang. So this PR now does that. This is likely to unblock development.

ORIGINAL: Some test changes appear to be causing ROCm builds to hang in CI. This PR is an attempt to diagnose the source of the hang.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26394

Test Plan: Change is to tests themselves.

Differential Revision: D17456678

Pulled By: mruberry

fbshipit-source-id: 38d00d01c64b5055c1dfed01687ce3e1c9372887
2019-09-18 14:07:33 -07:00
13b544602e Fix many type mismatches in the CUDA version of calc_digamma and calc_trigamma (#25791)
Summary:
- There are some missing casts.
- Functions like ::log and ::sin can end up always invoking the double version on the host. For
  example, compiling the following code:

  ```c++
  #include <cmath>

  float log_float(float f) {
      return ::logf(f);
  }

  double log_double(double f) {
      return ::log(f);
  }

  float log_float2(float f) {
      return ::log(f);
  }

  float log_float3(float f) {
      return std::log(f);
  }
  ```

  using `g++ -c -O3` leads to:

      log_float(float):
              jmp     logf
      log_double(double):
              jmp     log
      log_float2(float):
              subq    $8, %rsp
              cvtss2sd        %xmm0, %xmm0
              call    log
              addq    $8, %rsp
              cvtsd2ss        %xmm0, %xmm0
              ret
      log_float3(float):
              jmp     logf

  Note that log_float2 delegates the call to the double version of log
  (surrounded by cast), while log_float3 delegates the call correctly to
  logf. See https://godbolt.org/z/KsRWwW
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25791

Differential Revision: D17452312

Pulled By: izdeby

fbshipit-source-id: 6276a011a373cd7cb144f9ecd84116aa206e7d1b
2019-09-18 13:41:34 -07:00
18eb92e2af Add support for lists for prim::min and prim::max
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26155

Differential Revision: D17455540

Pulled By: Krovatkin

fbshipit-source-id: e3aee465d108b59691d6c68f85fbf212a5d6a125
2019-09-18 13:39:08 -07:00
76fb909beb Change "named_guard" in native_functions to "supports_named_tensor" (#26352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26352

"named_guard: P" is the same as "supports_named_tensor: !P".
Also changed the error message to be more understandable to users.

Test Plan:
- `TEST_NAMEDTENSOR=1 pytest test/test_namedtensor.py -v`
- [namedtensor ci]

Differential Revision: D17426234

Pulled By: zou3519

fbshipit-source-id: 4cab780e6e29e184e79cdd3690f41df9ebb2ecb5
2019-09-18 12:28:16 -07:00
ecb82ed5a2 clean up the PR job script for iOS build (#26353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26353

### Summary

As the new iOS building script has been landed, this PR will clean up some redundant code for the PR jobs.

### Test Plan

- Don't break any existing CI jobs
- Don't break the old iOS CI jobs

Test Plan: Imported from OSS

Differential Revision: D17457253

Pulled By: xta0

fbshipit-source-id: 0d85117533a62d0b9b7b859b0044fd4388c3c9d4
2019-09-18 12:21:19 -07:00
2801df5ba1 Add a float version of calc_erfinv (by templating) on CPU (#26070)
Summary:
Currently calc_erfinv's float version on CPU is missing. This commit adds the float version (by templating).

I also used this opportunity to clean up calc_erfinv a bit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26070

Reviewed By: ezyang

Differential Revision: D17368024

Pulled By: VitalyFedyunin

fbshipit-source-id: 00cc3097f340022b3788143e6c12b01c35d72f13
2019-09-18 11:40:40 -07:00
b0b0f2c65f Make ProcessGroupAgent take num_send_recv_threads as constructor argument (#26313)
Summary:
# Problem

If there are not enough threads in the RPC agent thread pool, circularly dependent work items can cause a deadlock.

The current way to get around this deadlock is to provide an abundant number of threads.

# Solution

as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26313

Differential Revision: D17405491

Pulled By: xush6528

fbshipit-source-id: a1d9b6a84db0371cd4b63328fa00f651c0808485
2019-09-18 10:36:29 -07:00
388cfdf2ac Removes torchtest, expands generic device testing (#26374)
Summary:
- Removes torchtest
- <s>Moves test_torch tests skipped on ROCm to generic device test class</s>
- Creates test_nn generic device test class

Next: adding dtypes to generic device testing framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26374

Test Plan: Change is to tests themselves.

Differential Revision: D17442218

Pulled By: mruberry

fbshipit-source-id: d7e4451d09fc9049478b35a7efb8bb580071e8c8
2019-09-18 10:24:50 -07:00
ed09704899 use allgatherv for sparse all reduce (#23917)
Summary:
Per https://github.com/pytorch/pytorch/issues/22226: the current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because they'll then have equal size across processes). Instead, we can use allgatherv. This is mostly a win for memory usage when there is severe size imbalance between processes.
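
For reference, a minimal single-process sparse allreduce on the gloo backend (the address/port values are placeholders):

```
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

t = torch.sparse_coo_tensor([[0, 3]], [1.0, 2.0], (5,))
dist.all_reduce(t)      # takes the sparse allreduce path in ProcessGroupGloo
print(t.to_dense())
```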

close https://github.com/pytorch/pytorch/issues/22226
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23917

Test Plan:
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics

buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics_cuda

buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_checks

Differential Revision: D16664985

Pulled By: zhaojuanmao

fbshipit-source-id: e7d3c0770cbc09f9175b3027b527e95053724843
2019-09-18 09:57:45 -07:00
98ccae09af C++ API parity: at::Tensor::grad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26150

Test Plan: Imported from OSS

Differential Revision: D17427579

Pulled By: pbelevich

fbshipit-source-id: 68d012076aa86dee9f23fad71a2d265d75f56d22
2019-09-18 09:20:38 -07:00
72aeafd3d0 Fix no tab check (#26399)
Summary:
Ignore the folder ios/TestApp and its children.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26399

Differential Revision: D17451239

Pulled By: houseroad

fbshipit-source-id: d6ba666bf955454eca4a10c00784ee5947a70f59
2019-09-18 09:11:32 -07:00
b8ae4d0f1c Resolve #25605 cyclic reference in _LRScheduler (#25776)
Summary:
A cyclic reference was introduced in a previous version due to runtime overwriting of the bound method `optimizer.step`. This is now avoided by keeping a weak reference to the optimizer instance.

Credit: https://stackoverflow.com/questions/26157952/why-set-a-bound-method-to-python-object-create-a-circular-reference
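
A minimal sketch of the weak-reference pattern (illustrative classes only, not the actual `_LRScheduler` code):

```
import weakref

class Optimizer(object):
    def step(self):
        print("optimizer step")

class Scheduler(object):
    def __init__(self, optimizer):
        self._opt_ref = weakref.ref(optimizer)  # weak: no scheduler -> optimizer cycle

    def step(self):
        opt = self._opt_ref()  # resolves to None once the optimizer is collected
        if opt is not None:
            opt.step()

opt = Optimizer()
sched = Scheduler(opt)
sched.step()  # prints "optimizer step"
```
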
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25776

Differential Revision: D17420770

Pulled By: ezyang

fbshipit-source-id: 546ec94cf725ebfddb310b24e6a2e146ddecd1f6
2019-09-18 06:08:35 -07:00
bae7528479 Change '*' to '...' and ... for named tensor API functions. (#26350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26350

Python 3 lets us use `...` to perform indexing. Semantically, `...`
means "the rest of the unspecified dimensions". For example, while
indexing, one can do (for 5D `tensor`) `tensor[0, 0, ..., 0]` and
the `...` is expanded into `tensor[0, 0, :, :, 0]`.

Previously, we were using '*' to represent a similar behavior in names.
For example, `tensor.refine_names` supports things like the following:

```
x = torch.randn(2, 3, 4, 5, 6)
x_out = x.refine_names('*', 'H', 'W')  # refine only the last two
dimensions
```

This PR changes it so that named tensor API functions recognize `'...'`
(in Python 2 and Python 3) and `...` (in Python 3 exclusively) instead
of `'*'`.
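
A short sketch of the new spellings, continuing the example above:

```
import torch

x = torch.randn(2, 3, 4, 5, 6)
x_out = x.refine_names(..., 'H', 'W')    # Python 3: bare Ellipsis
y_out = x.refine_names('...', 'H', 'W')  # string form, works in Python 2 too
print(x_out.names)  # (None, None, None, 'H', 'W')
```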

Test Plan: - [namedtensor ci]

Differential Revision: D17424666

Pulled By: zou3519

fbshipit-source-id: 003182879fd38ced3fea051217572a457cdaf7cf
2019-09-18 05:47:13 -07:00
277d442d18 Rename torch.namedtensor -> torch._namedtensor_internals (#26349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26349

The directory holds a lot of private helper functions that help
implement named tensor functionality. Instead of naming each helper
function with a leading underscore, I change the name of the import to
`_namedtensor_internals` to signal it should not be used directly.

Test Plan: - [namedtensor ci]

Differential Revision: D17424178

Pulled By: zou3519

fbshipit-source-id: 8f7b74346765759303480e581038a661021acf53
2019-09-18 05:47:09 -07:00
f341291bfb Support unpickle py2 NetDef object in py3 (#26147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26147

We may try to unpickle in py3 a byte string that was pickled from py2. Therefore we need to pass encoding='latin1' when unpickling.

Reviewed By: kennyhorror

Differential Revision: D17305677

fbshipit-source-id: c0c8a51909629a65eb72bb81cccfbabaee9f8d01
2019-09-18 02:02:34 -07:00
f2e9622ed8 Add l2 norm minimization (#24022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24022

In the histogram observer, add an approximation of L2 error minimization for selecting min/max.
By selecting a new min/max, we filter out outliers in the input distribution.

This follows the implementation of NormMinimization::NonlinearQuantizationParamsSearch in caffe2/quantization/server/norm_minimization.cc
ghstack-source-id: 90298789

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'

Differential Revision: D16713239

fbshipit-source-id: 82631ba47974e25689c9c66bc3088117090e26d4
2019-09-18 00:07:10 -07:00
0038111019 Implement named tensor unflatten(dim, namedshape). (#25658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25658

This unflattens `dim` according to the shape specified in `namedshape`.
`namedshape` may be either an OrderedDict or an iterable of (name, size)
tuples.
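
A short sketch following the signature described above (assuming a named-tensor-enabled build):

```
import torch

x = torch.randn(2, 12, names=('N', 'F'))
y = x.unflatten('F', (('C', 3), ('H', 2), ('W', 2)))
print(y.shape)  # torch.Size([2, 3, 2, 2])
print(y.names)  # ('N', 'C', 'H', 'W')
```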

Future:
- It is possible to make it take a dict in Python >= 3.6 because those are
ordered by default, but I'll leave that task for the future.

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17192655

Pulled By: zou3519

fbshipit-source-id: fd9bd2f462c23a4df1c23d66f2aa95076ff1b160
2019-09-17 21:24:25 -07:00
f6203a88a3 enable xla cpp tests in CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26347

Differential Revision: D17430034

Pulled By: ailzhang

fbshipit-source-id: 4d3a07617a37aa2d1ddf4fd874c0a678c716bf3e
2019-09-17 21:14:29 -07:00
61197e94b3 Remove torch.save-related logic from pickler (#25502)
Summary:
The Pickler previously had a distinction between tensors that would be inlined in 1 pickle binary (matching the format of `torch.save()`) and tensors that are saved elsewhere with only a reference stored in the binary. This PR moves that distinction out to `torch::pickle_save` to match the eager Python interface.

The change can be seen in `register_prim_ops.cpp` where the call to `jit::pickle` is now `torch::pickle_save`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25502

Pulled By: driazati

Differential Revision: D17175215

fbshipit-source-id: 8c9a21327cc79eaf6a0e488ea99e305be52f82b1
2019-09-17 20:38:13 -07:00
acb300fd6b Split PyTorch ROCm tests as 2 CI jobs to run in parallel (#26380)
Summary:
ROCm CI jobs are running on Jenkins. They have the "-test{1,2}" parts in "JOB_BASE_NAME", not "BUILD_ENVIRONMENT".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26380

Differential Revision: D17439523

Pulled By: bddppq

fbshipit-source-id: 31e2a986d1b7ea40c90ab399a3c1e0a328ae3a92
2019-09-17 20:31:29 -07:00
193a6a6f98 Revert D17431514: [pytorch][PR] fix schema matching of tuples to vartype lists
Test Plan: revert-hammer

Differential Revision:
D17431514

Original commit changeset: 2ad98bab15ea

fbshipit-source-id: 5cf445fd1e37629c700b9b3740fe13ca941e4db9
2019-09-17 17:23:12 -07:00
bb1efb3bee Adding quantized::linear function for pytorch mobile in c10 (#26135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26135

This change adds the support to call QNNPACK using the refactored API for Linear operators (Fully Connected)
It also has certain cmake changes to enable building and using pytorch_qnnpack inside aten.
I have disabled USE_QNNPACK in CMakeLists.txt. Enabling it results in picking kernels from third_party/QNNPACK at runtime since the function names are the same.

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qlinear_qnnpack

Imported from OSS

Differential Revision: D17434885

fbshipit-source-id: 084698026938f4529f61d12e86dfe82534ec73dd
2019-09-17 16:16:39 -07:00
59002bb095 Kill if_true / if_false in Declarations.cwrap.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26346

Test Plan: Imported from OSS

Differential Revision: D17421345

Pulled By: gchanan

fbshipit-source-id: 03b3c61edc13994d96b1d60648da7335fb090531
2019-09-17 15:36:02 -07:00
a06e1c3af7 min(li) max(li) (#26351)
Summary:
Add min and max of a list to JIT. Fixes https://github.com/pytorch/pytorch/issues/26036
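
A short sketch of the newly supported usage:

```
import torch
from typing import List

@torch.jit.script
def extremes(xs: List[int]):
    return min(xs), max(xs)  # min/max over a list, now supported in JIT

print(extremes([3, 1, 4, 1, 5]))  # (1, 5)
```
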
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26351

Differential Revision: D17427547

Pulled By: eellison

fbshipit-source-id: 45796b4076eef0b496b01c2cc710ec4dc15a1ee6
2019-09-17 14:50:33 -07:00
be976413f7 Skip testing triangular_solve_batched on non-default CUDA stream (#26115)
Summary:
This is for testing purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26115

Differential Revision: D17433122

Pulled By: zou3519

fbshipit-source-id: bf41327e6141e9ae589fcf18254c2a8cdd868dd7
2019-09-17 14:45:53 -07:00
71d3457a1f Fix compiler unwrapping step in jenkins build scripts for Caffe2/PyTorch on ROCm (#25409)
Summary:
Fix the regex (requires enabling extglob) for two digit clang releases.

While there, also fix it for three digit releases with the hope that I
do not need to touch it for some time.

Unfortunately, this regex requires extglob to be enabled in the shell.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25409

Differential Revision: D17431786

Pulled By: bddppq

fbshipit-source-id: a50b2ff525d9b6046deae9c8725c92d67119599a
2019-09-17 13:50:42 -07:00
a8073f34af fix schema matching of tuples to vartype lists (#25944)
Summary:
In schema matching we allow a homogenous tuple to be matched to list arguments. This logic wasn't yet extended for vartype lists, causing stuff like `len((1, 2, 3))` to fail.

Fix for https://github.com/pytorch/pytorch/issues/20500
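
A short sketch of the case this fix enables:

```
import torch

@torch.jit.script
def tuple_len() -> int:
    # a homogeneous tuple now matches the List[t] argument of len()
    return len((1, 2, 3))

print(tuple_len())  # 3
```
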
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25944

Differential Revision: D17431514

Pulled By: eellison

fbshipit-source-id: 2ad98bab15eaa496471df651572735eb35183323
2019-09-17 13:47:46 -07:00
9181b9c73e Enable basic GPU profiling capability on ROCm. (#26300)
Summary:
Inserting markers using the nvtx-equivalent API is not supported yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26300

Differential Revision: D17425573

Pulled By: bddppq

fbshipit-source-id: 4df6c695ba07ab68e7f4dc2f77edde06f78fdac7
2019-09-17 12:11:27 -07:00
b63f8ef2c9 Rebase CircleCI to master if it is gcc5_4 (#26321)
Summary:
This is the first step of adding CI for bc breaking changes detection of function shcemas.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26321

Reviewed By: hl475

Differential Revision: D17425468

Pulled By: houseroad

fbshipit-source-id: b4bb36e5597043407c943b5b8dfe2b1ac3248cb2
2019-09-17 12:04:15 -07:00
cc61af3c3d Add iOS test app skeleton (#26261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26261

### Summary

Previously we enabled the CI jobs for Pull Requests and nightly builds

-  **#25840 [iOS][Circle CI] Add PR jobs for iOS builds**
-  **#26074 [IOS][CIRCLE CI] Nightly jobs for iOS builds**

The testing phase is missing from the nightly build process. Although we are able to generate the build and upload it to AWS, there is no way to know whether the binary is valid or not (there could be a linking error). To add the test phase to the process, we need to:

1. Put a dummy test App in the repo.
2. After the build jobs finish, manually link the static libs to the dummy app to produce an executable using the xcode tool chain.
3. If there is no linking error, upload the binaries to AWS. If there is an error, stop the subsequent process and report an error in CI.

The second and third steps depend on the first step, which needs to be landed first.

### Test Plan
- Don't break any existing CI jobs

Test Plan: Imported from OSS

Differential Revision: D17408929

Pulled By: xta0

fbshipit-source-id: e391da242639943005453d1318795f981034cc72
2019-09-17 11:06:57 -07:00
0ad8c679ae Enable support for dilated convolutions (#26205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26205

Enabling quantized dilated convolutions.

test:quantized

```
Summary (total time 14.01s):
  PASS: 43
  FAIL: 0
  SKIP: 5
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
ghstack-source-id: 90244587

Test Plan: buck test mode/dev caffe2/test:quantized

Differential Revision: D17375370

fbshipit-source-id: cff0ba9a77cabac3ad164b2e133bfa466865afd4
2019-09-17 10:55:23 -07:00
3ce2ceca05 fix ctc_loss argument check error message (#26325)
Summary:
I was confused by the wrong message while debugging.
It turns out the CPU version has the comparison direction reversed, and the GPU version additionally prints the wrong number.
This fix should make the error message correct.

jjsjann123 for tracking
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26325

Differential Revision: D17408969

Pulled By: soumith

fbshipit-source-id: 0d9330e00aaabcb3e8e893b37a6a53fb378171c5
2019-09-17 10:48:37 -07:00
a76403f609 Revert D17367016: [pytorch][PR] Enabled bfloat16 dtype on CUDA
Test Plan: revert-hammer

Differential Revision:
D17367016

Original commit changeset: 7e6ae7c6aa4e

fbshipit-source-id: 6ca4e1dec5357232e224bf6d6f957ac80005c77c
2019-09-17 10:39:59 -07:00
958d627288 Remove dead function (#26259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26259

This wasn't called from anywhere (confirmed by grep)
ghstack-source-id: 90222268

Test Plan: waitforsandcastle

Differential Revision: D17391417

fbshipit-source-id: 77c395f2f7104995f6af6e3e20d3f615223085b3
2019-09-17 10:31:38 -07:00
2470031f33 Fixed size arrays (#23695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23695

Fix JIT schema inference for fixed sized arrays like `int[3]` and move corresponding ops to the c10 dispatcher.

ghstack-source-id: 90222271

Test Plan: waitforsandcastle

Differential Revision: D16611697

fbshipit-source-id: b20a479ffcd8fe8421b11bb259802745923e3b0d
2019-09-17 10:31:34 -07:00
2b20ba7bb4 Move more ops to c10 (#26255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26255

Add some more ops that work fine without needing fixes
[namedtensor ci]
ghstack-source-id: 90222272

Test Plan: unit tests

Differential Revision: D17390980

fbshipit-source-id: 0eeae69a409a8cfd9195b71053c1f6202ddd3509
2019-09-17 10:31:30 -07:00
57a4b7c55d Re-organize C++ API torch::nn folder structure (#26262)
Summary:
This PR aims to re-organize C++ API `torch::nn` folder structure in the following way:
- Every module in `torch/csrc/api/include/torch/nn/modules/` (except `any.h`, `named_any.h`, `modulelist.h`, `sequential.h`, `embedding.h`) has a strictly equivalent Python file in `torch/nn/modules/`. For  example:
`torch/csrc/api/include/torch/nn/modules/pooling.h` -> `torch/nn/modules/pooling.py`
`torch/csrc/api/include/torch/nn/modules/conv.h` -> `torch/nn/modules/conv.py`
`torch/csrc/api/include/torch/nn/modules/batchnorm.h` -> `torch/nn/modules/batchnorm.py`
`torch/csrc/api/include/torch/nn/modules/sparse.h` -> `torch/nn/modules/sparse.py`
- Containers such as  `any.h`, `named_any.h`, `modulelist.h`, `sequential.h` are moved into `torch/csrc/api/include/torch/nn/modules/container/`, because their implementations are too long to be combined into one file (like `torch/nn/modules/container.py` in Python API)
- `embedding.h` is not renamed to `sparse.h` yet, because we have another work stream that works on API parity for Embedding and EmbeddingBag, and renaming the file would cause conflict. After the embedding API parity work is done, we will rename `embedding.h` to  `sparse.h` to match the Python file name, and move the embedding options out to options/ folder.
- `torch/csrc/api/include/torch/nn/functional/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/functional/pooling.h` contains the functions for pooling, which are then used by the pooling modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`.
- `torch/csrc/api/include/torch/nn/options/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/options/pooling.h` contains MaxPoolOptions, which is used by both MaxPool modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`, and max_pool functions in `torch/csrc/api/include/torch/nn/functional/pooling.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26262

Differential Revision: D17422426

Pulled By: yf225

fbshipit-source-id: c413d2a374ba716dac81db31516619bbd879db7f
2019-09-17 10:07:29 -07:00
caed485873 Turn on BUILD_NAMEDTENSOR permanently (#26060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26060

This PR enables BUILD_NAMEDTENSOR by default. This is done via including
a header, `c10/core/EnableNamedTensor`, that sets `BUILD_NAMEDTENSOR`.
In the future, the plan is to get rid of the flag entirely: we can
incrementally delete usages after this PR goes in.

This PR also maintains the namedtensor ci vs regular ci distinction.
`test/test_namedtensor.py` only runs if TEST_NAMEDTENSOR=1 is specified.
TEST_NAMEDTENSOR=1 is set on the namedtensor ci. I'll remove this
distinction later and send out an announcement about it; devs will be
responsible for named tensor failures after that.

The initial reason why we had the BUILD_NAMEDTENSOR flag was so that we
could quickly prototype named tensor features without worrying about
adding overhead to the framework. The overheads can be categorized as
memory overhead and performance overhead.

Memory overhead: named tensors add 1 additional word per Tensor. This
is because TensorImpl stores a `unique_ptr<NamedTensorMetaInterface>`
field. This is not a lot of overhead.

Performance overhead: At all entry points to name inference, we check
if inputs to an op are named. If inputs are not named, we short-circuit
and don't do name inference. These calls should therefore be as
efficient as error-checking code and not take up a lot of time.

My plan is to benchmark a few functions and then post the results in a
comment to this PR.

Test Plan: - [namedtensor ci]

Differential Revision: D17331635

Pulled By: zou3519

fbshipit-source-id: deed901347448ae2c26066c1fa432e3dc0cadb92
2019-09-17 08:25:00 -07:00
1accc38b75 Enabled bfloat16 dtype on CUDA (#26148)
Summary:
Enabled basic functionality for bfloat16 dtype on CUDA.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26148

Differential Revision: D17367016

Pulled By: izdeby

fbshipit-source-id: 7e6ae7c6aa4e21f076d8b70b91e26b50063c6875
2019-09-17 08:17:36 -07:00
19b4314f30 Fix typo (#26298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26298

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17420724

Pulled By: ezyang

fbshipit-source-id: b8e651d0dfe7abec5615e849bdd5d1a19feb7b40
2019-09-17 08:02:11 -07:00
a3915bdb9d Replace simple if_true / if_false cases in Declarations.cwrap. (#26285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26285

I renamed:
THTensor_(std / var) -> THTensor_(std_single / var_single)
THTensor_(stdall / varall) -> THTensor_(std_all / var_all)

because I reversed the meaning of the bias/unbiased parameters (to match ATen) and type checking wouldn't catch failures.

Test Plan: Imported from OSS

Differential Revision: D17397227

Pulled By: gchanan

fbshipit-source-id: 244fe878d4e1045620137c00fbaea6e6f919fc8d
2019-09-17 07:49:30 -07:00
e5d9a5e5be Fix typo in docs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26263

Differential Revision: D17397190

Pulled By: ezyang

fbshipit-source-id: 62e3c4c3021c728a3314262528579676d605a81e
2019-09-17 07:46:49 -07:00
1b4951d3a5 Fix remaining invalid function cast warnings that show up with GCC 8/9 (#26104)
Summary:
Follow-up to gh-25483, more of the same fixes for warnings like:

```
../torch/csrc/autograd/python_variable.cpp:503:31: warning: cast between incompatible function types from ‘PyObject* (*)(THPVariable*)’ {aka ‘_object* (*)(THPVariable*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
  503 |   {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This takes the build log output for a full rebuild with GCC 9.1 from ~10,000 to ~7,000 lines.

`clang-tidy` is going to complain, no way around that - see discussion at the end of gh-25483.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26104

Differential Revision: D17396831

Pulled By: ezyang

fbshipit-source-id: d71696bfe4dbe25519e4bcb7753151c118bd39f7
2019-09-17 07:43:37 -07:00
30f31c66ba Kill declared_type and ignore_check from THFormal. (#26284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26284

They aren't used anymore.

Test Plan: Imported from OSS

Differential Revision: D17397182

Pulled By: gchanan

fbshipit-source-id: 3f1cc0fd12aa8f8589548640421b206fa7c571e1
2019-09-17 07:40:33 -07:00
925131a85e Fix race in CUDA initialization (#25788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25788

Previously, I thought that _lazy_init held the GIL throughout initialization, so
I could write the code in a single-threaded manner.  This is not true; it
releases the GIL at various points, which make it possible for another thread to
race with initialization.

The correct fix is to add locking for the initialization section, so other
threads wait until the first thread finishes initializing before being let
in.  There is some subtlety with how to handle lazy calls, which will call
_lazy_init reentrantly; this is handled using TLS that lets you know if you
are the initializing thread (and therefore reentrant calls are OK.)
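
A minimal sketch of this locking scheme; the names (_initialization_lock, _tls, _do_cuda_init) are illustrative, not necessarily the ones used in torch/cuda/__init__.py:

```
import threading

_initialization_lock = threading.Lock()
_tls = threading.local()
_initialized = False

def _lazy_init():
    global _initialized
    if _initialized or getattr(_tls, 'is_initializing', False):
        return  # done already, or a reentrant call from a lazy callback
    with _initialization_lock:
        if _initialized:   # another thread finished while we waited
            return
        _tls.is_initializing = True
        try:
            _do_cuda_init()  # illustrative stand-in for the actual init work
        finally:
            _tls.is_initializing = False
        _initialized = True
```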

Fixes #16559

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17366348

Pulled By: ezyang

fbshipit-source-id: 99b982709323e2370d03c127c46d87be97495916
2019-09-17 07:40:29 -07:00
2ce8c83f67 Enable CPU fused kernel on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25578

Differential Revision: D17397156

Pulled By: ezyang

fbshipit-source-id: b243528c2bfd5a0d401897833048429e67fe40ef
2019-09-17 07:29:40 -07:00
bebc3d6aad Automatic update of fbcode/onnx to 1316afc9f972f81340faa05763e2898f38bcc3b0 (#26309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26309

Previous import was 95252c2adec185e305e34486c6756ece9aa8f57f

Included changes:
- **[1316afc9](https://github.com/onnx/onnx/commit/1316afc9)**: Update IR doc to clarify initializers are permitted as node inputs (#2320) <G. Ramalingam>
- **[5e920d0c](https://github.com/onnx/onnx/commit/5e920d0c)**: Avoid uses of special chars (#2315) <Wei-Sheng Chin>
- **[2fa08b0f](https://github.com/onnx/onnx/commit/2fa08b0f)**: Regenerate ONNX proto and add release date to ver 6 IR (#2316) <Wei-Sheng Chin>
- **[adf9c7a3](https://github.com/onnx/onnx/commit/adf9c7a3)**: Add description of default type about y_zero_point (#2110) <Takeshi Watanabe>
- **[ee7072c7](https://github.com/onnx/onnx/commit/ee7072c7)**: Support make_attribute empty string (#2129) <shjwudp>
- **[f913b6e7](https://github.com/onnx/onnx/commit/f913b6e7)**: More unsqueeze tests (#2200) <James Allingham>
- **[57b51937](https://github.com/onnx/onnx/commit/57b51937)**: Fix resize shape inference issue in opset10 (#2294) <Bowen Bao>
- **[d7595f34](https://github.com/onnx/onnx/commit/d7595f34)**: Sequence related ops (#2249) <Bowen Bao>
- **[599f3da9](https://github.com/onnx/onnx/commit/599f3da9)**: Add helper function update_inputs_outputs_dims to tools (#2148) <Bowen Bao>
- **[3e6382bc](https://github.com/onnx/onnx/commit/3e6382bc)**: Update documentation about required input output types (#2310) <G. Ramalingam>
- **[0c765d9b](https://github.com/onnx/onnx/commit/0c765d9b)**: Shape inference for NMS (#2269) <Hariharan Seshadri>
- **[89266710](https://github.com/onnx/onnx/commit/89266710)**: Fix extra collect_snippets warning (#2277) (#2307) <Lutz Roeder>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17403954

fbshipit-source-id: 78a9c3ecf5aa7f7a0ba8ea30286eab61ee903772
2019-09-17 06:46:59 -07:00
28d3eb8156 Back out "Back out "[Caffe2] Fix device_option propagation"" (#25908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908

Original commit changeset: f6e961e88c01

device_option propagation is completely broken in Caffe2 when pass-through operators are used. For example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff fixes that issue.

The original diff had a problem: Caffe2 does not handle cases where a device option is present but contains only metadata (for example, the one for auto-generated reduction ops in the backward pass). This diff addresses that by merging device options during the backward pass.

Test Plan:
1. net_transform is finally working with Gather + FloatToHalf transformed model instead of failing because of incorrect number of components.
2. New unit-test.
3. Verify that previously broken benchmark is now passing

ezyang do you have suggestions what else I should test?

Reviewed By: ezyang

Differential Revision: D17281528

fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
2019-09-17 04:01:36 -07:00
9ef86b04e5 Make TORCH_WARN_ONCE capture variables by reference (#26289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26289

It's not possible to refer to values of local variables otherwise.
ghstack-source-id: 90160797

Test Plan: The code compiles.

Differential Revision: D17397702

fbshipit-source-id: 49c74c44c88f197264603e4978e3d60bf199f6ac
2019-09-17 03:49:17 -07:00
8b7a12dd39 Average Pooling 3D AVX2 Implementation (#26111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26111

3D AveragePool AVX2 implementation.
ghstack-source-id: 89997917

Test Plan:
buck test mode/dev //caffe2/caffe2/quantization/server:pool_dnnlowp_op_test
```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/aten/src/ATen/native] $ buck test mode/dev //caffe2/caffe2/quantization/server:pool_dnnlowp_op_test
Building: finished in 7.3 sec (100%) 9885/9885 jobs, 5 updated
  Total time: 7.8 sec
Trace available for this run at /tmp/testpilot.20190912-113555.187864.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 56b41ca91d4d1fbda32ec1f4d992fa85f9215fd1 fbpkg 8ef46ee301e64eb1aab58fe98a6a0777 at Wed Sep 11 23:27:02 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/698/t.par
Discovering tests
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174704896894
      ✓ caffe2/caffe2/quantization/server:pool_dnnlowp_op_test - test_dnnlowp_max_pool (caffe2.caffe2.quantization.server.pool_dnnlowp_op_test.DNNLowPOpPoolTest) 0.358 1/2 (passed)
      ✓ caffe2/caffe2/quantization/server:pool_dnnlowp_op_test - test_dnnlowp_average_pool (caffe2.caffe2.quantization.server.pool_dnnlowp_op_test.DNNLowPOpPoolTest) 0.331 2/2 (passed)
      ✓ caffe2/caffe2/quantization/server:pool_dnnlowp_op_test - main 0.000 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3659174704896894
Summary (total time 7.91s):
  PASS: 3
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: dskhudia

Differential Revision: D17346452

fbshipit-source-id: a3342b2fa22c8eaed8426a110ad7e2cc056ed373
2019-09-17 03:41:34 -07:00
2dac673861 Enable batching for pinverse (#26095)
Summary:
Changelog:
- Modify existing implementation of pinverse to support batching on inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26095
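
A minimal sketch of the batched behavior described in the changelog (shapes are illustrative):

```
import torch

A = torch.randn(5, 3, 4)   # a batch of five 3x4 matrices
P = torch.pinverse(A)      # one pseudo-inverse per matrix
print(P.shape)             # torch.Size([5, 4, 3])
```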

Test Plan: - Added tests in test_pinverse to test batched implementation

Differential Revision: D17408092

Pulled By: soumith

fbshipit-source-id: bba95eb193ce33a94ecfaf74da270d34b435e4af
2019-09-16 23:19:16 -07:00
81d7675301 Ensure that n is non-negative in polygamma.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26294

Differential Revision: D17416847

Pulled By: soumith

fbshipit-source-id: 17d5576e019e31e85c0308fb956524484e526cf6
2019-09-16 23:16:11 -07:00
13a07f163e fix test_arange and bump ort ci version (#26320)
Summary:
This appears to be a bug in test_arange that wasn't revealed by older versions of onnxruntime.

TL;DR: the test tries to update the exported ONNX model to accept dynamically sized input, but it is written incorrectly, so the exported model input is still fixed-size. Meanwhile, the version of ORT in CI doesn't validate whether the model input size matches the input data, so this error went unnoticed.

Affecting ci in https://github.com/pytorch/pytorch/pull/25797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26320

Reviewed By: hl475

Differential Revision: D17406442

Pulled By: houseroad

fbshipit-source-id: a09ad4b925ccbed0b71342f5aaa7878e1c4a5a2d
2019-09-16 22:25:00 -07:00
dc851ab5d4 Integrate forked QNNPACK into mobile PyTorch builds. (#25844)
Summary:
Enable forked QNNPACK builds in PyTorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25844

Differential Revision: D17336458

Pulled By: AshkanAliabadi

fbshipit-source-id: 6ea09dd6c114b64313e9159bf7f17253bc87bfdb
2019-09-16 20:50:43 -07:00
226ee7a889 Adds generic device tests to test_autograd.py (#26248)
Summary:
- Adds new decorators for skipping on ROCm, skipping on MKL, running only on the CPU and running only on CUDA
- Makes decorator skip semantics consistent
- Adds CUDA default stream requirement to MAGMA decorator
- Creates TestAutogradDeviceType

Note this PR originally moved test_cdist, but moving it caused failures in CI. There may be an undiagnosed issue with cdist or the test. The issue does not reproduce locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26248

Test Plan: Change is to tests themselves.

Differential Revision: D17410386

Pulled By: mruberry

fbshipit-source-id: 8459df44f2a00f0e71680fbe713587a01d4b0300
2019-09-16 20:25:25 -07:00
b07991f7f5 Fix error messages; tensor creation method names with type (#26219)
Summary:
After offline discussion with dzhulgakov :
 - In the future we will introduce creation of signed-byte and unsigned-byte dtype tensors, but Java has only a signed byte, so we will have to add some separation in method names (Java types and tensor types cannot be mapped one-to-one) => include the type in method names

- fixes in error messages

- non-static method Tensor.numel()

- Change Tensor toString() to be more consistent with Python

Update on Sep 16:

Type renaming on java side to uint8, int8, int32, float32, int64, float64
```
public abstract class Tensor {
  public static final int DTYPE_UINT8 = 1;
  public static final int DTYPE_INT8 = 2;
  public static final int DTYPE_INT32 = 3;
  public static final int DTYPE_FLOAT32 = 4;
  public static final int DTYPE_INT64 = 5;
  public static final int DTYPE_FLOAT64 = 6;
```
```
  public static Tensor newUInt8Tensor(long[] shape, byte[] data)
  public static Tensor newInt8Tensor(long[] shape, byte[] data)
  public static Tensor newInt32Tensor(long[] shape, int[] data)
  public static Tensor newFloat32Tensor(long[] shape, float[] data)
  public static Tensor newInt64Tensor(long[] shape, long[] data)
  public static Tensor newFloat64Tensor(long[] shape, double[] data)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26219

Differential Revision: D17406467

Pulled By: IvanKobzarev

fbshipit-source-id: a0d7d44dc8ce8a562da1a18bd873db762975b184
2019-09-16 18:27:16 -07:00
448c53747a CircleCI android nightly (snapshot) build publishing (#26069)
Summary:
To publish android snapshots to sonatype repository:
1. set gradle properties SONATYPE_NEXUS_USERNAME, SONATYPE_NEXUS_PASSWORD, ANDROID_SIGN_KEY, ANDROID_SIGN_PASS
these variables are stored as context environment variables in 'org-member' circleCI context
2. gradle -p ~/workspace/android/ uploadArchives

Due to bugs in Gradle 5, the uploadArchives task only works correctly with Gradle 4.10.3.
That is also the reason for the change `archiveClassifier.set('sources')` -> `classifier = 'sources'`, as archiveClassifier was introduced in version 5.

Registering a nightly build job that publishes a *-SNAPSHOT version of the Android API.

Testing:
CircleCI successful snapshot publishing run https://circleci.com/gh/pytorch/pytorch/2786503?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
Corresponding published artifacts can be seen: https://oss.sonatype.org/#nexus-search;quick~pytorch_android
Screenshot: https://user-images.githubusercontent.com/6638825/64976167-7f447480-d865-11e9-95c5-874c5cd62b6d.png
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26069

Differential Revision: D17406399

Pulled By: IvanKobzarev

fbshipit-source-id: c3dc1e68f02aacbb60d21f8355f676e6e5fc2897
2019-09-16 18:07:53 -07:00
31960e8872 Add missing argument for failing function call (#26311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26311

We are currently unable to deploy models because D16955662 changed the function signature of ```quantized_lstm(``` while the function call here (https://fburl.com/diffusion/e4wrmx83) does not pass the newly added ```use_dynamic``` param.

Here are the details of the error: P111215482

```
E0916 12:36:16.423516 1149877 ExceptionTracer.cpp:214] exception stack complete
terminate called after throwing an instance of 'torch::jit::script::ErrorReport'
  what():
Arguments for call are not valid.
The following operator variants are available:

  aten::quantized_lstm(Tensor input, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first, *, int? dtype=None) -> (Tensor, Tensor, Tensor):
  Keyword argument use_dynamic unknown.
```

This diff fixes that.

Test Plan:
Running quantization tests after.

```buck test mode/dev caffe2/test:jit -- 'test_quantization_modules \(test_jit\.TestScript\)'```

https://our.intern.facebook.com/intern/testinfra/testrun/5910974518872494

Also, currently building a package (language_technology.translation.jedi.scripts:35c3643) and testing this (f138747078).

f138771702

Reviewed By: jhcross

Differential Revision: D17404451

fbshipit-source-id: 390d2ce1ecbdd63a07a8f16c80e4c3ac25ab0a99
2019-09-16 17:04:14 -07:00
fcb100a3e0 Export round (#26126)
Summary:
Added round export in opset 11
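
A minimal sketch of exporting `torch.round` with opset 11 (the module and file name are illustrative):

```
import torch

class RoundModel(torch.nn.Module):
    def forward(self, x):
        return torch.round(x)

torch.onnx.export(RoundModel(), torch.randn(3), 'round.onnx', opset_version=11)
```
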
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26126

Reviewed By: hl475

Differential Revision: D17403589

Pulled By: houseroad

fbshipit-source-id: f9ac3f7602c50019b9feadda8d5d944a058c5455
2019-09-16 16:40:10 -07:00
5aff3dbaf6 Kill 'default_init', which isn't needed anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26281

Test Plan: Imported from OSS

Differential Revision: D17397097

Pulled By: gchanan

fbshipit-source-id: fb53e90637a3dfb2300fca78f414abe2d82832f3
2019-09-16 16:20:49 -07:00
03e3f130c6 Add derivative of cholesky_solve (#26185)
Summary:
Changelog:
- Add derivative of cholesky_solve. The equations are derived akin to the derivative of solve methods using the technique detailed [here](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=2ahUKEwiXrOjIyM7kAhWstlkKHRxqCDgQFjAAegQIAhAC&url=https%3A%2F%2Fpeople.maths.ox.ac.uk%2Fgilesm%2Ffiles%2FNA-08-01.pdf&usg=AOvVaw0BNISOvM_I9KjPrl0xv1R_)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26185
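
A minimal sketch exercising the new derivative (the positive-definite construction is illustrative):

```
import torch

a = torch.randn(3, 3)
A = a @ a.t() + 3 * torch.eye(3)      # make A positive definite
u = torch.cholesky(A)                 # lower-triangular Cholesky factor
b = torch.randn(3, 2, requires_grad=True)
x = torch.cholesky_solve(b, u)        # solves A x = b
x.sum().backward()                    # differentiable after this change
print(b.grad.shape)                   # torch.Size([3, 2])
```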

Test Plan:
- Added tests for cholesky_solve in test_autograd.py

Closes half of https://github.com/pytorch/pytorch/issues/4669.

Differential Revision: D17408123

Pulled By: soumith

fbshipit-source-id: f9668c8d4d758c0dc658941a8b730a17683091aa
2019-09-16 16:18:26 -07:00
a96e41b7c0 Use expected_wrapper only if CMAKE_{C,CXX}_COMPILER and/or is not set by user (#26306)
Summary:
This will honor user's preference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26306

Differential Revision: D17408030

Pulled By: soumith

fbshipit-source-id: 6841b805603d40cd7caf78dbb42405a0c931f052
2019-09-16 16:12:29 -07:00
2b52c1d982 Dynamic quantization for bias. (#26057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26057

bias is now unquantized (i.e. floating type) for qconv and qlinear. It is dynamically quantized by fbgemm.

TODO: Add some performance numbers.

Tests:

test:quantization
```
Summary (total time 8.41s):
  PASS: 24
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

More details at https://our.intern.facebook.com/intern/buck/build/74d5f6f7-55c9-4350-a618-2013042fffd8
```

test:quantized
```
Summary (total time 13.21s):
  PASS: 43
  FAIL: 0
  SKIP: 5
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
ghstack-source-id: 90166254

Test Plan:
buck test mode/dev caffe2/test:quantization

buck test mode/dev caffe2/test:quantized

Differential Revision: D17328028

fbshipit-source-id: d4a163d730d0f4a03e8e0faf7420710cf36eec09
2019-09-16 14:43:06 -07:00
4a947b607c Clarified ambiguous docstring in NegativeBinomial
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25923

Differential Revision: D17392848

Pulled By: soumith

fbshipit-source-id: 2833e72fe449c74dfd8273a7b1eb46c05c63d999
2019-09-16 14:38:32 -07:00
327e94f51b Add __s390x__ compiler define for s390 builds. (#26233)
Summary:
PyTorch builds fail on the s390 architecture because
the ifdef macros in simd.h default to an x86 asm instruction.
This patch adds an ifdef __s390x__ to make building on s390 possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26233

Differential Revision: D17392714

Pulled By: soumith

fbshipit-source-id: 037672bfea64fc5e52da2390d93b973534137c12
2019-09-16 14:31:51 -07:00
06c69ad8ed Whitelist and fusion support for quantized::linear - matmul(with bias) (#26204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26204

Support quant fusion of `matmul` (with bias) into `quantized::linear`.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380073

fbshipit-source-id: 00014469a852cc5d5b66469fc4b8d05eafba1e3e
2019-09-16 14:05:50 -07:00
6f87a1891e Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26260

Differential Revision: D17391902

Pulled By: bddppq

fbshipit-source-id: 89ab3dedf05ba398acb7300fac95f03cfb31f0ba
2019-09-16 13:24:31 -07:00
ffbffb69c6 Kill defaults in nn.yaml. (#26282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26282

Since this isn't the end-user API anymore, we shouldn't have defaults.

Test Plan: Imported from OSS

Differential Revision: D17397153

Pulled By: gchanan

fbshipit-source-id: d44040bec0ee9c70734a53ebcc10a96f12226a29
2019-09-16 12:22:55 -07:00
6df70db807 Disable broken unit tests (#26301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26301

-
ghstack-source-id: 90176419

Test Plan: waitforsandcastle

Differential Revision: D17400971

fbshipit-source-id: b6f9cb27fe955b0200d62591300c70ba79a90e5f
2019-09-16 12:12:39 -07:00
f43a2c9c2f Add ProcessGroupGloo::createDefaultDevice (#26166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26166

There were two variants for creating a new device: one based on the
name of a network interface, and one based on a hostname or address.
In the latter, if the address was not specified, it would look up the
local hostname and try to resolve that. If that failed, the process
would crash.

In this default path, we now try to lookup and use the local hostname,
and if that fails we fallback to using the loopback address.

If the local hostname doesn't resolve to an address that we can bind
to, it is very likely that this process won't join other processes
over the network, and that the user is trying to run a local test.

If this assumption is wrong, the user can override the default
interface selection by setting the environment variable
`GLOO_SOCKET_IFNAME` to the name of the external network interface.

I tested this by changing the local hostname to a bogus name and
confirmed that default initialization works as expected.
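
A minimal sketch of the override described above; 'eth0' and the rendezvous address are illustrative:

```
import os
import torch.distributed as dist

os.environ['GLOO_SOCKET_IFNAME'] = 'eth0'
dist.init_process_group('gloo', init_method='tcp://127.0.0.1:29500',
                        rank=0, world_size=1)
```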

Closes #26049.

Test Plan: Imported from OSS

Differential Revision: D17397898

Pulled By: pietern

fbshipit-source-id: 95a2467761d89df87b520d6e5837b92184b0dc12
2019-09-16 12:00:43 -07:00
7a7425cc48 Updating submodules
Summary:
GitHub commits:

653434b898
b74fbefc1a
9bd5fce6e8
6efcef720f
cb7830b6b3
53f0c0d175

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 78d0e24f5601aa990391a2404ae9d23b325de93f
2019-09-16 11:44:28 -07:00
fd3cc36fab Whitelist and fusion support for quantized::linear - matmul(without bias) (#26209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26209

Support quant fusion of `matmul` (without bias) -> `quantized::linear`.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380075

fbshipit-source-id: 290caee7f7bcf94d2731c0ee9bd40054f0fb9b07
2019-09-16 11:33:48 -07:00
f95d2b61d1 Whitelist and fusion support for quantized::linear - addmm (#26208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26208

Supporting `addmm` -> `quantized::linear` quant fusion.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380074

fbshipit-source-id: fae88f118f85663d777648695768b0504ed7ccf9
2019-09-16 10:48:20 -07:00
c92ed8dd44 Move the CUDA implementation of round to ATen. (#25041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25041

Fix #24617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25041

Test Plan: Imported from OSS

Differential Revision: D17114368

Pulled By: VitalyFedyunin

fbshipit-source-id: 6ec6ef99b4451acd7e93491fd4b44fca9ce1809d
2019-09-16 09:54:30 -07:00
b6d1105eb6 Enabled conv methods for the bfloat16 dtype
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26167

Differential Revision: D17367728

Pulled By: izdeby

fbshipit-source-id: 0a7bd9a6dbc15815af195d644c9372af2135e93a
2019-09-16 09:47:42 -07:00
4e538ebcf3 Migrate away from using Variable( in test_nn.py (#26077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26077

As per #26071, we would like to get rid of the calls to Variable(
where possible. This diff removes the calls in the test file test_nn.py. The
unit tests should all still pass as expected.
ghstack-source-id: 90086624
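
A minimal sketch of the migration pattern applied throughout test_nn.py:

```
import torch
from torch.autograd import Variable

x_old = Variable(torch.randn(3), requires_grad=True)  # old style, being removed
x_new = torch.randn(3, requires_grad=True)            # preferred replacement
```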

Test Plan: tests in `test_nn.py` should all pass.

Differential Revision: D17336484

fbshipit-source-id: 43fc7bd0b0be835ae89d06162ce1cbe4e0056d91
2019-09-16 09:37:54 -07:00
c006356034 fix hypothesis timeout (#26280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26280

ghstack-source-id: 90160270

Test Plan: testinprod

Differential Revision: D17396861

fbshipit-source-id: ee2348ffa7f6092e2c5647a42d0e17879dcfacd0
2019-09-16 09:09:44 -07:00
38b2bc1451 Upgrade MKLDNN to v0.20.5 (#25757)
Summary:
1. Fix issues exposed by below posts.
https://github.com/pytorch/pytorch/issues/25242
https://github.com/pytorch/pytorch/issues/25101
https://github.com/pytorch/pytorch/issues/23825
2. Fix RNN support issue in mkldnn-bridge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25757

Differential Revision: D17367948

Pulled By: VitalyFedyunin

fbshipit-source-id: d8430d3909ecbf853afa0ce3d968735f86f1da31
2019-09-16 09:01:56 -07:00
df9d8f9032 Fix no-auto-batching bugs: cannot bulk load; does not work with namedtuple (#26065)
Summary:
see title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26065

Differential Revision: D17392851

Pulled By: soumith

fbshipit-source-id: 468cd41c8e03d689ff2e0261d948e28daad6bfaf
2019-09-16 07:22:31 -07:00
24ae9b5040 Fix binary size of OpsAlreadyMovedToC10.cpp (#26237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26237

Calling a lot of `std::string` constructors is horrible for binary size, see t53997334.

Using `const char*` instead should make the binary size much smaller.
ghstack-source-id: 90145501

Test Plan: size checks on the diff

Differential Revision: D17386002

fbshipit-source-id: c5420adf225e535396e806a0df92419a7e2ad3e8
2019-09-16 00:28:23 -07:00
976cefdb41 Switch to the new profiler infrastructure (#26174)
Summary:
The ones supported going forward are rocprofiler and roctracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26174

Differential Revision: D17387538

Pulled By: bddppq

fbshipit-source-id: 19d9828d9d07b5073ab5fa288e24fd65a8b18b52
2019-09-15 17:52:18 -07:00
91fc6f3b94 Fix namedtensor ci (#26257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26257

In native_functions.yaml, all overloads must have unique overload names.
This PR fixes `flatten` to have unique names for the overloads.

Test Plan: - tested locally, but also [namedtensor ci]

Differential Revision: D17391243

Pulled By: zou3519

fbshipit-source-id: aaef654953b4275c43b9d7bd949c46bd011f6c73
2019-09-15 17:41:30 -07:00
31139b5f9a Back out "[pytorch][PR] Refines test_torch.py generic device testing" (#26252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26252

Original commit changeset: 1375774f24c2

Testing to see if this is somehow the source of hangs on ROCm builds.

Test Plan: Change is to tests themselves. This diff is for testing the ROCm hang, however.

Differential Revision: D17390575

fbshipit-source-id: a6ffd5eb1df3971b99b6d42271a8d3d501ac79c6
2019-09-15 13:42:25 -07:00
21ba320cd5 Fix CI (#26250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26250

Exclude some ops from the c10 dispatcher that don't work with it yet.
ghstack-source-id: 90138046

Test Plan: waitforsandcastle

Reviewed By: zou3519

Differential Revision: D17390117

fbshipit-source-id: a87fb3048aeba2c3293b95d610ddb8e94369f8fe
2019-09-15 12:15:40 -07:00
a2e5445fcf Fix Windows build (#26246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26246

Broken due to https://github.com/pytorch/pytorch/issues/12117. Try fixing it.
ghstack-source-id: 90137033

Test Plan: waitforsandcastle

Reviewed By: zou3519

Differential Revision: D17387317

fbshipit-source-id: 705998c0b1608668d510b47f4fe20cecf5057c5f
2019-09-15 11:24:21 -07:00
b6b2b4c18f Refines test_torch.py generic device testing (#26244)
Summary:
- Adds SkipCUDAIfRocm and skipCPUIfNoMkl decorators, ports corresponding tests
- Changes "SkipIf" input semantics for consistency
- Removes torchtest, which has been replaced with this new generic framework
- Refactors some common parts out of CUDA tests to TestTorchDeviceType
- Ensures all MAGMA tests run on default stream by putting the skipCUDANonDefaultStreamIf in the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26244

Differential Revision: D17389060

Pulled By: mruberry

fbshipit-source-id: 1375774f24c2266049e6d4b899e7300ddf32eac8
2019-09-15 03:35:23 -07:00
26d537d744 Remove unboxedAutogradKernel from c10 (#26130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26130

Since we now just use TensorTypeId::VariableTensorId, there's no need to treat autograd kernels any differently.
ghstack-source-id: 90130457

Test Plan: unit tests

Differential Revision: D17353873

fbshipit-source-id: d4468506a5366bc5e7429144b090b3e78af9de62
2019-09-15 01:18:11 -07:00
0e30e6570d Call aten ops through c10 dispatcher (#23668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668

- The eager mode frontend now calls operators that are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher, and no longer through globalATenDispatch().
- These operators aren't registered with globalATenDispatch anymore, only with c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work; this function forwards the registration to the c10 dispatcher for them.

ghstack-source-id: 90130455

Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#

Differential Revision: D16603133

fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82
2019-09-15 01:18:07 -07:00
e86d99ae88 Use MIOpen for transpose convolutions (#26172)
Summary:
Provides significant performance uplift where used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26172

Differential Revision: D17374862

Pulled By: bddppq

fbshipit-source-id: 85d2df3c67b8935bc54f3a81a912a25c0102743a
2019-09-14 23:23:53 -07:00
df338f80a6 Add a wrapper for inspect in JIT to produce better error message (#25415)
Summary:
If source code is not available due to packaging (e.g. sources are compiled to .pyc), TorchScript produces a very obscure error message. This change makes it nicer and allows customizing the message by overriding _utils_internal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25415

Test Plan: Really hard to unittest properly. Did one off testing by compiling to .pyc and checking the message.

Differential Revision: D17118238

Pulled By: dzhulgakov

fbshipit-source-id: 3cbfee0abddc8613000680548bfe0b8ed52a36b0
2019-09-14 21:27:51 -07:00
7f3c423541 Add type hint for cuda.set_rng_state (#26200)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26199
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26200

Differential Revision: D17386885

Pulled By: soumith

fbshipit-source-id: 9da03aae29281b2ed691cbfdd7b85fde55e5b7ef
2019-09-14 19:29:42 -07:00
b4b8f53a5d Ports most of test_torch.py to generic device type framework (#26232)
Summary:
This PR moves many tests in test_torch.py to the generic device type framework. This means that many CUDA tests now run in test_torch.py and there is greater consistency in how tests for many device types are written.

One change is that all MAGMA tests are run on the default stream due to intermittent instability running MAGMA on the non-default stream. This is a known issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26232

Test Plan:
While this PR edits the tests itself, it was validated using two independent methods:

(1) The code was reviewed and it was verified that all deleted functions were actually moved.
(2) The output of the TestTorch CI was reviewed and test outputs were matched before and after this PR.

Differential Revision: D17386370

Pulled By: mruberry

fbshipit-source-id: 843d14911bbd52e8aac6861c0d9bc3d0d9418219
2019-09-14 17:10:47 -07:00
9f6b6b8101 Back out "[quant][observer] Add histogram observer" (#26236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26236

Original diff broke oss CI. Reverting.

Original commit changeset: 0f047d3349cb
ghstack-source-id: 90125990

Test Plan: testinprod

Reviewed By: hx89

Differential Revision: D17385490

fbshipit-source-id: 4258502bbc0e3a6dd6852c8ce01ed05eee618b1a
2019-09-14 12:48:46 -07:00
3051e36e05 Remove armv7s build from iOS (#26222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26222

### Summary

The last generation of armv7s devices is the iPhone 5C. As discussed with David offline, we decided not to support iOS armv7s devices.

### Test plan

- CI finishes successfully
- Builds can be run only on X86_64 and arm64 devices

Test Plan: Imported from OSS

Differential Revision: D17385308

Pulled By: xta0

fbshipit-source-id: f883999aed18224ea3386b1f016964a33270fa34
2019-09-14 11:07:37 -07:00
5f9cbfa1d6 Added possible out of shared memory error message (#25730)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/5040
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25730

Differential Revision: D17226214

Pulled By: pbelevich

fbshipit-source-id: 92278272aab74e6690f14fc9597acfd1a98854b7
2019-09-14 05:27:48 -07:00
4160b8cd77 adds sync to flaky test_events_multi_gpu_query (#26231)
Summary:
This test can sometimes fail in CI.

I suspect this flakiness is because the test asks a CUDA stream to record an event, fails to synchronize the CPU with that stream, then checks if the event is recorded on the CPU. There is no guarantee this will have happened.

This one-line change preserves the intent of the test while ensuring the GPU has recorded the event before the CPU queries it.
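
A minimal sketch of the pattern after the fix (names are illustrative):

```
import torch

stream = torch.cuda.Stream()
event = torch.cuda.Event()
with torch.cuda.stream(stream):
    event.record()
stream.synchronize()   # the added sync: wait for the record to happen
assert event.query()   # now reliably True on the CPU side
```
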
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26231

Differential Revision: D17382110

Pulled By: mruberry

fbshipit-source-id: 35b701f87f41c24b208aafde48bf10e1a54de059
2019-09-14 00:34:44 -07:00
fbf991d062 Creates generic device type testing framework (#25967)
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/24851 by...

1. lets device types easily register themselves for testing
2. lets tests be written to run on multiple devices and with multiple dtypes
3. provides a mechanism to instantiate those tests so they are discoverable and filterable by unittest and pytest

It refactors three tests from test_torch.py to demonstrate how to use it.

`test_diagonal` is the simplest example. Most tests just need to be modified to accept 'device' as an argument. The framework will then instantiate `test_diagonal_cpu` and `test_diagonal_cuda` (when CUDA is available) which call `test_diagonal` with the appropriate 'device' argument.

`test_neg` also has dtype variants. It accepts both 'device' and 'dtype' as arguments, and the dtypes it runs with are specified with the 'dtypes' decorator. Dtypes can be specified for all device types and particular device types. The framework instantiates tests like `test_neg_cpu_torch.float`.

`test_inverse` has device-specific dependencies. These dependencies are expressed with the sugary 'skipCUDAIfNoMagma' and 'skipCPUIfNoLapack' decorators. These decorators are device-specific so CPU testing is not skipped if Magma is not installed, and their conditions may be checked before or after the test case has been initialized. This means that skipCUDAIfNoMagma does not initialize CUDA. In fact, CUDA is only initialized if a CUDA test is run.

These instantiated tests may be run as usual and with pytest filtering it's easy to run one test on all device types, run all the tests for a particular device type, or run a device type and dtype combination.

See the note "Generic Device-Type Testing" for more detail.
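
A minimal sketch of the pattern, assuming the helper names introduced by this PR (`dtypes` and `instantiate_device_type_tests` from the new common_device_type module):

```
import torch
from common_utils import TestCase, run_tests
from common_device_type import dtypes, instantiate_device_type_tests

class TestExample(TestCase):
    @dtypes(torch.float, torch.double)
    def test_neg(self, device, dtype):
        x = torch.tensor([1.0, -2.0], device=device, dtype=dtype)
        self.assertEqual(torch.neg(x),
                         torch.tensor([-1.0, 2.0], device=device, dtype=dtype))

# Generates TestExampleCPU / TestExampleCUDA with per-dtype test variants.
instantiate_device_type_tests(TestExample, globals())

if __name__ == '__main__':
    run_tests()
```
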
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25967

Differential Revision: D17381987

Pulled By: mruberry

fbshipit-source-id: 4a639641130f0a59d22da0efe0951b24b5bc4bfb
2019-09-13 23:34:28 -07:00
dc6939ebff Add isBackwardCompatibleWith for Argument and FunctionSchema (#23409)
Summary:
we intend to be conservative, and will relax the checks in the future if necessary.
So far, we consider the following three conditions as backward compatible:
   1) two schemas are equal
   2) two schemas have same number of arguments, and this schema's
      arguments are backward compatible with the corresponding ones in
      argument list of old_schema.
   3) this schema has m arguments, old_schema has n arguments, m > n.
       The first n arguments of this schema are backward compatible with
       the corresponding arguments of old_schema; the remaining arguments
       must be either OptionalType or provide default values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23409
ghstack-source-id: 90111021

Test Plan: buck test //caffe2/test:function_schema

Reviewed By: hl475

Differential Revision: D16505203

fbshipit-source-id: e4099537776a60e8945e5c3cd57fa861f3598a9b
2019-09-13 20:37:14 -07:00
1563fdb591 Add histogram observer (#23959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23959

Add histogram observer that records the running histogram of tensor values along with min/max values.
ghstack-source-id: 90076996

Test Plan:
Added a test test_histogram_observer
buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'

Differential Revision: D16692835

fbshipit-source-id: 0f047d3349cb9770fad4a2b6cb346c51d9e99cd4
2019-09-13 19:24:04 -07:00
c6b75cea6e fix circle CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26225

Test Plan: Imported from OSS

Differential Revision: D17379899

Pulled By: xta0

fbshipit-source-id: 4077aa0149b23560f3a9e29531ca9bc612a2c09c
2019-09-13 18:19:41 -07:00
6d3ac7f85c use whitelist for selecting observed values (#25974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25974

Previously we observed all Tensor values, but we actually want to
observe only the ones that can be quantized.

Test Plan:
python test/test_jit.py
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17348986

fbshipit-source-id: 55be0d73862a0e7eb1e7fd882d16e0d830618b63
2019-09-13 15:38:31 -07:00
d250f01060 Tensor renaming to dtype, shape; support long, double (#26183)
Summary:
Applying dzhulgakov's review comments

org.pytorch.Tensor:
  - dims renamed to shape
  - typeCode to dtype
  - numElements to numel

newFloatTensor, newIntTensor... to newTensor(...)

Added support for dtype=long, double.
Re-sorted in code as byte, int, float, long, double.
In if-conditions, ordered float, int, byte, long, double, since I expect the float and int branches to be used most often.

Tensor.toString() does not have data, only numel (data buffer capacity)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26183

Differential Revision: D17374332

Pulled By: IvanKobzarev

fbshipit-source-id: ee93977d9c43c400b6c054b6286080321ccb81bc
2019-09-13 15:18:41 -07:00
1114b05122 Updating submodules
Summary:
GitHub commits:

97631357aa
2f1477dfee

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 33029d2e8c6a3664a35823829670f6ed9dfc3b44
2019-09-13 15:09:51 -07:00
b5a3a8b427 Change the source link in podspec (#26089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26089

### Summary

A couple of changes

1. Replace the source link with the newly nightly build address
2. Remove module support for Swift and Objective-C
3. Expose all static libraries instead of archiving them into one single library. This is because those static libraries might contain object files that have the same name, e.g. `init.c.o` in both `libcupinfo.a` and `libqnnpack.a`. If we archive them into one using the `libtool -static` command, by default it only picks one object file and discards the others, which could result in undefined symbols when linking the executable. The change here is to expose all the static libraries and let the linker decide which one to use.

### Test Plan

- pod spec lint succeed
 - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS

Differential Revision: D17363037

Pulled By: xta0

fbshipit-source-id: ba77b0001b58e6e2353d8379d932db598166d37d
2019-09-13 15:00:31 -07:00
16605ef2eb Nightly build for for iOS (#26074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26074

### Summary

This PR creates a nightly job for iOS builds. The job will generate a couple of static libraries that contains three architectures(x86, arm64, armv7s) and upload them to AWS s3.

### Note

The test phase in this job is missing right now, meaning if there is a linking error, we won't be able to know it. To add the test jobs, we have to put a dummy test App in the repo and manually link the libraries to the app after the build finishes. This will be done in the next following PRs

Test Plan: Imported from OSS

Differential Revision: D17363066

Pulled By: xta0

fbshipit-source-id: 5beeb4263af5722f0a852297023f37aaea9ba4b1
2019-09-13 14:24:52 -07:00
8c46061e2c Updating submodules
Summary:
GitHub commits:

83a6a614e9
c8cac64995

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1f5bc1e065fe13d89eeb42539f21a8ab0ab8b8a1
2019-09-13 14:21:17 -07:00
8321f2592e Register ATen ops with c10 (#26131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
2019-09-13 13:52:40 -07:00
cadf836cbc Allow overwriting catch-all kernels (#25947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25947

Previously, the c10 dispatcher didn't allow having a catch-all kernel and backend specific kernels at the same time.
This is also the long term goal. But to make the current XLA implementation work, we need to allow them to overwrite these ops with XLA variants.

This diff changes that so that ops can have both catch-all and backend-specific kernels, and will call into the catch-all kernel if no more specific kernel is registered.
This is also the current behavior of globalATenDispatch.
ghstack-source-id: 90049398

Test Plan: unit tests

Differential Revision: D17293036

fbshipit-source-id: f2d5928e904c1dc9b6b89e9bb468debe48a4056c
2019-09-13 13:52:36 -07:00
b01520ac9c Make schema part of RegisterOperators::Options (#26114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26114

With this diff, the operator schema or name can be specified as part of the options objects:

```
static auto registry = torch::RegisterOperators()
  .op(torch::RegisterOperators::options().schema("my_op").kernel(&kernel))
  .op(...);
```

This does not break backwards compatibility, all old APIs are kept as shorthands.

This (a) makes the API more consistent, accumulating all options into the options objects and not treating schema special anymore, and (b) this is required for allowing the c10 dispatcher to forward registration calls to ATenDispatch for ops that are still on that dispatcher, see plan in https://github.com/pytorch/pytorch/issues/24132
ghstack-source-id: 90049402

Test Plan: unit tests

Differential Revision: D17350383

fbshipit-source-id: cbb8f33a52dccb2a4522753e7b5ac8ba35b908fd
2019-09-13 13:52:32 -07:00
0ea59786e8 Use torch::from_blob instead of shareExternalPointer, nits (#25973)
Summary:
The main part is switching at::Tensor creation from `torch::empty(torch::IntArrayRef(...))->ShareExternalPointer(...)` to `torch::from_blob(...)`.
Removed the explicit `device CPU` setting, since `at::TensorOptions` defaults to CPU.
Also renamed local variables, removing the `input` prefix to make them shorter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25973

Differential Revision: D17356837

Pulled By: IvanKobzarev

fbshipit-source-id: 679e099b8aebd787dbf8ed422dae07a81243e18f
2019-09-13 13:40:11 -07:00
a3f0d988d9 Revert D17349760: Change schedulers to chainable form
Test Plan: revert-hammer

Differential Revision:
D17349760

Original commit changeset: 0a6ac01e2a6b

fbshipit-source-id: 41c2c136215dabc26cad5098a08eff2a2a29b715
2019-09-13 12:54:59 -07:00
43335cddb7 Fold quantize op into module (#25625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25625

We want to fold the quantize op for weights/bias into module to avoid quantizing weights on the fly.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D17208889

fbshipit-source-id: 1854b8953b065855d210bc1166533c08ca264354
2019-09-13 12:27:16 -07:00
27b5a6c577 Add documentation to logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26175

Differential Revision: D17371085

Pulled By: Krovatkin

fbshipit-source-id: ea06f4e16fc320940a299e8e1d4f4d7c76f5950a
2019-09-13 12:13:16 -07:00
20124c4814 guard dyndep with a lock (#26153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26153

I suspect that our multithreaded test system causes issues with dyndep if two places try to call InitOpsLibrary concurrently, so we guard it with a lock. This is a speculative fix, as the issue is impossible to reproduce.

Test Plan: sandcastle

Reviewed By: bddppq

Differential Revision: D17361310

fbshipit-source-id: 596634a2098b18881abbd26a5a727a5ba0d03b6e
2019-09-13 11:38:14 -07:00
e293c4ea73 Fix 'in' returning true incorrectly (#24156)
Summary:
Because of 'return NotImplemented', __contains__ returned True when the element was not a number, since bool(NotImplemented) == True.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24156
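
A minimal sketch of the underlying pitfall (class name is illustrative):

```
class Box:
    def __contains__(self, item):
        if not isinstance(item, (int, float)):
            return NotImplemented   # bool(NotImplemented) is True!
        return item == 42

print('hello' in Box())  # True before the fix, although it should be False
```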

Differential Revision: D16829895

Pulled By: zou3519

fbshipit-source-id: 9d3d58025b2b78b33a26fdfcfa6029d0d049f11f
2019-09-13 09:27:58 -07:00
079cd4e1fc Remove requests as dependency (#26083)
Summary:
local build is slow... test in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26083

Differential Revision: D17346949

Pulled By: ailzhang

fbshipit-source-id: f552d1a4be55ad4e2bd915af7c5a2c1b6667c446
2019-09-13 08:39:53 -07:00
07e7c7eb9f Kill remaining defaults in Declarations.cwrap.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25612

Test Plan: Imported from OSS

Differential Revision: D17172499

Pulled By: gchanan

fbshipit-source-id: f99e813a4a90e8576541da317027e6f8ae76079b
2019-09-13 08:06:55 -07:00
10f1d3e37b Get rid of more defaults in Declarations.cwrap.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25611

Test Plan: Imported from OSS

Differential Revision: D17172493

Pulled By: gchanan

fbshipit-source-id: 0f4319f8024ac4eca62576231214227b341f56c4
2019-09-13 08:06:51 -07:00
fef2d2e3c4 Kill most defaults in Declarations.cwrap. (#25610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25610

They don't do anything anymore, since this isn't the end-user interface.

Test Plan: Imported from OSS

Differential Revision: D17172495

Pulled By: gchanan

fbshipit-source-id: a380d970f0836ed85eb9ac2aa42eb73655d775aa
2019-09-13 08:06:48 -07:00
6276958de1 Turn setup_ci_environment into command
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26163

Test Plan: Imported from OSS

Differential Revision: D17366536

Pulled By: pietern

fbshipit-source-id: 07181a77aaeba5457aa716ceac9cc404aacefe5f
2019-09-13 07:59:22 -07:00
12086a6593 Turn setup_linux_system_environment into command
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26162

Test Plan: Imported from OSS

Differential Revision: D17366537

Pulled By: pietern

fbshipit-source-id: 98413daa344812f06578c3373d8516292d2f21f5
2019-09-13 07:59:18 -07:00
0303ecf070 Turn should_run_job into command
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26160

Test Plan: Imported from OSS

Differential Revision: D17366539

Pulled By: pietern

fbshipit-source-id: a870d6da21925764986c6c748ad291440b78e6fd
2019-09-13 07:59:14 -07:00
219a04ee82 Use CircleCI commands for brew update/install (#26159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26159

The snippets for working with Homebrew were duplicated across binary
builds, macOS builds, and iOS builds. In #25336, the CircleCI
configuration version was updated to version 2.1, which supports
parameterized commands. This means we no longer have to use YAML
tricks to duplicate stanzas and instead can natively define a series
of reusable steps.

Motivation for doing this is that the macOS binary builds were still
using the slow `brew update` instead of `git fetch` (see #25988).

[test macos]
[test wheel]

Test Plan: Imported from OSS

Differential Revision: D17366538

Pulled By: pietern

fbshipit-source-id: 194c0f37c1dc999705f3ba97fdabf4ff18728d93
2019-09-13 07:59:10 -07:00
0963e1705b Run PyTorch macOS CPU-only build/test on all PRs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26096

Test Plan: Imported from OSS

Differential Revision: D17366419

Pulled By: pietern

fbshipit-source-id: 138659dae346aad3cde52d488cd1780614e7692f
2019-09-13 07:45:57 -07:00
939ae80de1 Change schedulers to chainable form (#24352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Returning a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)) referring to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter, as it is the only scheduler whose mechanics rely on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.

# #20527

### Before

The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step only when the epoch number changes, they now check it manually:
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, lr_scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

Test Plan: Imported from OSS

Differential Revision: D17349760

Pulled By: vincentqb

fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f
2019-09-13 07:36:05 -07:00
2503fdc116 Add data field to Tensor pyi. (#26093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26093

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: vsiles

Differential Revision: D17366320

Pulled By: ezyang

fbshipit-source-id: 025f1c3d75d294fc1b51ddc540e542a05dc72b6a
2019-09-13 07:32:03 -07:00
babaac3e08 Fix bug with named tensors and (no) tracer support (#26106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26106

Previously, in the named tensors build, an operator is marked as
non-traceable if ANY of its overloads are named tensor overloads. This
breaks the tracer for things like torch.full (has a names= overload for
named tensor) and tensor.sum (has a Dimname overload for named tensor).

This PR fixes the problem by putting the "no tracer support" logic into
the location where the tracer attempts to construct a graph by adding a
Dimname/DimnameList argument to a node.

Test Plan:
- new test in test_jit.py to check if torch.full is traceable
- new test in test_namedtensor.py to check what happens when someone
tries to trace a function that uses named tensor APIs.
- [namedtensor ci]

Differential Revision: D17353452

Pulled By: zou3519

fbshipit-source-id: b0b843c8357ffe54baee6e8df86db914f0b1ece4
2019-09-13 06:45:00 -07:00
33221b19ac C++ API parity: at::Tensor::data
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26008

Test Plan: Imported from OSS

Differential Revision: D17343488

Pulled By: pbelevich

fbshipit-source-id: b9ba5e26cad621a428a14292446d7fb5a6e5535d
2019-09-12 23:33:34 -07:00
5e2d25af34 Implement tensor.align_as(other), change tensor.align_to(names) (#25843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25843

`tensor.align_to(*names)` permutes the dimensions of `tensor` and adds
additional 1-sized dimensions such that the output tensor has dimensions
in the same order as `names`. All dimensions of `tensor` must be
present in `names`, in addition, this function requires that all dims of
`tensor` be named.

`tensor.align_as(other)` is equivalent to
`tensor.align_to(*other.names)`.

I'm planning on changing `torch.align_tensors(*tensors)` to align closer
to these semantics because there didn't seem to be a clear use case for the old
semantics that preserve unnamed dimensions. That will come in a future
change.
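
A minimal sketch of the semantics described above:

```
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = x.align_to('N', 'H', 'C')   # inserts a 1-sized 'H' dim: shape (2, 1, 3)
z = torch.randn(2, 1, 3, names=('N', 'H', 'C'))
w = x.align_as(z)               # equivalent to x.align_to(*z.names)
```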

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17255549

Pulled By: zou3519

fbshipit-source-id: 1e437ad81e9359b4d5bd0e7e64c3a1be441fc3e3
2019-09-12 22:53:44 -07:00
e544f88590 Implement tensor.refine_names (#25842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25842

`tensor.refine_names(*names)` takes `tensor` and attempts to name its
dimensions `names` out-of-place. If a dimension `i` already had a name,
then it cannot be changed (so tensor.names[i] must equal names[i]);
if the original dimension did not have a name, then the new name
(names[i]) can be anything.

`tensor.refine_names(*names)` also accepts a glob '*' that greedily selects
names from `tensor`. Here are some examples:

- `Tensor[None].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('D') -> Error!`
- `Tensor[N].refine_names(None) -> Error!`
- `Tensor[None, None].refine_names('*', 'D') -> Tensor[None, D]`
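
A runnable sketch of these rules, using the glob spelling described in this commit (later versions of the API may spell it differently):

```
import torch

x = torch.randn(3, 3)            # names: (None, None)
x = x.refine_names('N', 'C')     # unnamed dims may take any new name
x = x.refine_names('N', 'C')     # already-named dims must match exactly

# the glob greedily takes names from the tensor, per the examples above
y = torch.randn(2, 3, 4).refine_names('*', 'D')
print(y.names)                   # (None, None, 'D')
```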

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17255548

Pulled By: zou3519

fbshipit-source-id: fdbdb3a12f24fbe37ce1e53ed09dc8a42589d928
2019-09-12 22:53:40 -07:00
94964a9ba2 Add fusion for quantized linear (#25624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25624

First fuse the split op into aten::linear, then fuse
`dequant - aten::linear - quant` into the quantized linear op

Test Plan:
python test/test_jit.py 'TestJit.quant_fusion'

Imported from OSS

Differential Revision: D17208891

fbshipit-source-id: 864b19fabab2e8e6f8f8ad35eb3dbbf2d5fdb8c4
2019-09-12 20:52:37 -07:00
e9e7e9d466 Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (#26137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26137

Previous import was 7988d8360b11e6003560076e9b1d4aa426db3244

Included changes:
- **[95252c2a](https://github.com/onnx/onnx/commit/95252c2a)**: Fix shapeinference function (#2296) <jignparm>
- **[414285bb](https://github.com/onnx/onnx/commit/414285bb)**: fix the buffer overflow problem in shape inference logic of Squeeze op <Lu Fang>
- **[797cdd0f](https://github.com/onnx/onnx/commit/797cdd0f)**: Support for negative indices in 'Gather', 'GatherElements', 'ScatterElements', 'OneHot' (#2260) <Negin Raoof>
- **[7636978d](https://github.com/onnx/onnx/commit/7636978d)**: Fix collect_snippets warnings (#2277) <Lutz Roeder>
- **[fa70c33b](https://github.com/onnx/onnx/commit/fa70c33b)**: Update printable_graph in helper.py to output details of initializers that do not have matching graph inputs. (#2135) <Scott McKay>
- **[428d09b0](https://github.com/onnx/onnx/commit/428d09b0)**: test int64 input type for 'where' op (#2253) <Negin Raoof>

Test Plan: ci

Reviewed By: bddppq

Differential Revision: D17353795

fbshipit-source-id: 6d4f39754863a30f427f4512c7b228e45d3ce84f
2019-09-12 20:49:08 -07:00
ff7921e85b Create TensorBoard test classes in all cases (#26005)
Summary:
To give better signal to the user, we will now always create the TensorBoard test classes and just disable the tests if TensorBoard is not installed.

cc lanpa sanekmelnikov natalialunova pietern
[test macos]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26005

Reviewed By: sanekmelnikov

Differential Revision: D17352430

Pulled By: orionr

fbshipit-source-id: 87a592064f4768ffded76a3d666a8e508a1ef164
2019-09-12 19:40:35 -07:00
3acab233b5 Add device check before accessing data_ptr in PackLayer (#26056)
Summary:
fixes https://github.com/pytorch/xla/issues/927
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26056

Differential Revision: D17331859

Pulled By: ailzhang

fbshipit-source-id: bdc334f03c8dcbb4ef4f5e059a63ef188a0b8b61
2019-09-12 19:25:42 -07:00
be82239c86 Port fuse_linear from pytorch/tvm (#25623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25623

Port over the fuse_linear pass from the pytorch/tvm project; we'll need this
in the backend-specific quantization pass to match aten::linear and swap
it with the quantized linear op

Test Plan:
python test/test_jit.py 'TestJit.test_fuse_linear'

Imported from OSS

Differential Revision: D17208890

fbshipit-source-id: f4ff3889ae4525797d3b986f46ae37e50ea49116
2019-09-12 18:51:13 -07:00
18a0040fec C++ unregister_module function for Module (#26088)
Summary:
This PR adds ```unregister_module``` to ```nn::Module``` and an ```erase``` function to ```OrderedDict```.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26088

Differential Revision: D17360058

Pulled By: yf225

fbshipit-source-id: f1f375b4751317da85b8da1458e092fe2405ceec
2019-09-12 18:38:57 -07:00
1d87090051 Support quantizing any methods called (#25505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25505

Support for quantizing all the methods called by the forward method, including
child module methods and other methods in the current module.

It relies on module-level constant prop; we still need to figure out a way to do constant prop
for these methods as well. We can either do constant prop at the module level or do constant
prop in the quantization function, but this will need some discussion.

Test Plan:
python test/test_jit.py 'TestJit.insert_quant_dequant'
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17208887

fbshipit-source-id: 21749457b21b00a6edada290c26324e2fb210b10
2019-09-12 18:09:44 -07:00
5fce76961c Kill kwarg_only declarations in Declarations.cwrap. (#25609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25609

They don't do anything anymore.

Test Plan: Imported from OSS

Differential Revision: D17172497

Pulled By: gchanan

fbshipit-source-id: 5cf7fdcf7d2da0054ac1bd7d8d2b70a2264b8c93
2019-09-12 17:38:48 -07:00
e2e1f5effd Fix build warning in vec256_qint.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26121

Test Plan: Imported from OSS

Differential Revision: D17351960

Pulled By: jamesr66a

fbshipit-source-id: 12389729fe5fb8d863cf47288920ea375a3e74ab
2019-09-12 17:38:44 -07:00
784c4a91ea Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (#25970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25970

ConstantThenLinearWarmupLRPolicy:
* first use a constant warm up
* then ramp up to the fixed learning rate linearly

CompositeCyclicalLRPolicy:
* first use a constant warm up
* then ramp up to the fixed learning rate linearly
* then use cyclical learning rates for the rest of time

Pull Request resolved: https://our.intern.facebook.com/intern/opensource/shipit/preview/D17302632/

Test Plan:
* buck test
 * https://our.intern.facebook.com/intern/testinfra/testconsole/testrun/5910974518377039/
 * https://our.intern.facebook.com/intern/testinfra/testrun/1407375027118303
* checked the consistency of learning rates w.r.t. iterations with offline simulations n143987

Reviewed By: swatirallapalli

Differential Revision: D17302632

fbshipit-source-id: 1098d4dd9109a48932b76e36d78239e49f8077a1
2019-09-12 17:38:40 -07:00
f559c1d85d Skip inserting duplicate observers (#25504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25504

Skip inserting duplicate observers for values observed
in the forward method of a child module or in other methods of
the current module.

Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'
python test/test_jit.py -- 'TestJit.insert_observers_child_qconfig'
python test/test_jit.py -- 'TestJit.insert_observers_skip_values'

Imported from OSS

Differential Revision: D17208888

fbshipit-source-id: e04f1c22ab1c4f410933a17a3ef31acf5f217323
2019-09-12 16:22:51 -07:00
135bbc261d fix base_lr overridden in cyclic lr (#26105)
Summary:
The base_lr parameter was being overridden by the superclass `__init__`; see https://github.com/pytorch/pytorch/issues/21965.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26105

Reviewed By: yf225

Differential Revision: D17346724

Pulled By: vincentqb

fbshipit-source-id: 4b146bd64f4f385c0a9c4f4df8eb8991312fb15c
2019-09-12 15:53:03 -07:00
f9a8b8ada3 Stop reordering TH random function arguments.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25608

Test Plan: Imported from OSS

Differential Revision: D17172494

Pulled By: gchanan

fbshipit-source-id: 5a46889cc040297231e2473ae5b2879b39f8d60a
2019-09-12 15:43:08 -07:00
369064fa0d remove "build_deps" arg from setup.py command in (#26113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26113

After https://github.com/pytorch/pytorch/pull/16914, passing in an
argument such as "build_deps" (i.e. python setup.py build_deps develop) is
no longer valid, since it gets picked up as an unrecognized argument.
ghstack-source-id: 90003508

Test Plan:
Before, this script would execute "python setup.py build_deps
develop", which errored. Now it executes "python setup.py develop" without an
error. Verified by successfully running the script on devgpu. In setup.py,
there is already a `RUN_BUILD_DEPS = True` flag.

Differential Revision: D17350359

fbshipit-source-id: 91278c3e9d9f7c7ed8dea62380f18ba5887ab081
2019-09-12 15:34:21 -07:00
ffee507d36 change gradle build to use static libtorch + gc-sections (#25984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25984

Link static libtorch libraries into pytorch.so (API library for android)
with "-Wl,--gc-sections" flag to remove unused symbols in libtorch.

Test Plan:
- full gradle CI with stacked PR;
- will check final artifacts.tgz size change;

Differential Revision: D17312859

Pulled By: ljk53

fbshipit-source-id: 99584d15922867a7b3c3d661ba238a6f99f43db5
2019-09-12 15:12:45 -07:00
fbc038ab35 simplify build_android_gradle.sh (#25897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897

It doesn't hurt to set all variables unconditionally.
And we can create a link to the lib directory instead of to specific files; this
way it's easier to switch between dynamic and static library names.

Test Plan:
- check android gradle CI;
- use stack diff to check all 4 architectures on PR;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897

Differential Revision: D17307240

Pulled By: ljk53

fbshipit-source-id: c975085ddda852ef7da1c29935c2f6a28d797e5a
2019-09-12 15:12:41 -07:00
771cb628eb Kill TH(C)Blas kwarg_only declarations. (#25607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25607

Since we don't generate these as end-user bindings, and we no longer reorder based on this property, we can just get rid of the property.

Test Plan: Imported from OSS

Differential Revision: D17172500

Pulled By: gchanan

fbshipit-source-id: f84fd8bb2b13598501897f56871b21339585d844
2019-09-12 15:01:38 -07:00
ad91d0285b Stop re-ordering TH(C)Blas arguments. (#25606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25606

This just complicates the codegen for no benefit.

Test Plan: Imported from OSS

Differential Revision: D17172498

Pulled By: gchanan

fbshipit-source-id: d2f50e45400ac0336792422518e03dbae3a1bedc
2019-09-12 15:01:34 -07:00
1eae6355d8 tracing with an opt-in by file name (#25895)
Summary:
This basically works as a simple filter, as you suggested ZolotukhinM

`export PYTORCH_JIT_LOG_LEVEL=guard_elimination` will print all `GRAPH_DUMP` and `GRAPH_UPDATE` statements.
`export PYTORCH_JIT_LOG_LEVEL=>guard_elimination:>alias_analysis` will print all `GRAPH_DUMP`, `GRAPH_UPDATE` **and** `GRAPH_DEBUG` statements in `guard_elimination.cpp` **and** in `alias_analysis.cpp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25895

Differential Revision: D17309090

Pulled By: Krovatkin

fbshipit-source-id: 8fa9e67cc9af566b084d66cc15223633fda08444
2019-09-12 14:16:53 -07:00
f928994968 make sure all out stringstreams start out empty in jit_log.hpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25863

Differential Revision: D17347386

Pulled By: Krovatkin

fbshipit-source-id: a42cf56680a27bc3e50fd945ab372a409225b875
2019-09-12 12:39:10 -07:00
076eaf4ccf Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (#26080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26080

Will be used in the conversion of the c2 ctr_mbl_feed model to PyTorch

Test Plan: Unit test

Reviewed By: yinghai

Differential Revision: D17337604

fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753
2019-09-12 12:36:33 -07:00
f91fbf90c7 Skip test_triangular_solve_batched (#26108)
Summary:
cc: gchanan zou3519

I will look into why this is failing spuriously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26108

Differential Revision: D17348399

Pulled By: zou3519

fbshipit-source-id: aed4ccfc3f106692d4e32acc029740309570b0c3
2019-09-12 12:36:29 -07:00
7e4ac8b851 Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (#25959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25959

Previous import was 28ca699b69b5a31892619defca2391044a9a6052

Included changes:
- **[7988d836](https://github.com/onnx/onnx/commit/7988d836)**: Supporting negative axes for all existing onnx ops (#2281) <Negin Raoof>
- **[5ca0a09e](https://github.com/onnx/onnx/commit/5ca0a09e)**: Update managingexperimentalops.md (#1981) <Joseph Spisak>
- **[bc0495c1](https://github.com/onnx/onnx/commit/bc0495c1)**: Fix link to community docs in readme (#2261) <Prasanth Pulavarthi>
- **[2fdb3ef6](https://github.com/onnx/onnx/commit/2fdb3ef6)**: move map and sequence types to onnx domain, (#2244) <Ke Zhang>
- **[568b65aa](https://github.com/onnx/onnx/commit/568b65aa)**: Improve compatiblity with proto3 and enable reading attributes (#2288) <Dmitri Smirnov>
- **[1f350f2c](https://github.com/onnx/onnx/commit/1f350f2c)**: Remove type info for loop variadic input in Loop op used to compose the Range op (#2287) <Hariharan Seshadri>
- **[eb139446](https://github.com/onnx/onnx/commit/eb139446)**: Add Foundation WG to working-groups.md (#2276) <Ryan Loney>
- **[4eabc4b3](https://github.com/onnx/onnx/commit/4eabc4b3)**: Fix testdata model for CumSum. Add exclusive attribute. (#2271) <jignparm>
- **[1a62afdb](https://github.com/onnx/onnx/commit/1a62afdb)**: Support GatherND operator in ONNX (#2106) <Hariharan Seshadri>
- **[0e330e9d](https://github.com/onnx/onnx/commit/0e330e9d)**: Support ScatterND operator in ONNX (#2220) <Bowen Bao>
- **[733f7a6a](https://github.com/onnx/onnx/commit/733f7a6a)**: Add Det to ONNX (#2233) <Bowen Bao>
- **[52187738](https://github.com/onnx/onnx/commit/52187738)**: Update the description of nearest_mode of resize op (#2257) <daquexian>
- **[64b4b686](https://github.com/onnx/onnx/commit/64b4b686)**: Adding sparse tensor to ONNX (#2019) <G. Ramalingam>
- **[c8a8b7cc](https://github.com/onnx/onnx/commit/c8a8b7cc)**: Support Range operator in ONNX (#2242) <Hariharan Seshadri>
- **[44b0d6d5](https://github.com/onnx/onnx/commit/44b0d6d5)**: Update resize op (#2057) <daquexian>
- **[7d907964](https://github.com/onnx/onnx/commit/7d907964)**: Add function to fuse dynamic quantization graph into 1 node (#2187) <Ashwini Khade>
- **[36f8e6d9](https://github.com/onnx/onnx/commit/36f8e6d9)**: Update logo_request.md (#2231) <Prasanth Pulavarthi>
- **[4eb737c8](https://github.com/onnx/onnx/commit/4eb737c8)**: Update Clip in opset 11 to support min/max as inputs instead of attributes (#2096) <Bowen Bao>
- **[a25e1388](https://github.com/onnx/onnx/commit/a25e1388)**: Fix segfault in tile shape inference (#2221) <daquexian>
- **[2dc273c7](https://github.com/onnx/onnx/commit/2dc273c7)**: update onehot shape inference to reflect the spec for depth input (#2224) <Ashwini Khade>
- **[665211c1](https://github.com/onnx/onnx/commit/665211c1)**: Add GatherElements Op and Rename ScatterElements (#2143) <Lara Haidar>
- **[3ba2e31a](https://github.com/onnx/onnx/commit/3ba2e31a)**: Unique (#2141) <liqunfu>
- **[5a5588ad](https://github.com/onnx/onnx/commit/5a5588ad)**: Clarify dimension variable scoping (#2211) <G. Ramalingam>
- **[fabe39d5](https://github.com/onnx/onnx/commit/fabe39d5)**: Liqun/topk sort (#2126) <liqunfu>
- **[453aa644](https://github.com/onnx/onnx/commit/453aa644)**: Update document for NMS (#2193) <Hector Li>
- **[34e28ec2](https://github.com/onnx/onnx/commit/34e28ec2)**: Handle negative 'axis' value in Split type and shape inferencing (#2177) <Scott McKay>
- **[28ec4583](https://github.com/onnx/onnx/commit/28ec4583)**: depth to space shuffle order (#2163) <Negin Raoof>
- **[98f72629](https://github.com/onnx/onnx/commit/98f72629)**: minor updates to fix links in readme (#2189) <Prasanth Pulavarthi>
- **[321d1467](https://github.com/onnx/onnx/commit/321d1467)**: Add check to disallow squeezing input axes which are not 1 (#2204) <Ashwini Khade>
- **[573f0dc9](https://github.com/onnx/onnx/commit/573f0dc9)**: fix a bug in fun shape inference (#2188) <Tang, Cheng>
- **[36dc7110](https://github.com/onnx/onnx/commit/36dc7110)**: Clarify ambiguity in gather spec regarding indices expectation (#2202) <Ashwini Khade>
- **[a2449673](https://github.com/onnx/onnx/commit/a2449673)**: Fix some minor issues in IR.md and Versioning.md (#2108) <edgchen1>
- **[349aff69](https://github.com/onnx/onnx/commit/349aff69)**: Skip install typing package for python >=3.5 (#2199) <bddppq>

Test Plan: ci

Reviewed By: bddppq, benoitsteiner

Differential Revision: D17296390

fbshipit-source-id: 9f9f5ce85d9694128008d756c2ea393bd4e0cb71
2019-09-12 12:15:03 -07:00
bdc656da70 TorchScript Serialization for dynamic LSTM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26084

Test Plan: Imported from OSS

Differential Revision: D17339315

Pulled By: jamesr66a

fbshipit-source-id: 03a2674edcf779becfe3b8ec96f1bae23c74b11c
2019-09-12 11:04:47 -07:00
827d71d769 Disable test_cuda.test_stream_event_nogil on ROCm (#26087)
Summary:
Was recently enabled in https://github.com/pytorch/pytorch/pull/26055; it's flaky on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37575
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37577
```
05:39:35 test_stream_event_nogil (__main__.TestCuda) ... Exception in thread Thread-3:
05:39:40 Traceback (most recent call last):
05:39:40   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
05:39:40     self.run()
05:39:40   File "/usr/lib/python2.7/threading.py", line 754, in run
05:39:40     self.__target(*self.__args, **self.__kwargs)
05:39:40   File "test_cuda.py", line 1894, in _test_stream_event_nogil
05:39:40     c2p.put(sync_func(self, TestCuda.FIFTY_MIL_CYCLES))
05:39:40   File "test_cuda.py", line 1882, in _event_wait
05:39:40     self.assertTrue(s1.query())
05:39:40   File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
05:39:40     raise self.failureException(msg)
05:39:40 AssertionError: False is not true
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26087

Differential Revision: D17340891

Pulled By: bddppq

fbshipit-source-id: b2b70beb1b068db53197a5f9f6a80cb046e66ebd
2019-09-12 10:06:26 -07:00
f3fdbba666 print source code when a function is executed (#25868)
Summary:
While this isn't ideal, as it might print out the same source every time a function is run, it's still easier to go and tweak Python code to reduce loop counts than to insert `std::cout` and recompile C++ code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25868

Differential Revision: D17318386

Pulled By: Krovatkin

fbshipit-source-id: 928ba6543204042924ab41a724635594709630de
2019-09-12 10:03:59 -07:00
4fb5a7c5b8 Experimental warning for named tensors (#26050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26050

Throws a warning once when someone attempts to attach names to a tensor.
This is guaranteed to happen at the callsite `set_named_tensor_meta`.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D17331634

Pulled By: zou3519

fbshipit-source-id: 44f5e5c95acd9c7ba543c1210a3b1314aab348f0
2019-09-12 06:34:12 -07:00
03bb7969be Move NamedTensorMetaInterface definitions to TensorImpl.h (#26030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26030

Test Plan:
- [namedtensor ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26030

Differential Revision: D17322383

Pulled By: zou3519

fbshipit-source-id: d5b914d646b48a6f4e0104aceb435e694b72bd96
2019-09-12 06:34:08 -07:00
a996b1d653 Make regular softmax warp size aware (#25956)
Summary:
Enable one unit test that passes now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25956

Differential Revision: D17298150

Pulled By: bddppq

fbshipit-source-id: 8763e71ad7ef80be915fe93a3471b29f27f3f0a4
2019-09-11 23:16:16 -07:00
e09c5e69f4 Dynamic registration of RPC backends (#25734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25734

[pytorch] Dynamic registration of RPC backends
Allow non-process-group RPC backends to be plugged in.
ghstack-source-id: 89938296

Differential Revision: D17183789

fbshipit-source-id: 885fed12d80b82b60f9a125f78302a161e708089
2019-09-11 21:48:44 -07:00
24d5b5f5f9 Add Runtime flag for quantized backend. (#25680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680

Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.

The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
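
A minimal sketch of the runtime switch (the engine constants are spelled as in the summary above; later releases may use a different spelling):

```
import torch

# pick the quantized backend at runtime; requires a build with both engines
torch.backends.quantized.engine = torch.qnnpack  # or torch.fbgemm
```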
ghstack-source-id: 89935643

Test Plan: Verified torch.backends.quantized.engine works

Differential Revision: D17198233

fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
2019-09-11 21:37:36 -07:00
83ecdf76da Revert "TorchScript Serialization for dynamic LSTM module" (#26079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26079

This reverts commit e3039612d851d0fbd337546c8debc27ec7cfc4e4.

Test Plan: Imported from OSS

Differential Revision: D17337585

Pulled By: jamesr66a

fbshipit-source-id: 4b93a4c5ca2fe491d609da889a42d22be8e52889
2019-09-11 21:23:19 -07:00
ead14a6bd4 Use BytesIO instead of tempfile (#25976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25976

As recommended in https://github.com/pytorch/pytorch/pull/25877/files#r322956051:

> We should move more of these toward using BytesIO. Using files in tests is generally considered bad practice because it introduces syscalls and dependencies on the execution environment, and thus can cause test flakiness/instability.
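
A minimal sketch of the in-memory pattern the tests are moving toward:

```
import io

import torch

buf = io.BytesIO()
torch.save(torch.ones(2, 2), buf)  # serialize into memory, no temp file
buf.seek(0)
restored = torch.load(buf)
```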
ghstack-source-id: 89929947

Test Plan: CI

Differential Revision: D17310441

fbshipit-source-id: ba97cce4224225df45ff44062f1bc8ebefb25922
2019-09-11 19:35:49 -07:00
abb7e1365c Upgrade the naming for fbgemm quantized op (#26064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26064

Just changing the names after https://github.com/pytorch/pytorch/pull/25678.
ghstack-source-id: 89944542

Test Plan: CI

Differential Revision: D17332068

fbshipit-source-id: 5e9febed7a2fcd10d44273e55643b277d33a3ad7
2019-09-11 19:33:18 -07:00
e3039612d8 TorchScript Serialization for dynamic LSTM module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25877

Test Plan: Imported from OSS

Reviewed By: jianyuh

Differential Revision: D17275746

Pulled By: jamesr66a

fbshipit-source-id: db2f38ddd99f02ccb4fb754fa1c1e6cad4425fa8
2019-09-11 19:17:25 -07:00
d4757afbe5 remove verbose in pytorch_ci hypothesis profile (#26075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26075

att; remove the verbose argument to reduce noise in the logs

Test Plan:
ci

Imported from OSS

Differential Revision: D17335935

fbshipit-source-id: 2e4289e838bf4489dcad8d5533353eebcff0d481
2019-09-11 18:16:30 -07:00
5376ee51fd Enable more mGPU tests (#26055)
Summary:
Enable mGPU tests that pass on ROCm as of 2.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26055

Differential Revision: D17331484

Pulled By: bddppq

fbshipit-source-id: 51f956a84a6c14a1a41473d322950994fa29c25c
2019-09-11 17:54:35 -07:00
6b7ea23d5b Add new API for Fully Connected and Convolution Operators in QNNPACK (#25862)
Summary:
This change adds new prepack and run functions for the FC and convolution operators in QNNPACK.
The new functions added are `PackBMatrix`, `qnnpackLinear`, `PrePackConvWeights`, and `qnnpackConv`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25862

Test Plan:
QNNPACK unit tests
fully-connected-test
convolution-test

Differential Revision: D17299260

Pulled By: supriyar

fbshipit-source-id: fdc4e2d5f1232675acd153f3efb9d17ed8628a54
2019-09-11 17:51:48 -07:00
a6a7f35481 Better error messages in C2 ONNX backend (#25809)
Summary:
Just a tiny fix to make debugging easier (output errors to stderr and include them in the exception message)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25809

Reviewed By: zrphercule

Differential Revision: D17329957

Pulled By: houseroad

fbshipit-source-id: 0d73dd9f62c735fbc5096e6a7c0e5f58e4cd90ae
2019-09-11 17:00:42 -07:00
28a2dafc15 C++ Average Pool Module (#25800)
Summary:
This PR adds the average pool module to the C++ front-end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25800

Differential Revision: D17318094

Pulled By: yf225

fbshipit-source-id: c914c0e802bbe5f1d1f0a21a669c28bc956899db
2019-09-11 16:39:56 -07:00
a7eb18e243 Enable Unique operator tests on ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26046

Differential Revision: D17331522

Pulled By: bddppq

fbshipit-source-id: 729624d1df15a1c0c7ba2b7e7e3c3a903fb13abf
2019-09-11 16:36:14 -07:00
276bde302e Enables _do_cuda_non_default_stream (#25989)
Summary:
Now that backward reuses forward streams, calls to backward no longer need to be explicitly synced (in the great majority of cases). This is an opportunity to enable the _do_cuda_non_default_stream flag, which this PR does for test_cuda.py and test_distributions.py, where the flag was previously defined but set to false.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25989

Test Plan: Test changes the entire test suite, so the test suite is the test plan.

Differential Revision: D17329233

Pulled By: mruberry

fbshipit-source-id: 52f65b5ed53de26e35e6d022658d7fac22609f6a
2019-09-11 16:00:50 -07:00
ad2ec71695 Add TEST_NAMEDTENSOR flag to namedtensor ci (#25948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25948

Previously, test/test_namedtensor.py is skipped if pytorch was not
compiled with BUILD_NAMEDTENSOR. Now, we skip test/test_namedtensor.py
if pytorch was not compiled with BUILD_NAMEDTENSOR or if
TEST_NAMEDTENSOR is not set.

This is done in preparation for turning on BUILD_NAMEDTENSOR=1 permanently;
at that point we will use TEST_NAMEDTENSOR to differentiate between the
named tensor ci and the regular ci.

Test Plan:
- [namedtensor ci] (and check that the named tensor tests are actually
running).

Differential Revision: D17300132

Pulled By: zou3519

fbshipit-source-id: 928f71f4d50445680b6ae1aa54b8857bc92e4d08
2019-09-11 14:53:20 -07:00
eee58f8284 Refactor torch.*solve tests (#25733)
Summary:
Changelog:
- De-duplicate the code in tests for torch.solve, torch.cholesky_solve, torch.triangular_solve
- Skip tests explicitly if requirements aren't met for e.g., if NumPy / SciPy aren't available in the environment
- Add generic helpers for these tests in test/common_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25733

Test Plan:
- All tests should pass to confirm that the change is not erroneous

Clears one point specified in the discussion in https://github.com/pytorch/pytorch/issues/24333.

Differential Revision: D17315330

Pulled By: zou3519

fbshipit-source-id: c72a793e89af7e2cdb163521816d56747fd70a0e
2019-09-11 14:30:00 -07:00
100ad48ced Remove unnecessary BUILD_NAMEDTENSOR from interned_strings.h (#25938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25938

It doesn't matter whether or not we expose these for namedtensor /
non-namedtensor builds.

Test Plan: - [namedtensor ci]

Differential Revision: D17291249

Pulled By: zou3519

fbshipit-source-id: a5aac77469e28198f63967396e2bdb1ec15bad97
2019-09-11 14:18:46 -07:00
68f40fb2c8 Add in membership checks for lists (#25796)
Summary:
Since it requires an equality operator, it's only implemented for lists
of `int`, `float`, and `str`.

Fixes some of #25758
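
A runnable sketch of the new check (a minimal example; the helper name is illustrative):

```
from typing import List

import torch

@torch.jit.script
def contains(xs: List[int], x: int) -> bool:
    # `in` needs an equality operator on the element type,
    # hence lists of int, float, and str only
    return x in xs

print(contains([1, 2, 3], 2))  # True
```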
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25796

Pulled By: driazati

Differential Revision: D17296216

fbshipit-source-id: 561245bfa75b65cee4e3395e242b2439b3c87b2e
2019-09-11 14:10:38 -07:00
d546c069a4 Preserve module names in recursive script (#24505)
Summary:
Turns
```
ScriptModule(
  (conv): ScriptModule()
  (lin): ScriptModule()
  (sub): ScriptModule()
)
```

into

```
ScriptModule(
  original=MyModule
  (conv): ScriptModule(original=Conv2d)
  (lin): ScriptModule(original=Linear)
  (sub): ScriptModule(original=Submodule)
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24505

Pulled By: driazati

Differential Revision: D16862032

fbshipit-source-id: 76dc4e5252bbf746f5cc26450b577dab10477732
2019-09-11 14:07:04 -07:00
d1d336168d Skip TestHub on macOS (#26033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26033

[test macos]

Test Plan: Imported from OSS

Differential Revision: D17323698

Pulled By: pietern

fbshipit-source-id: 1b5805d2b0f693d05a807299df4941a6bb528801
2019-09-11 13:56:03 -07:00
32b7b8994f Delay external imports until we're ready to test tensorboard (#25993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25993

These imports fail the test suite if they're not installed, even if we
don't end up testing tensorboard.

[test macos]

Test Plan: Imported from OSS

Differential Revision: D17318588

Pulled By: pietern

fbshipit-source-id: febad497ecb3fd292317f68fc2439acd893ccd67
2019-09-11 13:55:58 -07:00
6e3a8483a2 Skip TestAutograd.test_deep_reentrant on macOS (#25942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25942

See #25941 for tracking issue.

[test macos]

Test Plan: Imported from OSS

Differential Revision: D17318586

Pulled By: pietern

fbshipit-source-id: 43f61b8487210b032960b1a12516ab2f428c5e03
2019-09-11 13:55:54 -07:00
8ec80531b8 Refactor macOS build and test (#25930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25930

This commit temporarily enables the following builds for this PR:

* pytorch_macos_10_13_py3_build
* pytorch_macos_10_13_py3_test

[test macos]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25930

Test Plan: Imported from OSS

Differential Revision: D17318583

Pulled By: pietern

fbshipit-source-id: d12f04b148318711e8a15def7dca12b8d7ef65d3
2019-09-11 13:55:49 -07:00
66e3f080ad Change brew update logic to run much faster (#25988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25988

Running `brew update` used to take over 6 minutes. Now it completes in
about 30 seconds.

Test Plan: Imported from OSS

Differential Revision: D17318585

Pulled By: pietern

fbshipit-source-id: 75956aebc887cb29dbc2bc7efbf823243f18ab01
2019-09-11 13:55:45 -07:00
5b0c2fe127 Remove trailing whitespace in CircleCI configuration files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25987

Test Plan: Imported from OSS

Differential Revision: D17318584

Pulled By: pietern

fbshipit-source-id: 64ed5fa5a45d16b7920556e16ff60c31176420f9
2019-09-11 13:55:40 -07:00
8ca93ec351 Fix torch.arange traced as constant (#25363)
Summary:
torch.arange is always traced as a constant, which makes it impossible to correctly trace TestModel() from the example below.

```
class TestModel(torch.nn.Module):
    def forward(self, input):
        return torch.arange(input.shape[0])

input = torch.randn(5, 3, 2)
print(torch.jit.trace(TestModel(), input).graph)
```

Currently the trace of TestModel() looks like:

```
graph(%self : ClassType<TestModel>,
      %input : Float(5, 3, 2)):
  %11 : int = prim::Constant[value=5]()
  %12 : int = prim::Constant[value=4]()
  %13 : int = prim::Constant[value=0]()
  %14 : Device = prim::Constant[value="cpu"]()
  %15 : bool = prim::Constant[value=0]()
  %16 : Long(5) = aten::arange(%11, %12, %13, %14, %15)
  return (%16)
```

This PR will allow the trace to have a variable value for %11.
The trace of TestModel() with this PR's modifications looks like:

```
graph(%self : ClassType<TestModel>,
      %input : Float(5, 3, 2)):
  %2 : int = prim::Constant[value=0]()
  %3 : int = aten::size(%input, %2)
  %4 : Long() = prim::NumToTensor(%3)
  %11 : Scalar = prim::ImplicitTensorToNum(%4)
  %12 : int = prim::Constant[value=4]()
  %13 : int = prim::Constant[value=0]()
  %14 : Device = prim::Constant[value="cpu"]()
  %15 : bool = prim::Constant[value=0]()
  %16 : Long(5) = aten::arange(%11, %12, %13, %14, %15)
  return (%16)
```

More info : https://github.com/pytorch/pytorch/issues/20075
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25363

Reviewed By: zrphercule

Differential Revision: D17301934

Pulled By: houseroad

fbshipit-source-id: d9907763742cb51d8c761bf63fc2e4918f7b9941
2019-09-11 13:39:54 -07:00
62767077c3 add the tensor_observer to record the runtime tensor for quantization … (#25830)
Summary:
…accuracy analysis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25830

Differential Revision: D17327147

Pulled By: llyfacebook

fbshipit-source-id: 095d5537a31b8d7541081000eaeb8b8474dfb8d0
2019-09-11 13:36:28 -07:00
ec48280afa Improve error message when input is not in the right format (#25928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25928

Improved error message
ghstack-source-id: 89854172

Test Plan:
If given an input of the wrong dimension, the message was previously:
```
[QConv2D] each dimension of output tensor should be greater than 0
```
The message is now:
```
Given groups=1, weight of size 20, 5, 5, 1, expected input (NHWC) 10, 1, 32, 32 to have 1 channels, but got 32 channels instead
```

Reviewed By: jianyuh

Differential Revision: D17287290

fbshipit-source-id: d91573d6d69f2a5e0e615ffbd47a0bd233636a0b
2019-09-11 13:33:40 -07:00
00d967c39d enable unit tests (#25963)
Summary:
These unit tests pass after landing all the warp size awareness patches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25963

Differential Revision: D17319124

Pulled By: bddppq

fbshipit-source-id: 22f5d5f1ca9c67e66a7ccf983b2d2f889a74e729
2019-09-11 12:31:43 -07:00
075adb4d2d remove pthreadpool.a from install directory (#25977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977

Call add_subdirectory() explicitly before NNPACK/QNNPACK with
EXCLUDE_FROM_ALL property so that pthreadpool target won't be installed
by default for libtorch mobile build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25977

Test Plan: Imported from OSS

Differential Revision: D17312083

Pulled By: ljk53

fbshipit-source-id: 79851d0aa9402c5b9287ef4bbd8d7fd3a341497d
2019-09-11 12:27:56 -07:00
54f3cb8f79 Updating submodules
Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 0643fdce9eee21b5caea3af56f9638cf10cd1756
2019-09-11 12:24:11 -07:00
b9bf91feb8 Add torch.backends.mkldnn.enabled flag (#25459)
Summary:
This PR adds the torch.backends.mkldnn.enabled flag described in https://github.com/pytorch/pytorch/issues/25186, which can be used to disable MKL-DNN at runtime, just like torch.backends.cudnn.enabled.
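
A minimal usage sketch (assuming the flag mirrors torch.backends.cudnn.enabled, as described above):

```
import torch

# flip off to fall back to the non-MKL-DNN code paths at runtime,
# no rebuild needed
torch.backends.mkldnn.enabled = False
```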
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459

Differential Revision: D17258926

Pulled By: ezyang

fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598
2019-09-11 12:09:40 -07:00
c79a13b7b6 Simplify code generation - phase 1 (#25961)
Summary:
remove special casing in YAML renderer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25961

Differential Revision: D17322658

Pulled By: kostmo

fbshipit-source-id: 2e44e075d97262790c7a5773abf0afa70e0b24cb
2019-09-11 12:03:29 -07:00
76487e16a8 indentation for hypothesis profile and proper inheritance for QuantizationTestCase (#25934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25934

att

Test Plan:
python test/test_quantized_nn_mods.py

Imported from OSS

Differential Revision: D17318270

fbshipit-source-id: afb39f79e01e4d36a55dd17648c25e0743de1d42
2019-09-11 10:00:25 -07:00
c475ef72f9 Change order of activation and weight in QConfig (#25950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25950

I feel that is a more natural order
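
A minimal sketch of the resulting constructor order (the observer helper names are illustrative, not taken from this commit):

```
from torch.quantization import QConfig, default_observer, default_weight_observer

# activation now precedes weight in the QConfig constructor
qconfig = QConfig(activation=default_observer, weight=default_weight_observer)
```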

Test Plan:
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17294963

fbshipit-source-id: ed8ffdfe788a5e81966bda856e8d046ab68ee229
2019-09-11 09:51:01 -07:00
63df9ffd0b Fix typo in OpenBLAS cmake detection
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25966

Differential Revision: D17315925

Pulled By: albanD

fbshipit-source-id: 55c6b4a1ddeaf95714034ec66a4d59b0f00ba634
2019-09-11 09:10:42 -07:00
2080a15860 Add VariableTensorId, store it in TensorTypeSet (#25597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25597

We now take advantage of the new bitset representation TensorTypeSet to store "Variable-ness" of a tensor directly in the dispatch key. We introduce a new thread local TensorTypeSet "excluded" and replace the previous thread local boolean with it; we no longer have to query `is_variable()` to do dispatch (I didn't delete `is_variable`, because there are still a lot of uses of it). The key change is in `dispatchTypeId`.

Knock-on effects:
* Because Variable is now a TensorTypeId, I can eliminate the out-of-line registration `registerVariableOp` for variables; instead, make the registrar take a TensorTypeId (instead of a Backend) and you just register under the Variable key.
* Tensors aren't really ever created with Variable information initialized correctly at the start; instead, a tensor "becomes" a Variable because we set its `autograd_meta_`. These setters now correctly setup invariants on the dispatch type set. The new invariant is that if `autograd_meta_ != nullptr`, then `type_set().has(TensorTypeId::VariableTensorId)`.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17265919

Pulled By: ezyang

fbshipit-source-id: a90a7ed14f5cb1086137483ae3d0646fcd4c42d0
2019-09-11 08:59:48 -07:00
ba9fda14a7 C++ MaxPool Module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24860

Differential Revision: D17260361

Pulled By: yf225

fbshipit-source-id: 4b8c894d3bdf675cfeb9fc84934fe0339a048c1e
2019-09-11 08:56:57 -07:00
e04836004d L1Loss module (#25902)
Summary:
yf225 This is the L1Loss module. I don't think that ```_Loss``` and ```_WeightedLoss``` as base Python classes do anything. The first one sets the reduction type and also takes a ```reduce``` parameter, which is deprecated. The second one only registers the ```weight``` parameter. I don't think we should keep this structure. What do you think?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25902

Differential Revision: D17307045

Pulled By: yf225

fbshipit-source-id: ad3eda2ee8dcf4465054b376c1be89b39d11532f
2019-09-11 07:18:17 -07:00
1a58a9e441 The float version of calc_digamma should return float type. (#25488)
Summary:
Common understanding aside, the only occurrence of calc_digamma is in UnaryOpsKernel.cpp, which clearly expects the float version of calc_digamma to return float (and the double version to return double).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25488

Reviewed By: ezyang

Differential Revision: D17172379

Pulled By: VitalyFedyunin

fbshipit-source-id: 56facd45564cff019d572138c0d541a0bdded5c8
2019-09-11 07:10:41 -07:00
ebdb32c749 Remove global group name tracking for ProcessGroupNCCL (#25905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25905

Now that we can detect and recover from failures in NCCL we should
allow processes that are started at different times (and perhaps have
had previous NCCL process group instances), to eventually be part of
the same process group. Keeping track of group names in global
variables prevents that, because the processes will be out of sync.

This commit removes the global group name maps and defers
responsibility of isolating access to the same store from multiple
process groups to the store itself. Users can use `c10d::PrefixStore`
to derive new store instances whose keyspace is scoped to some
prefix. Functionally, this is identical to keeping a global map and
using a group name, but also gives more flexibility to the front-end
API to reset state and have processes that have started at different
times to join the same process group.
ghstack-source-id: 89804865

Test Plan: Tests pass.

Differential Revision: D17281416

fbshipit-source-id: eab3b48463a9b0ef24aedeca76e2bb970b9f33ef
2019-09-11 06:56:33 -07:00
929764ac2a Remove superfluous check for POLLIN in TCPStore (#25911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25911

The check is practically equivalent to checking for equality with
POLLIN (because the constant is a single bit and poll(2) is asked to
check for POLLIN). On macOS, if a client disconnects, POLLHUP will be
set as well, and the check fails. Instead of performing the check and
letting it fail, we can simply run the `query` function and catch
exceptions, in case we see EOF.

Test Plan: Imported from OSS

Differential Revision: D17313301

Pulled By: pietern

fbshipit-source-id: 00c5a69043f70848ef632d53f8e046dc69e15650
2019-09-11 02:23:34 -07:00
e4cd807cdb Make running Gloo tests conditional on availability
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25913

Test Plan: Imported from OSS

Differential Revision: D17313283

Pulled By: pietern

fbshipit-source-id: f07cb456e79a0067eac0f7abbc378fbd05c5565f
2019-09-11 02:20:32 -07:00
ebeb2a35ce Increase failure threshold for timing based assert (#25867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25867

The test can fail if this is run as a stress test. Increase the
threshold to significantly decrease the probability of failure.
ghstack-source-id: 89743661

Test Plan: Tests pass.

Differential Revision: D17266101

fbshipit-source-id: af514eff305783e4a970ac30c3ebdb02fbdcf4c5
2019-09-11 02:17:37 -07:00
87a2c92615 Updates autograd engine to respect streams set in forward (#8354)
Summary:
This PR addresses issue https://github.com/pytorch/pytorch/issues/7601.

Currently models that use streams explicitly in forward have to do a lot of extra work to make backwards respect those streams. This PR extends the (recently added) input tracing (see TypeAndShape) to record the devices and streams of inputs. The autograd engine then uses this metadata to enact the expected stream parallelism without extra work from the user.

For example, a model with forward declared like (original example courtesy of ngimel):

```
def forward(self,x):
        x0 = x.clone()
        torch._C._cuda_setStream(self.stream1._cdata)
        y0 = self.fc1(x0)
        self.event1.record(stream = torch.cuda.current_stream())

        torch._C._cuda_setStream(self.stream2._cdata)
        y1 = self.fc2(x)
        self.event2.record(stream = torch.cuda.current_stream())
        self.stream2.wait_event(self.event1)
        return y0 + y1
```

currently will run backward on a single stream. With this change, the kernels will go on the streams they are assigned in forward, and both forward and backward will (for appropriate sizes) run the fc1 and fc2 kernels simultaneously.

The crux of this change is, as mentioned, an expansion of the TypeAndShape tracing and a relatively simple change to the autograd engine to use cuda events for stream synchronization. To make this efficient I also added a new AutoGPUAndStream class, exposed getting and setting streams on devices, and removed InputBuffer's AutoGPU (it's now redundant). While making these modifications I also fixed AutoGPU to check before setting the GPU when it's destroyed and to use THCudaCheck instead of its custom error handler. These changes mean that an often excessive cudaSetDevice() is not being called when inputs are added to a buffer.

In addition to allowing users to easily set and use streams that are respected in both forward and backward, this change may encourage modules to do the same and the expanded tracing might allow further optimizations in the autograd engine. (apaszke, for example, now after initial enumeration we know the number of devices that will be used by a graph task, which might help provide a sense of the "level of parallelism" we should expect.)
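
A minimal sketch of the behavior using the public stream API (not the PR's test code; synchronization is shown explicitly):

```
import torch

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
x = torch.randn(8, 8, device='cuda', requires_grad=True)

# make the side streams wait for x to be ready on the default stream
s1.wait_stream(torch.cuda.current_stream())
s2.wait_stream(torch.cuda.current_stream())

with torch.cuda.stream(s1):
    y0 = (x * 2).sum()
with torch.cuda.stream(s2):
    y1 = (x * 3).sum()

torch.cuda.current_stream().wait_stream(s1)
torch.cuda.current_stream().wait_stream(s2)

# backward now runs each branch on the stream it used in forward
(y0 + y1).backward()
```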
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8354

Test Plan: Two tests were added specifically for this behavior.

Differential Revision: D17275980

Pulled By: mruberry

fbshipit-source-id: 92bd50ac782ffa973b159fcbbadb7a083802e45d
2019-09-10 23:46:51 -07:00
6b3f968957 Updating submodules
Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: b644007606b1c53d030e596f84ad5c09eedf2a9e
2019-09-10 22:19:47 -07:00
9815739d83 Fix LBFGS on GPU (#25909)
Summary:
Changelog:
- Fixes a mismatch of device in LBFGS, and possibly of data type as well.

Fixes https://github.com/pytorch/pytorch/issues/25854
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25909

Differential Revision: D17285583

Pulled By: soumith

fbshipit-source-id: 68df9326c1c40803494ee0693a0eddcd98c30ce7
2019-09-10 21:41:53 -07:00
3185b455c6 Add assert to ensure the divisor is not 0 (#25960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25960

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/124

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D17292372

fbshipit-source-id: 71a72f87b99c65b3b956bd8361694b1de05fc333
2019-09-10 21:03:15 -07:00
9b4f3fd7d3 Add torch.nn.LSTM into the default dynamic quantize mappings (#25954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25954

Add torch.nn.LSTM to the default dynamic quantization mappings. LSTM will now be dynamically quantized by default when the quantize_dynamic API is applied.
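
A minimal sketch of what this enables (assuming the quantize_dynamic entry point with its default mapping):

```
import torch

lstm = torch.nn.LSTM(input_size=10, hidden_size=20)
# nn.LSTM is now in the default mapping, so no explicit spec is needed
quantized = torch.quantization.quantize_dynamic(lstm)
```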
ghstack-source-id: 89839673

Test Plan: CI

Differential Revision: D17294958

fbshipit-source-id: 824aceef821276b3e28c52ce3bebafaf9b0a0833
2019-09-10 21:03:12 -07:00
21e9d1144e fix use-after-free bug
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25965

Test Plan: Imported from OSS

Differential Revision: D17300835

Pulled By: zdevito

fbshipit-source-id: dd22d71687f03a5900aec4e36b795e1b13904eee
2019-09-10 20:18:14 -07:00
dc015a1afb Delete tools/autograd/env.py (#25920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25920

It's not necessary anymore

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17285053

Pulled By: zou3519

fbshipit-source-id: 56dc18d1d49a0df7dacda880189e6c2fb09bc5f6
2019-09-10 20:08:24 -07:00
e69a6bab8c compute common dtype based on inputs only (#25593)
Summary:
Currently we compute the common dtype for TensorIterator based on all inputs and outputs. This can be a problem when the dtype of the outputs should be different from the dtype of the inputs (for example, torch.eq).
We also have a `dont_compute_common_dtype` method that allows us to avoid computing a common dtype across all inputs and outputs.

This PR adds the ability to compute the common dtype based only on the inputs, using `compute_common_dtype_only_for_inputs`. It also provides a simple method, `input_dtype(int arg=0)`, that makes it possible to dispatch based on an input's dtype.

```
AT_DISPATCH_ALL_TYPES(iter.input_dtype(), ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25593

Differential Revision: D17286352

Pulled By: ifedan

fbshipit-source-id: a94fb608acd2763120992fe85b8dfd02ff21f9ba
2019-09-10 19:30:08 -07:00
8f7020bbdb add support for ModuleDict (#25715)
Summary:
Add support for nn.ModuleDict in script. This is needed to support torchvision.
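
A minimal sketch of the pattern this enables (assuming iteration over the constant container is unrolled at compile time, as with nn.ModuleList):

```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.blocks = nn.ModuleDict({
            'conv': nn.Conv2d(3, 8, 3),
            'act': nn.ReLU(),
        })

    def forward(self, x):
        for name, block in self.blocks.items():
            x = block(x)
        return x

scripted = torch.jit.script(Net())
```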
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25715

Differential Revision: D17301826

Pulled By: eellison

fbshipit-source-id: 541b5477e980f519a8c3bbb1be91dac227f6d00f
2019-09-10 18:43:49 -07:00
a88f310151 Simplify header inclusion in test/cpp/api/modules.cpp (#25921)
Summary:
This PR simplifies header inclusion in `test/cpp/api/modules.cpp`, so that when we add a new `torch::nn` module and add the test in `modules.cpp`, we can check that the new module's header is included in `torch/torch.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25921

Differential Revision: D17303220

Pulled By: yf225

fbshipit-source-id: 327db0ff2f075d52e7b594b3dffc5a59441e0931
2019-09-10 18:37:39 -07:00
74b48f21c1 remove protobuf from Dependencies.cmake for libtorch mobile build (#25958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25958

Should have cleaned up the remaining protobuf dependencies before landing PR #25896.

Test Plan: - CI build;

Reviewed By: dreiss

Differential Revision: D17296949

Pulled By: ljk53

fbshipit-source-id: 20c444e63900c7fa054db3cc757d3f18614af630
2019-09-10 18:23:20 -07:00
fc93d1ae6b Add ONNX export support for torch.log1p. (#25808)
Summary:
The `torch.log1p` operator was not supported by the ONNX exporter. This PR adds the support.
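
A minimal export sketch for the newly supported op (the wrapper module is illustrative):

```
import io

import torch

class Log1p(torch.nn.Module):
    def forward(self, x):
        return torch.log1p(x)

buf = io.BytesIO()
torch.onnx.export(Log1p(), torch.randn(4), buf)  # no longer fails on log1p
```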
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25808

Reviewed By: zrphercule

Differential Revision: D17298092

Pulled By: houseroad

fbshipit-source-id: 65a919a07797722d7d4df8caf284bd89acd0bb02
2019-09-10 18:17:08 -07:00
1897440e02 add torch.jit.is_scripting api (#25955)
Summary:
The PR that https://github.com/pytorch/pytorch/pull/25263 was based on got reverted, and ghimport got confused. Relanding here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25955

Differential Revision: D17296727

Pulled By: eellison

fbshipit-source-id: 96200d3ef4c86f0d9907dc41b05619cb33bf2bab
2019-09-10 17:28:59 -07:00
5d3267cd30 Remove some more BUILD_NAMEDTENSOR flags
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25919

Test Plan: - [namedtensor ci]

Differential Revision: D17285052

Pulled By: zou3519

fbshipit-source-id: 52dce616104248bd36a1ddbefe51ce83163eae51
2019-09-10 17:22:28 -07:00
2655b2710c Disable flaky test_invalid_names in test_rpc.py (#25916)
Summary:
pietern discovered that `test_invalid_names` is flaky on master. https://github.com/pytorch/pytorch/issues/25656 potentially contains the fix. Disable this test for now; we will try to add it again once https://github.com/pytorch/pytorch/issues/25656 is in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25916

Differential Revision: D17287496

Pulled By: mrshenli

fbshipit-source-id: 9313958d3480c2bab20cd2341837c7821e3bb1b5
2019-09-10 17:00:22 -07:00
4231287504 Add names= argument to torch.tensor ctor (#25424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25424
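
A minimal sketch of the new argument:

```
import torch

t = torch.tensor([[1., 2.], [3., 4.]], names=('N', 'C'))
print(t.names)  # ('N', 'C')
```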

Test Plan
- new tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17120399

Pulled By: zou3519

fbshipit-source-id: 93d7944f2ec4c5a7256f505323b879af706131df
2019-09-10 16:58:01 -07:00
2856fd6c22 make python rpc handler to be singleton class (#25742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25742

The Python RPC handler is currently a namespace plus global variables; this change makes it a singleton class, since that guarantees a deterministic order of variable destruction. With the namespace plus global variables, we hit a crash at process exit because the global variables have dependencies and are not destructed as expected.
ghstack-source-id: 89809889

Test Plan: unit test passed

Differential Revision: D17097999

fbshipit-source-id: 5a5d003925dba1a7ea1caf3b7c28ff9e24c94a21
2019-09-10 15:30:20 -07:00
16c1907830 update build_android.sh to not build host protoc for libtorch (#25896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25896

Similar change as PR #25822.

Test Plan:
- Updated CI to use the new script.
- Will check pytorch android CI output to make sure it builds libtorch
  instead of libcaffe2.

Reviewed By: dreiss

Differential Revision: D17279722

Pulled By: ljk53

fbshipit-source-id: 93abcef0dfb93df197fabff29e53d71db5674255
2019-09-10 15:19:43 -07:00
4bd9ddb0b7 remove pthreadpool dependency in aten/CMake (#25894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25894

NNPack/QNNPack both depend on a third-party library "pthreadpool". There
are two versions of "pthreadpool" implementation, one is the default
implementation under third-party/pthreadpool, the other is caffe2 custom
implementation under caffe2/utils/threadpool. Both implementations share
the same interface (as defined by pthreadpool headers).

Usually only one version of pthreadpool should be linked into libtorch.
If QNNPACK_CUSTOM_THREADPOOL/NNPACK_CUSTOM_THREADPOOL are set to true,
then QNNPack/NNPack will not link third-party/pthreadpool - they will
expect the caller (libtorch) to link correct version of pthreadpool;
otherwise they will bring in the default pthreadpool implementation.

Looks like libtorch cmake already sets both macros to true in
Dependencies.cmake and External/nnpack.cmake. And currently libtorch
mobile build includes the caffe2/utils/threadpool pthreadpool
implementation. So it shouldn't try to explicitly link default
pthreadpool target in aten/CMake in this AT_NNPACK_ENABLED section.

Test Plan:
- Before this diff, libtorch.so links libpthreadpool.a:
```
LINK_LIBRARIES = lib/libc10.so lib/libqnnpack.a lib/libnnpack.a
lib/libcpuinfo.a -llog -ldl -lm lib/libnnpack.a lib/libpthreadpool.a
lib/libcpuinfo.a lib/libclog.a -llog -latomic -lm
```

- After this diff, libtorch.so no longer links libpthreadpool.a:
```
LINK_LIBRARIES = lib/libc10.so lib/libqnnpack.a lib/libnnpack.a
lib/libcpuinfo.a -llog -ldl -lm lib/libnnpack.a lib/libcpuinfo.a
lib/libclog.a -llog -latomic -lm
```

- Tried the following combinations to make sure things work as expected:
* remove caffe2/utils/threadpool, remove libpthreadpool: link error;
* keep caffe2/utils/threadpool, remove libpthreadpool: no link error;
* remove caffe2/utils/threadpool, add back libpthreadpool: no link error;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25894

Reviewed By: dreiss

Differential Revision: D17279723

Pulled By: ljk53

fbshipit-source-id: ae5aa7ca7283a276ecf1e2140bad0a6af3efdb3a
2019-09-10 15:03:13 -07:00
ec8e75ea92 Fix int32 overflow in SummaryOps.cu getBin #25747 (#25748)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/25747 by upcasting to int64 before multiplication. This should be good enough for all reasonable values of nbins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25748

Differential Revision: D17269111

Pulled By: ezyang

fbshipit-source-id: 484be39080571203264a1bb9898ecf23d1aeafab
2019-09-10 15:00:45 -07:00
a7eaec6cf2 add set_grad_enabled to TorchScript and fix data attribute
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25350

Test Plan: Imported from OSS

Differential Revision: D17100829

fbshipit-source-id: d85d6f3b03218b9c77e144365940eeaa5b4cce9a
2019-09-10 14:36:26 -07:00
387d5a4459 Add ONNX Export Support to rsqrt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24153

Reviewed By: zrphercule

Differential Revision: D17231150

Pulled By: houseroad

fbshipit-source-id: 621fa9069238a74101bb2a7f4792a6feb1f89606
2019-09-10 14:33:54 -07:00
d377556f08 Make persistent softmax WARP_SIZE aware. (#25937)
Summary:
Also change documentation to reflect both the CUDA and ROCm facts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25937

Differential Revision: D17291453

Pulled By: bddppq

fbshipit-source-id: ee1d7a34f3ad6c05a8f1564d4f9e516e497f2199
2019-09-10 14:12:40 -07:00
a14e884546 Migrate pow from TH to Aten (CUDA) (#25517)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24613

```
DEBUG = 0
OMP_NUM_THREADS = 1
Tesla M40

import torch

base = torch.randn(1000000, device='cuda:1')
exp  = torch.randn(1000000, device='cuda:1')
out  = torch.empty_like(base)

timeit base.pow(0)
old 53.1 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 18.7 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

timeit base.pow(1/3)
old 53.3 µs ± 20.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 51.1 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-1/3)
old 53.3 µs ± 55.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 51.1 µs ± 29.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(1/2)
old 53.2 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 34.8 µs ± 40.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-1/2)
old 53.3 µs ± 54.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 42 µs ± 32.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(1)
old 38.3 µs ± 53.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 40.1 µs ± 41.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-1)
old 38.4 µs ± 29 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 35 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(2)
old 38.1 µs ± 20.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 34.8 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-2)
old 38.3 µs ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 35.2 µs ± 54.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(3)
old 38.3 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 34.9 µs ± 46.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-3)
old 53.3 µs ± 89.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 51.4 µs ± 31.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(123456.789)
old 53.3 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 51.2 µs ± 24.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-123456.789)
old 53.5 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 51.3 µs ± 66.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(exp)
old 58.2 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 54.5 µs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(0, exp)
old 49.1 µs ± 89.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 58.7 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(1, exp)
old 48.7 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 18.7 µs ± 88.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

timeit torch.pow(-1, exp)
old 50.7 µs ± 104 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 59.8 µs ± 100 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(42, exp)
old 49.4 µs ± 98 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 58.6 µs ± 26.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(-42, exp)
old 50.4 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 59.8 µs ± 48.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(0, exp, out=out)
old 49 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 59.2 µs ± 169 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(1, exp, out=out)
old 49.3 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 18.8 µs ± 45.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

timeit torch.pow(-1, exp, out=out)
old 50.4 µs ± 167 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 60.2 µs ± 71.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(42, exp, out=out)
old 49.2 µs ± 293 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 58.9 µs ± 193 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(-42, exp, out=out)
old 50.5 µs ± 150 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 60.1 µs ± 89.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

base = (torch.rand(1000000, device='cuda:1') * 10).to(int)
exp  = (torch.rand(1000000, device='cuda:1') * 10).to(int)
out  = torch.empty_like(base)

timeit base.pow(0)
old 75.5 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 33.8 µs ± 84.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(1/3)
old 75.5 µs ± 78.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 842 µs ± 449 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-1/3)
old 75.5 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 843 µs ± 231 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(1/2)
old 75.7 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 123 µs ± 71.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-1/2)
old 76 µs ± 162 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 180 µs ± 55.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(1)
old 74.1 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 72.3 µs ± 32.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-1.0)
old Integers to negative integer powers are not allowed.
new 86.9 µs ± 84.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(2)
old 74.2 µs ± 15.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 66.5 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-2.0)
old Integers to negative integer powers are not allowed.
new 87.3 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(3)
old 74.3 µs ± 23.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 66.5 µs ± 43.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-3.0)
old Integers to negative integer powers are not allowed.
new 861 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(123456.789)
old 256 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 863 µs ± 64.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(-123456.789)
old Integers to negative integer powers are not allowed.
new 863 µs ± 57.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit base.pow(exp)
old 111 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 98.8 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(0, exp)
old 81.9 µs ± 23.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 92.9 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(1, exp)
old 81.9 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 33.6 µs ± 56.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(-1, exp)
old 82.2 µs ± 15.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 93.6 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(42, exp)
old 82.1 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 93.8 µs ± 75.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(-42, exp)
old 82.3 µs ± 18.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 94 µs ± 68.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(0, exp, out=out)
old 81.6 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 93.8 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(1, exp, out=out)
old 81.6 µs ± 26.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 33.7 µs ± 36.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(-1, exp, out=out)
old 82.7 µs ± 119 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 93.9 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(42, exp, out=out)
old 82.6 µs ± 216 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 93.7 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

timeit torch.pow(-42, exp, out=out)
old 82.5 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
new 94 µs ± 55.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25517

Differential Revision: D17251364

Pulled By: pbelevich

fbshipit-source-id: 20904c073c311e76285eaa1b68e67e67ea3c62d8
2019-09-10 13:46:22 -07:00
55219d55a6 Only create a new clone of observer when we actually insert it.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25931

Test Plan: Imported from OSS

Differential Revision: D17288546

Pulled By: ZolotukhinM

fbshipit-source-id: 01584d27c0ebd127b845e560aedd1f1d9d298c5e
2019-09-10 13:46:18 -07:00
618804f237 Make lookup table warp size aware
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25926

Differential Revision: D17286446

Pulled By: bddppq

fbshipit-source-id: d25515f25f9df309a08ae7f948bb6a087e45134e
2019-09-10 13:22:17 -07:00
3680cef44e C++ Fold nn module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24160

Differential Revision: D17260740

Pulled By: yf225

fbshipit-source-id: f0c7769316bed330289ca3d948f2e39c72ec928b
2019-09-10 13:19:37 -07:00
2ab0f221ba Make spatial depthwise convolution warp size aware (#25922)
Summary:
Use new macro and remove hard-coded path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25922

Differential Revision: D17286444

Pulled By: bddppq

fbshipit-source-id: 21bfb6053258af3ccfe1f2a6e5c17faa31602e28
2019-09-10 13:08:46 -07:00
c749be9e9f Make arguments of Module::dump easier to remember. (#25740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25740

Previously we had `omit_method_bodies`, `omit_attr_values` and
`omit_param_values`. They were called the same in the python bindings
and it was hard to remember their proper spelling. This PR changes them
to `code`, `attrs`, and `params`, which are much easier to remember. It
also flips their meaning - now they enable printing instead of disabling
it. I also changed the default values to 'print all' from 'print
nothing', as that's the most common way of using it.
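For illustration, a minimal sketch of the new call shape, assuming the C++ method mirrors the flags described above (the model path is a placeholder):
```
#include <torch/script.h>

int main() {
  // "model.pt" is a placeholder path.
  torch::jit::Module module = torch::jit::load("model.pt");
  // Print only the method bodies; skip attribute and parameter values.
  module.dump(/*code=*/true, /*attrs=*/false, /*params=*/false);
}
```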

Test Plan: Imported from OSS

Differential Revision: D17217517

Pulled By: ZolotukhinM

fbshipit-source-id: fa56e478a732ffd685d885f11c9da0457cd03d16
2019-09-10 11:42:26 -07:00
26f67e7aa7 fix scatter CPU kernel when (input size, src size) > index size (#25839)
Summary:
fixes https://github.com/pytorch/pytorch/issues/25836
According to the docs (https://pytorch.org/docs/stable/tensors.html#torch.Tensor.scatter_), `index` must have the smallest size of the three tensors, so we should iterate over `index` instead of `tensor`.
cc: dlibenzi
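A minimal libtorch sketch of the documented contract (a hypothetical repro, not the PR's regression test): `index` is smaller than both `self` and `src` in every dimension, so only `index.numel()` elements may be written.
```
#include <torch/torch.h>
#include <iostream>

int main() {
  auto src = torch::arange(1, 11).to(torch::kFloat).reshape({2, 5});
  auto index = torch::tensor({0, 1, 2}).unsqueeze(0);  // shape (1, 3)
  auto out = torch::zeros({3, 5});
  // The kernel must iterate over index: exactly three elements are written.
  out.scatter_(0, index, src);
  std::cout << out << "\n";
}
```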
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25839

Differential Revision: D17269116

Pulled By: ailzhang

fbshipit-source-id: 0e8569fed6c0d2dd70e4e3ec5d29d8730cd2ae8f
2019-09-10 11:41:41 -07:00
5dfef472fb make sparse coalesce warp size aware (#25918)
Summary:
Use the new C10_WARP_SIZE macro to make the sparse coalesce kernel warp size aware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25918

Differential Revision: D17286442

Pulled By: bddppq

fbshipit-source-id: a079f012c32e5786b49b2a6973019d847ee11897
2019-09-10 11:10:07 -07:00
9c10f729de Add Dropout to blacklist (#25881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25881

Add Dropout to blacklist to avoid the error in eager mode quantization.
ghstack-source-id: 89759536

Test Plan: Test locally in python notebook.

Reviewed By: jianyuh

Differential Revision: D17270826

fbshipit-source-id: bcf43483976740564d7f407838f25c2dbb67b016
2019-09-10 10:57:38 -07:00
26675b507f Enable libflame as a LAPACK choice (#25795)
Summary:
libflame is BLIS's companion LAPACK from the FLAME project

Mimics my ancient
f5bc78263e
in cmake upstream

libflame WWW: https://github.com/flame/libflame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25795

Differential Revision: D17286461

Pulled By: bddppq

fbshipit-source-id: 7cd0d27127c78563574791415e4a34f045df30df
2019-09-10 10:34:55 -07:00
aa49aa856c Tensor type set (#25308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308

Instead of storing a single TensorTypeId in a Tensor, we store a bitset of tensor type IDs in a Tensor, TensorTypeSet. This class comes with some unit tests.  This is in preparation for making Variable a TensorTypeId. In order to help flush out places where this makes a semantic difference, we rename `Tensor::type_id()` to `Tensor::type_set()` and smoke out all of the locations where this was semantically meaningful.

Because the new tensor type set is 64-bits, this increases the size of Tensor by a word.

Listing of semantic changes:
* Many TensorImpl related constructors just propagate TensorTypeId to a parent constructor. These are pretty simple to adjust.
  * Backend extensions are now in the business of explicitly constructing a TensorTypeSet and then passing it in. This is probably OK for now but when Variable drops, these dispatch IDs may get immediately overwritten to have Variable set.
* `sparseTensorSetToDeviceType` and similar functions previously did an equality test with TensorTypeId, to determine what an appropriate device type is. This equality is now replaced with a set inclusion test. This is valid, under the assumption that we don't ever have weird sets like "this tensor is simultaneously a sparse CPU tensor and a sparse CUDA tensor", which will be true in the short term plan of adding Variable to the dispatch ID.
* `impl::dispatchTypeId` was generally introduced for cases where we legitimately need to convert from `TensorTypeSet -> TensorTypeId` in a dispatch related manner. At the moment, the implementation is trivial, but they will soon be adjusted to handle TLS. I've tried to make these call sites as forwards compatible as possible:
  * `checked_tensor_unwrap` and co now use `dispatchTypeId`. When Variable is added to the type set, these will always be called in a context where the Variable type ID is disabled, so we will get the correct underlying tensor type ID.
  * Uses of `Backend` in dispatch are now replaced with `TensorTypeSet`. The general heuristic here for whether or not to accept a `TensorTypeId` or `TensorTypeSet` is that we want to make the generated code as simple as possible. It is easier to retrieve a `TensorTypeSet`, so that's a more appropriate API in these cases.
* In some cases, I could not conveniently switch an implementation to the new semantics, because it was blocked on some other refactor. In this case, I introduced `legacyExtractTypeId`, which gives what would be a BC-compatible `TensorTypeSet` to `TensorTypeId` implementation that will continue to report the same values it would have prior to this change. This is **different** from `dispatchTypeId`, because this function does NOT respect TLS; it always ignores Variable type IDs.
  * c10 dispatcher tests, which are oblivious to Variable dispatch, use this BC function (actually, they use `extractTypeId`, an overload for Tensor).
  * The implementation of `new_*` methods heavily relies on tensor type ID, I chose not to unwind this. PR to refactor this at https://github.com/pytorch/pytorch/pull/25475
  * Slicing also relies on tensor type ID, see `torch/csrc/autograd/python_variable_indexing.cpp` (though in some cases in this file, I was able to replace use of tensor type ID with TensorOptions)
* In some cases, there is an equality test on tensor type ID which would be better done by testing "tensor axes". In those cases, I replaced those equality tests with more specific equality tests.
  * Example: `torch/csrc/nn/type_checks.h`
  * There is a total punt in `torch/csrc/tensor/python_tensor.cpp` where "instance of" checking is done via dispatch ids. In general, the Variable-ness of a tensor doesn't participate in instanceof testing. It's not entirely clear what to do here.
  * Instead of storing `Backend` in `VariableInfo`, we now just store Layout.

c10 dispatcher test updates were done with:

```
:%s/\([^ ]\+\)\.type_id()/extractTypeId(\1)/g
:%s/\([^( ]\+\)->type_id()/extractTypeId(*\1)/g
```
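As a rough illustration of the core idea (the names and values below are made up; the real class is c10's TensorTypeSet), the set is a 64-bit bitmask over type IDs, and the old equality checks become inclusion checks:
```
#include <cstdint>

// Illustrative stand-ins, not the real c10 types.
enum class TypeId : uint8_t { CPU = 0, CUDA = 1, SparseCPU = 2, Variable = 3 };

class TypeSet {
  uint64_t repr_ = 0;  // one bit per TypeId; fits in a single word
 public:
  TypeSet add(TypeId t) const {
    TypeSet s = *this;
    s.repr_ |= (1ULL << static_cast<uint8_t>(t));
    return s;
  }
  // Set inclusion replaces the old `type_id() == id` equality test.
  bool has(TypeId t) const {
    return (repr_ >> static_cast<uint8_t>(t)) & 1;
  }
};
```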

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308

Differential Revision: D17092791

Test Plan: sandcastle and ossci

Reviewed By: bwasti

Pulled By: ezyang

fbshipit-source-id: 22207d14fe62dd31ee19cc5011af22e3d9aabb5b
2019-09-10 10:30:54 -07:00
6630c3f379 add NO_EXPORT macro to unset __visibility__ attribute (#25816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25816

On Android we will release a small set of native APIs designed for mobile use
cases. All of the needed libtorch C++ APIs are called from inside this JNI bridge:
android/pytorch_android/src/main/cpp/pytorch_jni.cpp

With NO_EXPORT set for the Android static library build, all of the
original TORCH, CAFFE2, TH/ATen APIs are hidden, which allows the linker
to strip unused ones from the mobile library when producing the DSO.

If people choose to build the libtorch DSO directly, it will still keep
all C++ APIs, as the mobile API layer is not part of the libtorch build (yet).

Test Plan:
- build libtorch statically and link into demo app;
- confirm that linker can strip out unused APIs;

Differential Revision: D17247237

Pulled By: ljk53

fbshipit-source-id: de668216b5f2130da0d6988937f98770de571c7a
2019-09-10 10:20:21 -07:00
8485710143 introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile
Summary:
This is the first of a series of changes to reduce build size by cutting
autograd functions from mobile build.

When INTERN_DISABLE_AUTOGRAD is set:
* On CMake side we exclude Functions.h/cpp, VariableType*.h/cpp,
  VariableTypeManual.cpp from the build process. Still keep variable_factories.h
  as we rely on it to create variables instead of tensors.
* In source code we gate a couple autograd references (in autograd/variable.cpp)
  with C10_MOBILE (technically we should use a dedicated c macro but its
  maintenance cost is higher than cmake macro as we have several build systems
  to change).
* Pass --disable-autograd flag to codegen script, which will stop generating
  Functions/VariableType code. And for variable_factories.h it will stop
  generating tracing code.

Edit: in this diff we will keep Functions.h/cpp to avoid changing source code.

Why do we need this change if mobile already avoids calling VariableType
and autograd code with USE_STATIC_DISPATCH=ON?
Because it reduces the static library size for the iOS build, where it's
relatively harder to strip size with the linker approach.

Why do we need to make such an involved change to the codegen script?
There isn't a global config system in codegen - autograd/env.py provides similar
functionality, but it says not to add anything there.

Test Plan:
- will check CI;
- test mobile build in sample app;

Differential Revision: D17202733

Pulled By: ljk53

fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299
2019-09-10 10:20:17 -07:00
41cf5564fe gate static aten registerer with USE_STATIC_DISPATCH (#25815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25815

We don't need to call these global registerers when USE_STATIC_DISPATCH is
set, as they would keep all aten functions alive at link time.

We should rely solely on jit/generated/register_aten_ops* to keep the "interface"
aten functions (which are directly called from JIT), and rely on
STATIC_DISPATCH + the linker to keep all other aten functions that are
transitively needed by the "interface" functions.

Test Plan:
- build and run in the demo app;
- with stacked diff to shrink registered "interface" functions, linker
  can strip out unused aten implementations;

Differential Revision: D17247236

Pulled By: ljk53

fbshipit-source-id: 1feb5fbb8b9cfa057b9ba8bf3f2967f40980c917
2019-09-10 10:20:13 -07:00
76ee02f10d Rename packed tensor accessor (#25654)
Summary:
Closes https://github.com/pytorch/pytorch/issues/19268

This does the renaming suggested by ezyang in https://github.com/pytorch/pytorch/issues/19268#issuecomment-490478887 except that the templated version of `packed_accessor` is also renamed to `generic_packed_accessor`.

Additionally, all of the users I could find in `ATen/native/cuda` are updated without changing their index types.

The corresponding tutorial update is in pytorch/tutorials#644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25654

Differential Revision: D17259208

Pulled By: ezyang

fbshipit-source-id: 172a46f623d544ca16f7ed5077b6e4f57a3d1f21
2019-09-10 09:18:54 -07:00
e8cc1fddb7 Fix cpp_extensions test failures with GCC 9.1 from ArrayRef(initializer_list) (#25384)
Summary:
These are test failures due to `-Werror` in `test/cpp_extensions/setup.py` that look like:
```
$ python test/run_test.py -i cpp_extensions
Test executor: ['/home/rgommers/anaconda3/envs/pytorch-gcc91/bin/python']
Running test_cpp_extensions ... [2019-08-29 02:19:03.421117]
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/torch_test_cpp_extension
copying torch_test_cpp_extension/__init__.py -> build/lib.linux-x86_64-3.6/torch_test_cpp_extension
running build_ext
building 'torch_test_cpp_extension.cpp' extension
creating build/temp.linux-x86_64-3.6
gcc -pthread -B /home/rgommers/anaconda3/envs/pytorch-gcc91/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/rgommers/code/pytorch/torch/include -I/home/rgommers/code/pytorch/torch/include/torch/csrc/api/include -I/home/rgommers/code/pytorch/torch/include/TH -I/home/rgommers/code/pytorch/torch/include/THC -I/home/rgommers/anaconda3/envs/pytorch-gcc91/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -Werror -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cpp -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/rgommers/code/pytorch/torch/include/c10/core/MemoryFormat.h:5,
                 from /home/rgommers/code/pytorch/torch/include/ATen/core/Tensor.h:5,
                 from /home/rgommers/code/pytorch/torch/include/ATen/Tensor.h:2,
                 from /home/rgommers/code/pytorch/torch/include/ATen/Context.h:4,
                 from /home/rgommers/code/pytorch/torch/include/ATen/ATen.h:5,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/rgommers/code/pytorch/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/rgommers/code/pytorch/torch/include/torch/extension.h:4,
                 from extension.cpp:1:
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = long int]’:
/home/rgommers/code/pytorch/torch/include/c10/core/TensorImpl.h:1464:34:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<long int>::Data’ from ‘std::initializer_list<long int>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
  103 |       : Data(Vec.begin() == Vec.end() ? static_cast<T*>(nullptr) : Vec.begin()),
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = unsigned char]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<unsigned char>::Data’ from ‘std::initializer_list<unsigned char>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = signed char]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<signed char>::Data’ from ‘std::initializer_list<signed char>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = short int]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<short int>::Data’ from ‘std::initializer_list<short int>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = int]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<int>::Data’ from ‘std::initializer_list<int>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = float]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<float>::Data’ from ‘std::initializer_list<float>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = double]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<double>::Data’ from ‘std::initializer_list<double>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = bool]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<bool>::Data’ from ‘std::initializer_list<bool>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = c10::Half]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<c10::Half>::Data’ from ‘std::initializer_list<c10::Half>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h: In instantiation of ‘constexpr c10::ArrayRef<T>::ArrayRef(const std::initializer_list<_Tp>&) [with T = c10::BFloat16]’:
/home/rgommers/code/pytorch/torch/include/ATen/NativeFunctions.h:47:1:   required from here
/home/rgommers/code/pytorch/torch/include/c10/util/ArrayRef.h:103:39: error: initializing ‘c10::ArrayRef<c10::BFloat16>::Data’ from ‘std::initializer_list<c10::BFloat16>::begin’ does not extend the lifetime of the underlying array [-Werror=init-list-lifetime]
cc1plus: all warnings being treated as errors
error: command 'gcc' failed with exit status 1
Traceback (most recent call last):
  File "test/run_test.py", line 438, in <module>
    main()
  File "test/run_test.py", line 430, in main
    raise RuntimeError(message)
RuntimeError: test_cpp_extensions failed!
```

The warnings look valid: the code isn't guaranteed to work (although in practice it does seem to). Using `std::begin` keeps the underlying array for the `initializer_list` from going out of scope.

Note that the same warning is reported in https://github.com/pytorch/vision/issues/1173#issuecomment-517308733 (Cc ShahriarSS)
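To make the hazard concrete, here is a small self-contained repro of the pattern GCC 9 warns about (`IntView` is a stand-in for `ArrayRef`):
```
#include <cstddef>
#include <initializer_list>

// The view stores a pointer into the initializer_list's backing array,
// which only lives for the full-expression that created the list.
struct IntView {
  const int* data;
  std::size_t len;
  constexpr IntView(const std::initializer_list<int>& l)
      : data(l.begin()), len(l.size()) {}
};

int main() {
  IntView v({1, 2, 3});            // backing array is a temporary
  return static_cast<int>(v.len);  // reading v.data here would be UB
}
```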
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25384

Differential Revision: D17113146

Pulled By: ezyang

fbshipit-source-id: 477c414481fb3664a8cb92728f4111e6317b309e
2019-09-10 09:09:52 -07:00
c60dddbb9f Store bias in PackedConvWeight in fbgemm (#25626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25626

Add bias as an optional parameter in the packed conv weight struct.
ghstack-source-id: 89780639

Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer

Reviewed By: raghuramank100

Differential Revision: D17177723

fbshipit-source-id: e502f2196cb1c002db8b691124db740368944c92
2019-09-10 08:43:55 -07:00
57b23c61c5 In the CUDA implementation of erfinv, erfinv() should be used for double (#25337)
Summary:
This best preserves accuracy, while erfinvf() should be used for half and float.

This is also consistent with the implementation before the migration: https://github.com/pytorch/pytorch/issues/24943
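A sketch of the dispatch this describes (the wrapper name is illustrative): plain overloads route each type to the matching CUDA intrinsic.
```
// Illustrative wrapper: double keeps full precision via erfinv(), while
// float (and half, computed in float) uses erfinvf().
__device__ __forceinline__ double calc_erfinv(double x) { return erfinv(x); }
__device__ __forceinline__ float calc_erfinv(float x) { return erfinvf(x); }
```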
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25337

Differential Revision: D17102333

Pulled By: zou3519

fbshipit-source-id: 5178cff534cf5f10d86ab04d4b6c1779ffedf49e
2019-09-10 06:30:33 -07:00
bf04c2ca2f Make torch checks same for both CPU and CUDA multinomial (#25595)
Summary:
Currently we have different checks for multinomial method on CPU and CUDA. This PR will make them consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25595

Differential Revision: D17236163

Pulled By: ifedan

fbshipit-source-id: 7718173bdaf216e8eb636c2a5b9c5939b975325b
2019-09-10 05:29:58 -07:00
8a026d4f74 Remove tools/setup_helpers/dist_check.py (#25879)
Summary:
What dist_check.py does is largely just determining whether we should
set "USE_IBVERBS" to ON or OFF when the user sets "USE_GLOO_IBVERBS"
to ON. But this is unnecessary, because this complicated determination
will always be overridden by gloo:

2101e02cea/cmake/Dependencies.cmake (L19-L28)

Since dist_check.py becomes irrelevant, this commit also simplifies the
setting of `USE_DISTRIBUTED` (by removing its explicit setting in Python scripts) and deprecates `USE_GLOO_IBVERBS` in favor
of `USE_IBVERBS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25879

Differential Revision: D17282395

Pulled By: pietern

fbshipit-source-id: a10735f50728d89c3d81fd57bcd26764e7f84dd1
2019-09-10 04:33:28 -07:00
a8d4bb34ea Unify treatment of warp size / wave size (#25884)
Summary:
Introduce a C10_WARP_SIZE define in Macros.h

For kernels that had ifdef-ing of WARP_SIZE for ROCm vs CUDA, use said macro. This is no functional change - we merely refactor to unify on one WARP_SIZE definition.

I hope we can encourage use of this macro over more WARP_SIZE definitions being sprinkled across the code base (or numerically hard-coded).

Some kernels remain that have their own WARP_SIZE definitions but did not satisfy the above condition. They will be fixed in follow-up PRs.
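Roughly, the macro amounts to the following (a sketch; the authoritative definition lives in c10/macros/Macros.h):
```
// One central definition instead of per-kernel ifdefs or hard-coded 32s.
#if defined(__HIP_PLATFORM_HCC__)
#define C10_WARP_SIZE 64  // AMD wavefront width
#else
#define C10_WARP_SIZE 32  // NVIDIA warp width
#endif
```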
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25884

Differential Revision: D17276662

Pulled By: bddppq

fbshipit-source-id: cef8e77a74ae2e5de10df816ea80b25cb2bab713
2019-09-10 00:11:09 -07:00
c47ccfd01d Enable variable size embedding (#25782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25782

Enable variable size embedding for the dot processor. We split the embedding matrix into multiple towers based on the embedding size, perform the dot product in a loop over each of the towers, and finally concatenate all the dot-product outputs.

Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:
https://our.intern.facebook.com/intern/testinfra/testrun/3659174703037560

Specific unit tests --
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_per_feature_emb_dim
https://our.intern.facebook.com/intern/testinfra/testrun/3377699726358808

Reviewed By: chenshouyuan

Differential Revision: D16690811

fbshipit-source-id: 8f5bce5aa5b272f5f795d4ac32bba814cc55210b
2019-09-09 22:08:32 -07:00
2a917616a8 remove cosh_ op test (#25893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25893

as title

Test Plan: waitforsandcastle

Reviewed By: mingzhe09088

Differential Revision: D17278340

fbshipit-source-id: 81b7e8658d5919e865754ae4d834dc44494cb2e3
2019-09-09 20:34:35 -07:00
7ab4ad7b6d add torch.jit.is_scripting() api (#25263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25263

This adds an API that returns True when scripting and False in eager mode, which together with `ignore` allows guarding of not-yet-supported JIT features. Bikeshedding requested, please.

cc zou3519

```
def foo():
   if not torch.jit.is_scripting():
      return torch.linear(...)
   else:
      return addmm(...)
```

Test Plan: Imported from OSS

Differential Revision: D17272443

Pulled By: eellison

fbshipit-source-id: de0f769c7eaae91de0007b98969183df93a91f42
2019-09-09 20:24:36 -07:00
36bdde255e Fix test_det_logdet_slogdet_batched on PowerPC (#25773)
Summary:
Changelog:
- Simplify generation of singular matrices to just constructing a constant matrix instead of a random singular matrix using random_square_matrix_of_rank, which is susceptible to numerical issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25773

Test Plan:
- test_det_logdet_slogdet_batched should pass

Fixes https://github.com/pytorch/pytorch/issues/25172

cc: branfosj hartb

Apologies for the delay.

Differential Revision: D17261059

Pulled By: soumith

fbshipit-source-id: 8f991e2cb8c0e9dccad363d4785075213088e58a
2019-09-09 19:23:42 -07:00
b27bcda851 argument 't' misreferenced to 'torch.t()' (#25885)
Summary:
Instead, it rendered as a reference to `:func:t`. This was caused by a misuse of `:attr:`.

Fix https://github.com/pytorch/pytorch/issues/25834
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25885

Differential Revision: D17276501

Pulled By: soumith

fbshipit-source-id: 6485a628b0e169a8b4b8bd956d9de3686017f02e
2019-09-09 19:21:00 -07:00
79bcf6e5ba Test scripting and tracing for dynamic linear modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25870

Test Plan: Imported from OSS

Differential Revision: D17275747

Pulled By: jamesr66a

fbshipit-source-id: ed8eaf7e9af3127c987e56d17d60b52d039d5ae8
2019-09-09 19:00:35 -07:00
20204d1fe7 Fix c10 tracing (#25869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25869

The c10 code for tracing was not disabling tracing when calling the op, like it should have. This caused really weird errors where we were recording tensors for ops called within a given c10 op implementation, making tracing fail.

Test Plan: Imported from OSS

Differential Revision: D17275748

Pulled By: jamesr66a

fbshipit-source-id: b4e89ae5a954a1f476c9e5b8bf405bdc621f0323
2019-09-09 19:00:32 -07:00
67281deec0 Fix missing str to int conversion in the commit f71ddd42 (#25861)
Summary:
Came up in internal testing w/ python 2.7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25861

Differential Revision: D17261070

Pulled By: soumith

fbshipit-source-id: 412fe5e53ef4d8f2564d77dd17b480bb58cc391e
2019-09-09 18:30:41 -07:00
1777eb2ed9 fix typo: toDense --> to_dense #25706 (#25832)
Summary:
Only fixes a minor typo in [torch.sparse.FloatTensor docs](https://pytorch.org/docs/stable/sparse.html#torch.sparse.FloatTensor.toDense).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25832

Differential Revision: D17276700

Pulled By: soumith

fbshipit-source-id: cf3d550d5756b000a4e864170ecd4b31826b40f8
2019-09-09 18:27:03 -07:00
e8f316c024 SubgraphMatcher: add logging to a check missed previously.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25735

Test Plan: Imported from OSS

Differential Revision: D17216869

Pulled By: ZolotukhinM

fbshipit-source-id: 64431134dff63cb5e22fa70110ceecc56e9031e7
2019-09-09 18:21:47 -07:00
d7d3aedd2c Make various improvements to C++ API parity test harness (#25828)
Summary:
This PR makes the following improvements to C++ API parity test harness:
1. Remove `options_args` since we can get the list of options from the Python module constructor args.
2. Add test for mapping `int` or `tuple` in Python module constructor args to `ExpandingArray` in C++ module options.
3. Use regex to split up e.g. `(1, {2, 3}, 4)` into `['1', '{2, 3}', '4']` for `cpp_default_constructor_args`.
4. Add options arg accessor tests in `_test_torch_nn_module_ctor_args`.

We will be able to merge https://github.com/pytorch/pytorch/pull/24160 and https://github.com/pytorch/pytorch/pull/24860 after these improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25828

Differential Revision: D17266197

Pulled By: yf225

fbshipit-source-id: 96d0d4a2fcc4b47cd1782d4df2c9bac107dec3f9
2019-09-09 15:43:55 -07:00
115494b00b Cocoapods for iOS OSS release (#25847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25847

### Summary

The Podspec file for iOS OSS release. This podspec contains the C++ header files and a static library that supports three architectures.

Please ignore the link for `s.source` for now, as I'm still working on the CI nightly build. This is a temporary link for testing purposes.

### Note

Previously I had a CocoaPods release proposal - https://github.com/pytorch/pytorch/pull/25543 - which contains two podspec files. However, for the time being, we haven't decided whether we want to release the Objective-C API wrapper or not. Please review and refer to this one if you have questions.

Test Plan: Imported from OSS

Differential Revision: D17262459

Pulled By: xta0

fbshipit-source-id: 4cc60787a41beab14cf9b1c0e9ab62b8b14603c5
2019-09-09 14:50:03 -07:00
773b949a97 Remove NULL arguments that have been marked deprecated by rocBLAS (#25866)
Summary:
Silences compile-time warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25866

Differential Revision: D17265421

Pulled By: bddppq

fbshipit-source-id: 72c70a1aad655ff782f6e1dbb1002bc59b1eb9f3
2019-09-09 13:29:07 -07:00
001ba1c504 Clean up the iOS build script (#25822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25822

### Summary

Since protobuf has been removed from mobile, `build_host_protoc.sh` can be removed from `build_ios.sh` as well. However, the old caffe2 mobile build still depends on it; therefore, I introduced the `BUILD_PYTORCH_MOBILE` flag to gate the build.

- iOS device build

```
BUILD_PYTORCH_MOBILE=1 IOS_ARCH=arm64 ./scripts/build_ios.sh
BUILD_PYTORCH_MOBILE=1 IOS_ARCH=armv7s ./scripts/build_ios.sh
```

- iOS simulator build

```
BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR ./scripts/build_ios.sh
```

### Test Plan

All device and simulator builds run successfully

Test Plan: Imported from OSS

Differential Revision: D17264469

Pulled By: xta0

fbshipit-source-id: f8994bbefec31b74044eaf01214ae6df797816c3
2019-09-09 11:59:50 -07:00
13292ec3c7 Add PR jobs for iOS builds (#25840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25840

### Summary:

The CI jobs for iOS builds are missing; this PR creates a workflow that adds two PR jobs:

- pytorch_ios_10_2_1_x86_64_build
- pytorch_ios_10_2_1_arm64_build

### Note:

Those two jobs will not store any artifacts nor upload any binary files, which will be done in the next PR.

Test Plan:
- The jobs can be triggered by any PR.
- The jobs can be run successfully.

Differential Revision: D17255504

Pulled By: xta0

fbshipit-source-id: 5c56e85c7ccf6339a3e0ffd11eedd925f137adc8
2019-09-09 10:48:15 -07:00
378881e903 Enable log_softmax and CrossEntropyLoss for bfloat16 (#24457)
Summary:
Enabled torch.nn.functional.log_softmax and torch.nn.CrossEntropyLoss for the bfloat16 data type.
In order to do that, the following dependencies had to be enabled:
- RNE (round to nearest even)
- AccumulateType
- bfloat16 arithmetic operator overloads

We also implement full std::numeric_limits support for the bfloat16 data type.

Background for the dependencies:
- RNE vs. truncation
From a torch.nn.CrossEntropyLoss test with input_size=(128, 1000):
RNE result:
float    output:  tensor(7.3981, dtype=torch.float32, grad_fn=<NllLossBackward>)
bfloat16 output:  tensor(7.3125, dtype=torch.bfloat16, grad_fn=<NllLossBackward>)
truncation result:
float    output:  tensor(7.3981, dtype=torch.float32, grad_fn=<NllLossBackward>)
bfloat16 output:  tensor(5.8750, dtype=torch.bfloat16, grad_fn=<NllLossBackward>)

- scalar_t vs. AccumulateType (the AccumulateType of bfloat16 is float)
AccumulateType is essential to preserving accuracy, especially for reduction-related operations.
We have verified this with both local test cases and a real topology: a bfloat16 accumulator causes huge relative error - even more than 50% - when the number of elements is large.
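To see why the float accumulator matters, here is a small self-contained demo; round_bf16 emulates the float -> bfloat16 -> float round trip with RNE (an illustration, not the c10 implementation):
```
#include <cstdint>
#include <cstring>
#include <iostream>

// Emulate float -> bfloat16 -> float with round-to-nearest-even
// (ignores NaN handling for brevity).
static float round_bf16(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  uint32_t rounding_bias = 0x7FFF + ((bits >> 16) & 1);  // RNE tie handling
  bits = (bits + rounding_bias) & 0xFFFF0000u;
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

int main() {
  float acc_bf16 = 0.f, acc_f32 = 0.f;
  for (int i = 0; i < 10000; ++i) {
    acc_bf16 = round_bf16(acc_bf16 + round_bf16(0.1f));  // bf16 accumulator
    acc_f32 += round_bf16(0.1f);                         // float accumulator
  }
  // The bf16 accumulator stalls near 32 (its step there, 0.25, is more than
  // twice the increment), while the float accumulator reaches ~1001.
  std::cout << acc_bf16 << " vs " << acc_f32 << "\n";
}
```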
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24457

Differential Revision: D17113018

Pulled By: ezyang

fbshipit-source-id: 8d61297ca118f9b5c6730a01efcf3a3704d2f206
2019-09-09 09:19:47 -07:00
c5accd1486 More accurately describe field invariants in OperatorEntry (#25793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25793

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17259049

Pulled By: ezyang

fbshipit-source-id: 03bf2f28bfd584250ae8feddf4933522ea331b0b
2019-09-09 09:02:57 -07:00
0eacd3cc5c Upgrade NVIDIA driver on CI to 430.40 (#24242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24242

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17162837

Pulled By: ezyang

fbshipit-source-id: 7bfa92eb151d13fd60cb525475056b363d1254f9
2019-09-09 08:59:45 -07:00
d1496183f5 Fix cuDnn build error with CC3.0 platform(#25820) (#25825)
Summary:
__ldg is only available for CC3.5 and above, so this adds a default implementation for the CC3.0 platform.

This PR, along with jcooky's PR ecdf4564d4, makes the pytorch master HEAD build and run properly on the CC3.0 platform (such as the Retina MacBook Pro of Late 2013).

I tested the mnist example from pytorch/examples with the built wheel; test accuracy ends at 99% after 10 epochs on the GT 750M CC3.0 platform. The CC3.0 GPU decreases training time to about 1/5 of its CPU counterpart.
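The usual fallback pattern looks like this (a sketch of the approach, not the exact PR code):
```
// __ldg requires compute capability 3.5+, so older parts take a plain load.
template <typename T>
__device__ __forceinline__ T do_ldg(const T* ptr) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
  return __ldg(ptr);
#else
  return *ptr;
#endif
}
```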

```
(pytorch) SamuelFdeMBP:mnist sfeng$ pip list | grep torch
pytorch-sphinx-theme          0.0.24               /Users/sfeng/GH/pytorch_110/docs/src/pytorch-sphinx-theme
torch                         1.3.0a0+a332583
torchvision                   0.5.0a0+0bd7080
(pytorch) SamuelFdeMBP:mnist sfeng$ date && time python main.py && date
日  9  8 07:17:38 CST 2019
/Users/sfeng/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/cuda/__init__.py:132: UserWarning:
    Found GPU0 GeForce GT 750M which is of cuda capability 3.0.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability that we support is 3.5.

  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
Train Epoch: 1 [0/60000 (0%)]	Loss: 2.300039
......
Train Epoch: 10 [59520/60000 (99%)]	Loss: 0.007440

Test set: Average loss: 0.0322, Accuracy: 9895/10000 (99%)

real	2m39.962s
user	4m13.625s
sys	0m9.672s
日  9  8 07:20:18 CST 2019

(pytorch) SamuelFdeMBP:mnist sfeng$ date && time python main.py --no-cuda && date
日  9  8 07:20:40 CST 2019
Train Epoch: 1 [0/60000 (0%)]	Loss: 2.300039
Train Epoch: 1 [640/60000 (1%)]	Loss: 2.213470
Train Epoch: 1 [1280/60000 (2%)]	Loss: 2.170460
......
Train Epoch: 10 [58880/60000 (98%)]	Loss: 0.005681
Train Epoch: 10 [59520/60000 (99%)]	Loss: 0.007686

Test set: Average loss: 0.0319, Accuracy: 9894/10000 (99%)

real	12m6.604s
user	75m53.129s
sys	3m41.744s
日  9  8 07:32:47 CST 2019
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25825

Differential Revision: D17252176

Pulled By: soumith

fbshipit-source-id: 70bf84ae6380be86b56344b161a52fb06a53a1b2
2019-09-09 08:23:28 -07:00
4fac61a886 Fix typing on nn.Parameter (#25586)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25399

As per https://github.com/pytorch/pytorch/issues/25580 I'm pushing this to test my changes on the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25586

Differential Revision: D17259178

Pulled By: ezyang

fbshipit-source-id: d48cdd602bfda60c213f79a4f124df54a68ca698
2019-09-09 07:54:27 -07:00
f70ef229ce Back out "[Caffe2] Fix device_option propagation"
Summary: Original commit changeset: 916551b93346

Test Plan: none

Reviewed By: nairbv

Differential Revision: D17259017

fbshipit-source-id: f6e961e88c01126393ed2b6be0adeb6fcc68cb3c
2019-09-09 07:22:42 -07:00
97b432bdf0 Back out "[pytorch][PR] remove tools/setup_helpers/cudnn.py"
Summary:
Original commit changeset: abd9cd0244ca

(Note: this ignores all push blocking failures!)

Test Plan: none

Reviewed By: nairbv

Differential Revision: D17259003

fbshipit-source-id: d7e067eeb36192766c639bfcbc66f540ce8eb77e
2019-09-09 06:47:45 -07:00
4299faa10b Fix invalid function cast warnings that show up with GCC 8/9 (#25483)
Summary:
Fixes ~5000 lines of warnings like:

```
In file included from ../aten/src/TH/TH.h:4,
                 from ../torch/csrc/Storage.cpp:11:
../torch/csrc/Storage.h:6:39: warning: cast between incompatible function types from ‘PyObject* (*)(THPStorage*)’ {aka ‘_object* (*)(THPStorage*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                                       ^~~
caffe2/aten/src/TH/THGeneral.h:154:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
  154 | #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
      |                                     ^
../torch/csrc/Storage.h:6:27: note: in expansion of macro ‘TH_CONCAT_4’
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                           ^~~~~~~~~~~
../torch/csrc/generic/Storage.cpp:299:22: note: in expansion of macro ‘THPStorage_’
  299 |   {"device", (getter)THPStorage_(device), nullptr, nullptr, nullptr},
      |                      ^~~~~~~~~~~
../torch/csrc/Storage.h:6:39: warning: cast between incompatible function types from ‘PyObject* (*)(THPStorage*)’ {aka ‘_object* (*)(THPStorage*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                                       ^~~
caffe2/aten/src/TH/THGeneral.h:154:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
  154 | #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
      |                                     ^
../torch/csrc/Storage.h:6:27: note: in expansion of macro ‘TH_CONCAT_4’
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                           ^~~~~~~~~~~
```

This issue and the fix are very similar to how CPython fixed it; see https://bugs.python.org/issue33012.

There are still more of these warnings left, but this fixes the majority of them.
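The pattern, sketched with stand-in declarations (the real code casts the THPStorage getters declared through the THPStorage_(NAME) macros): a direct cast between incompatible function-pointer types trips the warning, while round-tripping through `void (*)(void)` is the form GCC accepts, as in CPython's fix.
```
// Stand-in declarations so the sketch is self-contained.
typedef struct _object PyObject;
struct THPStorage;
typedef PyObject* (*getter)(PyObject*, void*);

PyObject* THPStorage_device(THPStorage* self);  // assumed signature

// (getter)THPStorage_device            -> -Wcast-function-type warning
// two-step cast through void (*)(void) -> accepted by GCC 8/9
getter device_getter = (getter)(void (*)(void))THPStorage_device;
```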
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25483

Differential Revision: D17149824

Pulled By: ezyang

fbshipit-source-id: 353560a4f76070fa7482608e9532b60205d16798
2019-09-09 06:35:11 -07:00
bf4a28175d Retry connecting to TCP store on ECONNRESET (#25707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25707

The retry logic dealt with ECONNREFUSED to deal with the client being
started before the server. It didn't yet deal with the server being
started but having its listen backlog exhausted. This may happen when
starting many processes that all try to connect at the same time.

The server implementation uses blocking I/O to read and write entire
messages, so it may take a bit longer to call `accept(2)` on new
connections compared to a fully event driven approach.

This commit both increases the default listen backlog on the server
side and implements retries on ECONNRESET after `connect(2)`.
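The client-side retry then looks roughly like this (a sketch under POSIX sockets, not the actual TCPStore code):
```
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Retry connect() while the server is not listening yet (ECONNREFUSED)
// or its listen backlog is exhausted (ECONNRESET).
int connect_with_retry(const sockaddr_in& addr, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 0) {
      return fd;  // connected
    }
    int err = errno;
    close(fd);
    if (err != ECONNREFUSED && err != ECONNRESET) {
      return -1;  // unrelated failure: give up immediately
    }
    usleep(100000);  // back off 100ms before retrying
  }
  return -1;
}
```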

Test Plan: Imported from OSS

Differential Revision: D17226958

Pulled By: pietern

fbshipit-source-id: 877a7758b29286e06039f31b5c900de094aa3100
2019-09-09 02:54:20 -07:00
73855ecd43 fix cudnn static linkage (#25848)
Summary:
Fix regression caused by https://github.com/pytorch/pytorch/pull/24938

This fixes CUDA nightly breakages
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25848

Differential Revision: D17256348

Pulled By: soumith

fbshipit-source-id: dded577717947d0f092e9d76b423b2bc7c56070a
2019-09-08 21:41:57 -07:00
74fa53995d Fix assertion if NamedTensorMeta's num_names != tensor.dim (#25778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25778

I don't know how this ever compiled; it was caught by an internal test.
Do we not set DEBUG when compiling in debug mode in OSS?

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17228393

Pulled By: zou3519

fbshipit-source-id: 441ad716a369ee99be4723318cf78e394f98becf
2019-09-08 13:49:20 -07:00
294cf096bf Name inference for unbind (#25585)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25585

Test Plan:
- new tests [namedtensor ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25585

Differential Revision: D17185070

Pulled By: zou3519

fbshipit-source-id: 85512b194f5b7c62a00aa81d048b5351e098bdb0
2019-09-08 11:35:58 -07:00
d7a1152ee9 Fix error message stack overflow (#25146)
Summary:
When the given input has more dimensions than expected, `weight_sizes` has length `k` but only `weight_dim` of its entries are initialized, which causes a confusing error message containing garbage values:
```
RuntimeError: Expected 4-dimensional input for 4-dimensional
weight 256 5 3 3 3987964488216321853 94670871813000,
but got 6-dimensional input of size [1, 61, 1, 5, 64, 64] instead
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25146

Differential Revision: D17233651

Pulled By: soumith

fbshipit-source-id: c6ddfa45e854f9b95ca253052f8bc358e35fd9d4
2019-09-07 22:47:55 -07:00
825f4714f9 Fork QNNPACK into aten/src/ATen/native/quantized/cpu/qnnpack (#25500)
Summary:
The motivation for this move, and our long-term commitment to maintaining and integrating this code into ATen is described in the issue below:

https://github.com/pytorch/pytorch/issues/25621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25500

Test Plan:
QNNPack unit tests, as follows:
OSS:
x86:
mkdir build; cd build; cmake ..; make all -j16 && make test
All 26 unit tests pass, both when built with ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=0) and ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=1)
ARM:
Make sure you have an android device available to adb either through one world or directly connected.
To compile and push do
$> adb shell mkdir /data/qnnpack && ./scripts/build-android-arm64.sh && adb push ./build/android/arm64-v8a/*-test /data/qnnpack
To execute the tests, first run $> adb shell to log into the device, then run all the tests with
$> for t in $(ls /data/qnnpack); do /data/qnnpack/$t; done
Repeat the exact same process with ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=0), and ADD_DEFINITIONS(-DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=1)
Repeat the exact same process with ./scripts/build-android-armv7.sh for AARCH32.

Reviewed By: ljk53

Differential Revision: D17194732

Pulled By: AshkanAliabadi

fbshipit-source-id: 9e627338ebd63aa917a36b717618c0643ccf40c8
2019-09-07 16:45:30 -07:00
45bfa6a5c6 Fix missing newline in compiled-from source range highlight (#25802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25802

Test script

```
import torch

def foo(x, y):
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    return x

scripted = torch.jit.script(foo)
scripted.save('foo.zip')

loaded = torch.jit.load('foo.zip')
loaded(torch.rand(3, 4), torch.rand(4, 5))
```

Before this change
```
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
The above operation failed in interpreter, with the following stack trace:
at code/__torch__.py:7:9
op_version_set = 1
class PlaceholderModule(Module):
  __parameters__ = []
  def forward(self: __torch__.PlaceholderModule,
    x: Tensor,
    y: Tensor) -> Tensor:
    x0 = torch.add(x, y, alpha=1)
         ~~~~~~~~~ <--- HERE
    x1 = torch.add(x0, y, alpha=1)
    x2 = torch.add(x1, y, alpha=1)
    x3 = torch.add(x2, y, alpha=1)
    x4 = torch.add(x3, y, alpha=1)
    x5 = torch.add(x4, y, alpha=1)
    x6 = torch.add(x5, y, alpha=1)
    x7 = torch.add(x6, y, alpha=1)
    x8 = torch.add(x7, y, alpha=1)
    x9 = torch.add(x8, y, alpha=1)Compiled from code at /home/jamesreed/print_test.py:5:8
def foo(x, y):
    x = x + y
        ~~~~~ <--- HERE
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
```

After this change
```
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
The above operation failed in interpreter, with the following stack trace:
at code/__torch__.py:7:9
op_version_set = 1
class PlaceholderModule(Module):
  __parameters__ = []
  def forward(self: __torch__.PlaceholderModule,
    x: Tensor,
    y: Tensor) -> Tensor:
    x0 = torch.add(x, y, alpha=1)
         ~~~~~~~~~ <--- HERE
    x1 = torch.add(x0, y, alpha=1)
    x2 = torch.add(x1, y, alpha=1)
    x3 = torch.add(x2, y, alpha=1)
    x4 = torch.add(x3, y, alpha=1)
    x5 = torch.add(x4, y, alpha=1)
    x6 = torch.add(x5, y, alpha=1)
    x7 = torch.add(x6, y, alpha=1)
    x8 = torch.add(x7, y, alpha=1)
    x9 = torch.add(x8, y, alpha=1)
Compiled from code at /home/jamesreed/print_test.py:5:8
def foo(x, y):
    x = x + y
        ~~~~~ <--- HERE
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
    x = x + y
```

Test Plan: Imported from OSS

Differential Revision: D17250599

Pulled By: jamesr66a

fbshipit-source-id: 56266dcbf2c2287dc8ced7b9463ed42ef5f1167c
2019-09-07 14:38:53 -07:00
1c81d9006a increase input shape to reduce variance (#25812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25812

as title

Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P109238440

Reviewed By: mingzhe09088

Differential Revision: D17246792

fbshipit-source-id: d93ee5f404164d32210968997c6ea63b82058d2a
2019-09-07 06:25:26 -07:00
a332583c59 Quick fixes for named tensor for windows (#25728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25728

Two quick fixes:
1) windows doesn't seem to like std::locale, so that got removed.
2) at::empty should call the non-named-tensor overload if the tensor
doesn't have names to avoid re-dispatching. In the long term we'll merge
the at::empty names and no-names overloads.

Test Plan
- [namedtensor ci], but the windows thing isn't easy to test without
running BUILD_NAMEDTENSOR=1 on windows.

Test Plan: Imported from OSS

Differential Revision: D17212059

Pulled By: zou3519

fbshipit-source-id: 58da5ab96d53c4844237ca10fa1b2de4b1052a0c
2019-09-06 21:59:20 -07:00
6257c8d634 Add flatten for named tensors. (#25672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25672

There are three overloads:
1) flatten(tensor, int start_dim, int end_dim, Dimname out_dim)
2) flatten(tensor, Dimname start_dim, Dimname end_dim, Dimname out_dim)
3) flatten(tensor, DimnameList dims, Dimname out_dim)

`flatten` joins all the dimensions between start_dim and end_dim into
one dimension. The name of the output dimension is specified by
`out_dim`.

In the case where flatten takes a list of `dims` to flatten, all the
dimensions in `dims` must be in consecutive order.
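
A rough usage sketch of overload (3) (assuming named-tensor factory support and the `dims`-list form described above; exact argument spelling may differ):

```python
import torch

# Named tensor: batch, channels, height, width
x = torch.randn(2, 3, 4, 5, names=('N', 'C', 'H', 'W'))

# Join the consecutive dims C, H, W into a single dim named 'features'
y = x.flatten(['C', 'H', 'W'], 'features')
print(y.names)   # ('N', 'features')
print(y.shape)   # torch.Size([2, 60])
```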

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17192656

Pulled By: zou3519

fbshipit-source-id: 55d2b23358bd77cbef299f66701a8da8cd194f4f
2019-09-06 21:16:44 -07:00
bc6eec1db8 Factor unnecessary work out of add inner loop (#25751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25751

This PR does several things:
1) Factor unnecessary scale inversion out of quantize function in the inner loop. This saves cycles in the inner kernel (unfortunately the compiler couldn't hoist it out automatically for some reason)
2) Use FMA in the dequantize routine when possible. This also necessitates having the user pass in a pre-multiplied (scale * -zero_point) vector.

Benchmark Script
```
import torch
import time

x = torch.rand(1, 256, 56, 56)
y = torch.rand(1, 256, 56, 56)

print('dtype', 'ms/iter (float)', 'ms/iter (quant)', 'quant / float', sep='\t')

for dtype in [torch.quint8, torch.qint8, torch.qint32]:
    qX = torch.quantize_linear(x, 0.1, 5, dtype).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.1, 5, dtype).permute([0, 3, 1, 2])

    _x = x.permute([0, 3, 1, 2])
    _y = y.permute([0, 3, 1, 2])

    NITER = 10000

    # Test float
    s = time.time()
    for i in range(NITER):
        _x + _y
    elapsed_float = time.time() - s
    ms_per_iter_float = elapsed_float / NITER * 1000

    # Test quantized
    s = time.time()
    for i in range(NITER):
        torch.ops.quantized.add(qX, qY, 0.1, 5)
    elapsed = time.time() - s
    ms_per_iter = elapsed / NITER * 1000

    print(str(dtype), ms_per_iter_float, ms_per_iter, ms_per_iter / ms_per_iter_float, sep='\t')
    print('float gbps', 'quant gbps', sep='\t')
    print((x.numel() + 2 * y.numel()) * x.element_size() / ms_per_iter_float / 1e6,
          (qX.numel() + 2 * qX.numel()) * qX.element_size() / ms_per_iter / 1e6,
          sep = '\t')

```

Before this change
```
dtype	ms/iter (float)	ms/iter (quant)	quant / float
torch.quint8	0.47297704219818115	0.1909616231918335	0.403743958278252
float gbps	quant gbps
20.368413560257675	12.612209509659206
torch.qint8	0.4638909578323364	0.18829500675201416	0.40590359344764254
float gbps	quant gbps
20.767363185988053	12.79082245219568
torch.qint32	0.4605833768844605	4.219791603088379	9.161840862847583
float gbps	quant gbps
20.916499560114787	2.2830018413585225

```

After this change
```
dtype	ms/iter (float)	ms/iter (quant)	quant / float
torch.quint8	0.465389084815979	0.1516613483428955	0.3258807593282176
float gbps	quant gbps
20.70051128038237	15.880433784319726
torch.qint8	0.4630591154098511	0.15664465427398683	0.3382821956443757
float gbps	quant gbps
20.804669812996085	15.375232631861083
torch.qint32	0.4726278781890869	4.103795266151429	8.682931023610927
float gbps	quant gbps
20.38346116380751	2.347532314650444

```

Test Plan: Imported from OSS

Differential Revision: D17222302

Pulled By: jamesr66a

fbshipit-source-id: fffc819f565dfd3b85fb6496c7c6635ec2c237a4
2019-09-06 19:27:43 -07:00
03d4198a67 Use more efficient specialized Quantize routine (#25731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25731

I didn't notice this before, but the QuantizeAvx2 routine was requantizing only a single vector of 8 floats into 1/4 of a 256-bit int8 register. This switches it to use a specialization that goes from 4 float vectors into a whole int8 vector, borrowed from C2

Test Plan: Imported from OSS

Differential Revision: D17214413

Pulled By: jamesr66a

fbshipit-source-id: 1d6fc556e43739e9a4b0dba5df2332beb1b3795b
2019-09-06 19:27:39 -07:00
bd0e564d40 Fix device_option propagation (#25203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25203

device_option propagation is completely broken in Caffe2 when pass-through
operators are used. As an example, the Gather operator doesn't have a gradient
and passes through its inputs, which results in incorrect detection of the
components for sparse parameter aggregation (the component will be empty
instead of the real device).

This diff is trying to fix this issue.

Test Plan:
net_transform is finally working with Gather + FloatToHalf transformed model
instead of failing because of incorrect number of components.

Reviewed By: dzhulgakov

Differential Revision: D16936041

fbshipit-source-id: 916551b933469f04e32ddf86ec4b2c07f76c9176
2019-09-06 19:05:04 -07:00
a9e56c2e68 Make Python RPC handler not hold module in global variable (#25458)
Summary:
# Problem

ProcessGroupAgent used in test_rpc has SIGSEGV on exiting.

# Solution

It was because the Python module was unexpectedly destructed twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25458

Test Plan: Run prototype tests on top of this diff.

Differential Revision: D17127093

Pulled By: xush6528

fbshipit-source-id: 4b86cd8465e8cca6fce1c163e78160a2386fa9c3
2019-09-06 17:35:21 -07:00
17c1b2c715 Relax scale to prevent saturation in conv/linear. Add test to verify precision of numerics of quantized model with updated observer. This test catches errors in (#25667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25667

Relax scale and zero-point for activations to ensure that fbgemm implementations of conv and linear do not saturate due to 16 bit intermediate accumulation.

Add test to verify precision of numerics of quantized model with updated observer. This test catches errors in
handling layouts for quantized ops in addition to saturation/quantization errors.
ghstack-source-id: 89587942

Test Plan:
buck test caffe2/test:quantized -- 'test_float_quant_compare \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Passes when SQNR > 35 dB

buck test caffe2/test:quantization -- 'test_minmax_observer \(test_quantization\.ObserverTest\)' --print-passing-details
Passes with additional coverage for observer changes

Differential Revision: D17140498

fbshipit-source-id: 42c58e726bb0b0f51890590ee2525428f9a8d24e
2019-09-06 17:18:01 -07:00
5d7fff5d03 Fixed nondeterministic RG for ORT RNN tests (#25205)
Summary:
Relaxing tolerance for ORT RNN tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25205

Reviewed By: BIT-silence

Differential Revision: D17238862

Pulled By: houseroad

fbshipit-source-id: 8d55b23a6a5c7edfe5998592ddc51e0ae2c5bbf7
2019-09-06 16:35:43 -07:00
75cac0fe69 expose parse_schema and __eq__ function to python and add round trip tests (#23208)
Summary:
Expose the necessary functions to Python, and add round-trip tests for the
function schema str() and parsing functions.
We iterate over all the registered function schemas, render each to a string,
then parse the string back. We compare the schema produced by parsing with
the original one, and make sure they are equal.
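
A condensed sketch of that round trip, assuming the internal `torch._C._jit_get_all_schemas` / `torch._C.parse_schema` bindings exposed here:

```python
import torch

# For every registered operator schema: render it to a string, parse the
# string back, and check the parsed schema compares equal to the original.
for schema in torch._C._jit_get_all_schemas():
    parsed = torch._C.parse_schema(str(schema))
    assert parsed == schema, 'round trip failed for {}'.format(schema)
```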

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23208
ghstack-source-id: 89638026

Test Plan: buck test //caffe2/test:function_schema

Reviewed By: zrphercule

Differential Revision: D16435471

fbshipit-source-id: 6961ab096335eb88a96b132575996c24090fd4c0
2019-09-06 15:50:56 -07:00
f2f804dccc Move BUILD_NAMEDTENSOR in NamedTensorUtils.h (#25781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25781

To prepare for removing the BUILD_NAMEDTENSOR flag, I am attempting to
remove BUILD_NAMEDTENSOR out of header areas.

Test Plan:
- [namedtensor ci]
- Tested building locally with USE_STATIC_DISPATCH=1. Previously, in
https://github.com/pytorch/pytorch/pull/25721, this change had caused a
dependency cycle while building with that on.

Differential Revision: D17229490

Pulled By: zou3519

fbshipit-source-id: 22fbd5e2770374ab321c13542fa321a2bf7d3101
2019-09-06 15:33:20 -07:00
2fe8341aac Map module options between Python and C++ in API parity test (#25784)
Summary:
`torch.nn` modules in Python save their kwarg options directly as module object attributes, while `torch::nn` modules in C++ save their options inside the `options` field of the module object. This PR tries to map between these two (by using the newly added `options_args` list to discover options arguments in the Python module), to make sure options equivalence is properly checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25784

Differential Revision: D17238609

Pulled By: yf225

fbshipit-source-id: 2febd277ddcbe3ab458ac3feaaf93e4c94bb5b98
2019-09-06 15:30:36 -07:00
c5a0de23e2 Fix empty graph problem (#25599)
Summary:
This fixes the empty graph problem present since pytorch 1.2.

To prevent such things from happening again, we have to make the test harder.

There are 3 levels of verification:
lv 1. make sure that the graph is saved to some event file.  <-- currently here
lv 2. make sure the file can be read by tensorboard.
lv 3. make sure the graph in tensorboard is human-friendly.

I think (3) must involve a human.
(2) is possible, but it becomes redundant if we go straight to lv 3.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25599

Reviewed By: sanekmelnikov

Differential Revision: D17229276

Pulled By: orionr

fbshipit-source-id: b39f2f1805ee0b3a456b2c69d97e6e3622f5220e
2019-09-06 14:24:28 -07:00
c9e8dcb706 Change worker name constraint (#25780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25780

support "trainer:0", "server:1" format

Test Plan:
# Unit tests

```
buck test mode/dev-nosan caffe2/test:rpc
```

Differential Revision: D17228907

fbshipit-source-id: a6e759f4364548454ab0f2907707e738997bbf38
2019-09-06 13:53:50 -07:00
2bb166edb4 Revert D17228224: [pytorch][PR] add torch.nn.Identity to __init__.pyi.in
Test Plan: revert-hammer

Differential Revision:
D17228224

Original commit changeset: a8d36240892b

fbshipit-source-id: a21002b55305f2a03f1f4ba44a7cff6cb9f66c51
2019-09-06 13:15:05 -07:00
ec3793362f Documentation change of torch.where (#25554)
Summary:
Change the doc of torch.where: the parameters are named x and y instead of input and other.
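
For reference, a minimal illustration of the documented call (elements come from `x` where `condition` holds, from `y` otherwise):

```python
import torch

condition = torch.tensor([True, False, True])
x = torch.tensor([1.0, 1.0, 1.0])
y = torch.tensor([-1.0, -1.0, -1.0])
print(torch.where(condition, x, y))  # tensor([ 1., -1.,  1.])
```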
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25554

Differential Revision: D17227193

Pulled By: soumith

fbshipit-source-id: 96d8a6f60ae8e788648247320ae715d0058de2b4
2019-09-06 12:55:16 -07:00
748436a514 Enable BLIS from the FLAME project as a BLAS choice. (#23819)
Summary:
BLIS is AMD's official recommendation for BLAS.

Mimics my ancient
f5bc78263e
in cmake upstream

BLIS WWW: https://github.com/flame/blis
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23819

Differential Revision: D17231360

Pulled By: bddppq

fbshipit-source-id: 68db70d63e410438f99b2bf57986b81ff6b6c5b3
2019-09-06 12:00:25 -07:00
7970e5720b Rename tensor.view_names -> tensor.renamed (#25711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25711

This function renames the dimensions of a tensor out-of-place. Because
of that, I think `tensor.renamed(...)` is a clearer name: `view_names`
has the connotation that we can use names to `view` our tensors with a
"different shape", but what this function really does is let us rename a
tensor no matter the previous names.

`tensor.names_`, the in-place version of this, is unchanged for now.
However, we might delete this or not advertise it if it has no use case
and also because its naming is a little inconsistent with `tensor.renamed`.
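
A minimal sketch of the out-of-place behavior, assuming the positional-names form of the call (in later releases the method became `tensor.rename`):

```python
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = x.renamed('batch', 'channels')  # out-of-place; x keeps its old names
print(x.names)  # ('N', 'C')
print(y.names)  # ('batch', 'channels')
```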

Test Plan: - [namedtensor ci]

Differential Revision: D17206515

Pulled By: zou3519

fbshipit-source-id: 67053951fcc8130c84566b5ebbdce35ef619c90d
2019-09-06 11:28:04 -07:00
3c6009e6f1 derandomize hypothesis tests (#25513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25513

Randomized tests are flaky; this PR derandomizes some of them

Test Plan:
python test/test_fake_quant.py
python test/test_quantized_nn_mods.py

Imported from OSS

Differential Revision: D17221273

fbshipit-source-id: f6978704ba0139071c26f443e923955a2f849832
2019-09-06 10:53:05 -07:00
a41ff31702 Correctly gate __CUDA_ARCH__ with defined() (#25729)
Summary:
Evaluating undefined preprocessor macros causes
errors on some compilers/configs. There is an ungated use of `__CUDA_ARCH__`
in caffe2 which is inconsistent with the rest of the file and should be
fixed anyway because it's causing issues in ovrsource.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25729

Test Plan: contbuilds

Differential Revision: D17211552

Pulled By: akrieger

fbshipit-source-id: 499b123894b255f37ff68079c4ba3650b1599a5c
2019-09-06 09:42:15 -07:00
511d1875c5 add torch.nn.Identity to __init__.pyi.in (#25777)
Summary:
I fixed https://github.com/pytorch/pytorch/issues/25694. Check it, please.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25777

Differential Revision: D17228224

Pulled By: ezyang

fbshipit-source-id: a8d36240892bcb7e669b8dce38419ff3fc9e9afd
2019-09-06 09:27:49 -07:00
5e372862dc Use constructor in test_params for C++ API parity test (#25749)
Summary:
This PR changes the C++ API parity test script so that `test_params` such as the following is understood:
88e4cee3e7/test/common_nn.py (L2194-L2200)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25749

Differential Revision: D17227867

Pulled By: yf225

fbshipit-source-id: 03a8e17d233931ba0b38f75e9b75b0c09b98ed08
2019-09-06 08:57:40 -07:00
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
9d2d31e626 Store bias in PackedLinearWeight struct in fbgemm (#25428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25428

Added bias as an optional param to the quantized_linear_prepack function.
Bias is quantized during runtime using input scale and weight scale.
ghstack-source-id: 89601399

Test Plan: python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer

Differential Revision: D17121304

fbshipit-source-id: 8adb0e55e4aed0a5430aaa2c8639c8ad1639c85a
2019-09-06 08:37:34 -07:00
4c7189d0f4 fix OSS mobile CI (#25755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25755

PR #25721 breaks mobile CI (with USE_STATIC_DISPATCH=1) due to circular
header dependency.
Move 'ATen/core/Tensor.h' back into '#ifdef BUILD_NAMEDTENSOR' to work
around the CI issue.

Test Plan: - build android library locally

Differential Revision: D17223997

Pulled By: ljk53

fbshipit-source-id: d8b5fd26e332953f1b592758fc76947ea2af94dc
2019-09-06 08:12:28 -07:00
3e843115c0 Use whitelist instead of blacklist for USE_DISTRIBUTED (#25759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25759

In #25260, USE_DISTRIBUTED was defaulted to OFF for Windows and macOS
only. The Android builds didn't run for the PR and started to fail
when it was merged to master. It turns out the mobile builds
explicitly disable USE_DISTRIBUTED, but only after the USE_DISTRIBUTED
option and its derived dependent options were defined. The result
was that USE_GLOO was enabled while USE_DISTRIBUTED was disabled.

This commit ensures that USE_DISTRIBUTED defaults to OFF unless the
build is for a supported platform.
ghstack-source-id: 89613698

Test Plan: N/A

Differential Revision: D17224842

fbshipit-source-id: 459039b79ad5240e81dfa3caf486858d6e77ba4b
2019-09-06 07:53:44 -07:00
66ac6698f6 remove tools/setup_helpers/cudnn.py (#25482)
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25482

Differential Revision: D17226408

Pulled By: ezyang

fbshipit-source-id: abd9cd0244cabea1f5d9f93f828d632d77c8dd5e
2019-09-06 06:54:35 -07:00
d95763b4dc Enable loading int8 prepacked models in PredictorContainer
Summary: To test the int8 ads models on CPU and accelerators with the ads replayer, we need to load the PREPACKING_INIT_NET_TYPE in the int8 model to initialize the int8 w_packed blobs.

Test Plan:
Ads replayer test.

P74811059

Reviewed By: zrphercule

Differential Revision: D16518888

fbshipit-source-id: cee212710ad37d9e491c970b25b2fe484373e5e4
2019-09-06 02:53:52 -07:00
cc4211069e Do not pass down USE_GLOO_IBVERBS to CMake (#25720)
Summary:
It doesn't seem to be used anywhere once down to CMake in this repo or any submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25720

Differential Revision: D17225088

Pulled By: pietern

fbshipit-source-id: a24b080e6346a203b345e2b834fe095e3b9aece0
2019-09-06 02:40:42 -07:00
d47ced49ad Adds a -m flag to pytorch.distributed.launch (#24910)
Summary:
Adds a '-m' flag to torch.distributed.launch that allows users to launch python modules using launch instead of specifying the full file path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24910

Differential Revision: D17221653

Pulled By: pietern

fbshipit-source-id: 5c6453ed266fd121103b11caab303e3f9404227d
2019-09-06 01:13:44 -07:00
2bed201190 remove caffe2.pb.h dependency for embedding_lookup_idx.cc (#25670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25670

This is part of the effort to get rid of protobuf dependency for
libtorch mobile build.

embedding_lookup_idx.cc is used by ATen/EmbeddingBag.cpp. It indirectly
includes caffe2.pb.h but doesn't really need it. Clean up the headers to
unblock no-protobuf mobile build.

The broader problem is that many common headers in pytorch/caffe2 directly
or indirectly include caffe2.pb.h. After landing the stack of changes to
remove protobuf from OSS libtorch mobile build, it's going to constraint
how ATen and other parts of pytorch use caffe2 components: it will break
OSS mobile CI if a PR introduces a dependency to a caffe2 file that
indirectly includes caffe2.pb.h. We will need to tease out caffe2.pb.h
dependencies like in this diff, or do a refactor to replace protobuf
generated types.

Chatted with gchanan and ezyang to confirm that there is no plan to
add more dependencies to caffe2 components from ATen in near future,
so this should be fine.

Test Plan: - build locally with stacked diffs

Differential Revision: D17191913

Pulled By: ljk53

fbshipit-source-id: 1248fe6424060a8bedcf20e73942b7500ae5e815
2019-09-06 00:54:36 -07:00
a6fb6e1fb3 Expose an API to iterate all the registered operators (#23207)
Summary:
So that we can iterate over the operator registry and check backward compatibility.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23207
ghstack-source-id: 89570438

Test Plan: ci and the round trip tests added in the last diff

Reviewed By: zrphercule

Differential Revision: D16434335

fbshipit-source-id: 86a66d746a1f122a8aafe39e936606d6ba7ef362
2019-09-05 21:47:44 -07:00
21ba9b3c6d Copy quantize routine to vec256 (#25685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25685

This saves a bunch of dynamic linking/function call overhead

Benchmark script

```
import torch
import time

x = torch.rand(1, 256, 56, 56)
y = torch.rand(1, 256, 56, 56)

print('dtype', 'ms/iter (float)', 'ms/iter (quant)', 'quant / float', sep='\t')

for dtype in [torch.quint8, torch.qint8, torch.qint32]:
    qX = torch.quantize_linear(x, 0.1, 5, dtype).permute([0, 3, 1, 2])
    qY = torch.quantize_linear(y, 0.1, 5, dtype).permute([0, 3, 1, 2])

    _x = x.permute([0, 3, 1, 2])
    _y = y.permute([0, 3, 1, 2])

    NITER = 1000

    # Test float
    s = time.time()
    for i in range(NITER):
        _x + _y
    elapsed_float = time.time() - s
    ms_per_iter_float = elapsed_float / NITER * 1000

    # Test quantized
    s = time.time()
    for i in range(NITER):
        torch.ops.quantized.add(qX, qY, 0.1, 5)
    elapsed = time.time() - s
    ms_per_iter = elapsed / NITER * 1000

    print(str(dtype), ms_per_iter_float, ms_per_iter, ms_per_iter / ms_per_iter_float, sep='\t')
```

Before this change (DynDisp to AVX2)
```
dtype   ms/iter (float) ms/iter (quant) quant / float
torch.quint8    0.47539472579956055     0.5174136161804199      1.0883873717996941
torch.qint8     0.46573758125305176     0.5322310924530029      1.1427703365080666
torch.qint32    0.47144651412963867     4.043398380279541       8.576579228174513
```

After this change (DynDisp to AVX2)

```
dtype   ms/iter (float) ms/iter (quant) quant / float
torch.quint8    0.48140883445739746     0.3396260738372803      0.705483675263412
torch.qint8     0.4651052951812744      0.3467671871185303      0.7455670591395397
torch.qint32    0.4986207485198975      4.015796899795532       8.053810259031533
```

Test Plan: Imported from OSS

Differential Revision: D17199438

Pulled By: jamesr66a

fbshipit-source-id: d518500c2b5f4e3a202d9ebc2a5862b4062ef118
2019-09-05 21:43:32 -07:00
f7bcba33a6 Vectorized specialization of max_pool2d for channels-last layout (#25676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25676

This PR achieves two things:

1) Ensures the channels-last layout is propagated through the operator if we receive an input in that layout. This helps to alleviate unnecessary data movement in, e.g. ResNet inference
2) Applies interleaved vectorization along the channel dimension in the kernel. This allows us to use the functional units on the CPU much more effectively.

Benchmark script

```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.max_pool2d(x, kernel_size=3, stride=None, padding=0, dilation=1)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.max_pool2d(q_x, kernel_size=3, stride=None, padding=0, dilation=1)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_linear(float_out, 0.5, 1, dtype)
    torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')

```

Before this change (DynDisp to AVX2)

```
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
5.197856426239014       1.2381434440612793      0.23820270175433766
GB/s float      GB/s quant
0.6816348335661166      0.7153936841878243
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
5.14232873916626        1.1790156364440918      0.2292765974808621
GB/s float      GB/s quant
0.6889952353715999      0.7512707826941549
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
4.918942451477051       3.401169776916504       0.6914432950715265
GB/s float      GB/s quant
0.7202849057394649      1.041712185038912
```

After this change (DynDisp to AVX2)

```
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
5.0574493408203125      0.018107891082763672    0.0035804394394243393
GB/s float      GB/s quant
0.700558673203699       48.915690731270566
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
4.984829425811768       0.016908645629882812    0.0033920209069399163
GB/s float      GB/s quant
0.7107645412406512      52.38503540665539
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
4.973354339599609       0.13938188552856445     0.028025729922108406
GB/s float      GB/s quant
0.7124044976624851      25.419658993448625
```

Test Plan: Imported from OSS

Differential Revision: D17196457

Pulled By: jamesr66a

fbshipit-source-id: 614be60ed74bed5d0369c58cc450b430cfabe5fb
2019-09-05 21:43:28 -07:00
ed64338297 Make tensor key in Dict works in serialization (#25442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25442

This makes tensor keys in dicts work in serialization by comparing the
tensor keys' TensorImpl addresses directly. Given that we just want to
ensure the ordering is stable when iterating, this should be good enough;
we will need careful consideration if we want to stick with Python 3.7
insertion order.
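
A minimal sketch of the case this enables, assuming the `torch.jit.Attribute` style of typed attributes from this era:

```python
import torch
from typing import Dict

class M(torch.jit.ScriptModule):
    def __init__(self):
        super(M, self).__init__()
        # Dict attribute whose keys are tensors; on save, entries get a
        # stable iteration order by comparing the keys' TensorImpl addresses.
        self.table = torch.jit.Attribute(
            {torch.ones(2): torch.zeros(2)},
            Dict[torch.Tensor, torch.Tensor])

m = M()
m.save('m.pt')
loaded = torch.jit.load('m.pt')
```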

Test Plan: Imported from OSS

Differential Revision: D17216377

fbshipit-source-id: 80df17dc2fa9eddd73a66e3979d7f8d7934660c0
2019-09-05 20:20:17 -07:00
d939ee2d85 Migrate digamma and polygamma from the TH to Aten (CUDA) (#25662)
Summary:
Close https://github.com/pytorch/pytorch/issues/24550
Close https://github.com/pytorch/pytorch/issues/24612
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25662

Differential Revision: D17205206

Pulled By: ifedan

fbshipit-source-id: 625602ff88940e11e4f7d63bb8950754427b4242
2019-09-05 19:49:33 -07:00
38e4766349 Add CosineAnnealingWarmRestarts to optim documentation (#25421)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20028.
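
For context, typical usage of the newly documented scheduler looks like this (hyperparameters here are illustrative):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Restart every 10 epochs, doubling the period after each restart.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(30):
    # ... one epoch of training ...
    scheduler.step()
```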
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25421

Differential Revision: D17221542

Pulled By: soumith

fbshipit-source-id: 9c83c9ad6bf34ba59713c61485e4ef4b782a2792
2019-09-05 19:06:18 -07:00
88e4cee3e7 Improve handling of mixed-type tensor operations (#22273)
Summary:
Improve handling of mixed-type tensor operations.

This PR affects the arithmetic (add, sub, mul, and div) operators implemented via TensorIterator (so dense but not sparse tensor ops).

For these operators, we will now promote to reasonable types where possible, following the rules defined in https://github.com/pytorch/pytorch/issues/9515, and error in cases where the cast would require floating point -> integral or non-boolean to boolean downcasts.

The details of the promotion rules are described here:
https://github.com/nairbv/pytorch/blob/promote_types_strict/docs/source/tensor_attributes.rst

Some specific backwards incompatible examples:
* now `int_tensor * float` will result in a float tensor, whereas previously the floating point operand was first cast to an int. Previously `torch.tensor(10) * 1.9` => `tensor(10)` because the 1.9 was downcast to `1`. Now the result will be the more intuitive `tensor(19)`
* Now `int_tensor *= float` will error, since the floating point result of this operation can't be cast into the in-place integral type result.

See more examples/detail in the original issue (https://github.com/pytorch/pytorch/issues/9515), in the above linked tensor_attributes.rst doc, or in the test_type_promotion.py tests added in this PR:
https://github.com/nairbv/pytorch/blob/promote_types_strict/test/test_type_promotion.py
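
A short runnable illustration of the two behaviors above:

```python
import torch

# int tensor * python float now promotes instead of truncating the scalar
print(torch.tensor(10) * 1.9)   # tensor(19.), not tensor(10)

# in-place ops that would need a float -> int downcast now raise
t = torch.tensor(10)
try:
    t *= 1.9
except RuntimeError as e:
    print('refused in-place downcast:', e)
```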
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22273

Reviewed By: gchanan

Differential Revision: D16582230

Pulled By: nairbv

fbshipit-source-id: 4029cca891908cdbf4253e4513c617bba7306cb3
2019-09-05 18:26:09 -07:00
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now disable the occupancy calculation we do not support yet and hard-code

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
82c8949a9d add __getitem__ to class types (#25664)
Summary:
Add magic method for `class_type[index]`. Since the compiler has custom logic for indexing, this was not included with the other magic methods.
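
A minimal sketch of what this enables (the class and function names here are illustrative):

```python
import torch
from typing import List

@torch.jit.script
class IntList(object):
    def __init__(self, values: List[int]):
        self.values = values

    def __getitem__(self, idx: int) -> int:
        return self.values[idx]

@torch.jit.script
def first(xs: IntList) -> int:
    return xs[0]  # dispatches to IntList.__getitem__

print(first(IntList([3, 1, 4])))  # 3
```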

Fix for https://github.com/pytorch/pytorch/issues/25637
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25664

Differential Revision: D17214996

Pulled By: eellison

fbshipit-source-id: bf77f70851f6c3487147da710cc996624492a0c8
2019-09-05 17:19:15 -07:00
76bc44fb30 Move most BUILD_NAMEDTENSOR macros out of header areas (#25721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25721

Context: I am starting to work on removing the BUILD_NAMEDTENSOR flag.
Here is the approach:
- Move the macro out of header areas
- Include a new `enable_namedtensor.h` header that does a `#ifndef
BUILD_NAMEDTENSOR #define BUILD_NAMEDTENSOR`.
- Include `enable_namedtensor.h` where necessary. This only really needs
to happen in two files (c10/TensorImpl.h, ATen/Dimname.h).
- Incrementally delete usages of the BUILD_NAMEDTENSOR macro later.

The alternative is to straight up delete all instances of
BUILD_NAMEDTENSOR. This alternative could be disruptive, lead to merge
conflicts, and isn't incremental.

Along with the above, some work needs to be done on feature flagging
named tensors, and merging the namedtensor CI with the regular CI, and
communicating with devs. This work will too be done incrementally.

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17210913

Pulled By: zou3519

fbshipit-source-id: c73f128b976bb90212639e8f2a3ad2a6a52b8e0c
2019-09-05 17:15:44 -07:00
0be29ee2ba Finish testing code examples in the docs (#25668)
Summary:
All of the code examples should now run as unit tests, save for those
that require interaction (i.e. show `pdb` usage) and those that use
CUDA.

`save` had to be moved before `load` in `jit/__init__.py` so `load`
could use the file generated by `save`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25668

Pulled By: driazati

Differential Revision: D17192417

fbshipit-source-id: 931b310ae0c3d2cc6affeabccae5296f53fe42bc
2019-09-05 16:13:37 -07:00
c6dd4036f5 Enable two tests that were skipped b/c of rocThrust bugs fixed in ROCm 2.7
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25724

Differential Revision: D17212373

Pulled By: bddppq

fbshipit-source-id: 2978bc13cdcd0e96a82c0019a08b589f67c0fe1d
2019-09-05 16:10:56 -07:00
1559c64417 Cyclical learning rate multiplier: use fabs(base_lr) (#25628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25628

We found that base_lr is negative in learning_rate_functors.h, so we use fabs(base_lr) for the cyclical learning rate multiplier
computation.

Test Plan: Canary: f135306794

Reviewed By: chenshouyuan

Differential Revision: D17167635

fbshipit-source-id: e7fb55835f9fc07712edd63e81f1cf355e05b9f4
2019-09-05 15:53:54 -07:00
11eb8ac2a9 Revert D17199043: [JIT] preserve ignored function return value type
Test Plan: revert-hammer

Differential Revision:
D17199043

Original commit changeset: 1196fd94c207

fbshipit-source-id: 49789ae1f128262bc40a9d5b0d2b7bfbbf0b7e1e
2019-09-05 15:51:06 -07:00
a294e157cb Align AliasInfo's operator<< with FunctionSchema (#23206)
Summary:
old (a)
new (a! -> b)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23206
ghstack-source-id: 89570435

Test Plan: cont build and the round trip tests in the last diff

Reviewed By: zrphercule

Differential Revision: D16433909

fbshipit-source-id: b5b018e839935cccbb1fb446070afd1cb9379bb1
2019-09-05 15:47:44 -07:00
ce3b81fdf3 Only default USE_DISTRIBUTED=True on Linux (#25725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25725

After landing #25260 the macOS wheel builds started to fail. It turns
out that if not specified, the setup helpers default USE_DISTRIBUTED
to true on all platforms except Windows.

This commit updates that such that USE_DISTRIBUTED only defaults to
true on Linux. More work is needed to enable it by default on macOS.

[test wheel]

ghstack-source-id: 89571701

Test Plan: N/A

Differential Revision: D17211695

fbshipit-source-id: 185db2e3425e45e6b76bd09d70a84e57327ca20f
2019-09-05 15:26:35 -07:00
30aef56e63 rocBLAS deprecated the last two parameters. (#25726)
Summary:
Fixes warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25726

Differential Revision: D17212371

Pulled By: bddppq

fbshipit-source-id: ac07c437c71b70340d345894ccab069c817fdb61
2019-09-05 15:26:31 -07:00
bc2a37b2a2 bring back skipped bitwise dispatch (#25689)
Summary:
Before https://github.com/pytorch/pytorch/issues/24879, `bitwise_not` called into `at::bitwise_not_out`, which goes through a device dispatch. After that PR it is dispatched directly to `at::native::bitwise_not_out`, which only has cpu and cuda impls. Skipping the `at::` dispatch indeed broke XLA, but XLA didn't have unary tests, so we didn't notice until a test was added in https://github.com/pytorch/xla/pull/986.  :P
This PR tries to fix the breakage to save a revert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25689

Differential Revision: D17201071

Pulled By: ailzhang

fbshipit-source-id: 0ca560a14a2ec6141f3795479c6dcb460e3805b5
2019-09-05 15:24:06 -07:00
3be1745b3c Make SparseNormalize backwards compatible (#25660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25660

As title

Test Plan:
buck test caffe2/caffe2/python/operator_test:sparse_normalize_test
https://our.intern.facebook.com/intern/testinfra/testrun/5910974517813190

Reviewed By: boryiingsu

Differential Revision: D17187839

fbshipit-source-id: 1e5a6eaac0e825db4ae969540a1f689444070579
2019-09-05 15:14:21 -07:00
197fd4f707 Adding RRef as return value for builtin operators (#25169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25169

See #23110 for RRef design details. This commit only implements
RRef as a return value for builtin operators, and the RRef communicates
between a user and the owner. More specifically, an RRef is first
created on the `dist.remote` caller, which is a user of the RRef.
The RRef user then sends a notification to the owner to report
the fork, and the owner uses a shared_ptr to keep
the RRef alive. When the user RRef is destructed on the caller,
another notification is sent to the owner, and the owner
can then drop its RRef as well.
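
A hedged sketch of the flow described above; the module path and helper names follow the later public `torch.distributed.rpc` API and are assumptions (the text above only refers to `dist.remote`):

```python
import torch
import torch.distributed.rpc as rpc

rpc.init_rpc('caller', rank=0, world_size=2)

# The caller becomes a *user* of the RRef; the value lives on 'owner',
# which keeps it alive until every user has reported its destruction.
rref = rpc.remote('owner', torch.add, args=(torch.ones(2), torch.ones(2)))
print(rref.to_here())  # tensor([2., 2.])

rpc.shutdown()
```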

Test Plan: Imported from OSS

Differential Revision: D17048343

Pulled By: mrshenli

fbshipit-source-id: 9dd3b3d0e4fd214c76fecdbed746a6d3029b3efd
2019-09-05 15:14:17 -07:00
99b6472d6b move USE_STATIC_DISPATCH from CI script to master cmake (#25696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25696

Move the flag from CI to CMake so it's less magic and can be reused by
iOS build as well.

Test Plan: - will check CI

Differential Revision: D17202734

Pulled By: ljk53

fbshipit-source-id: da4f150cbcf2bb5624def386ce3699eff2a7446f
2019-09-05 15:14:13 -07:00
17e7079aa2 rename 'mobile' to 'static_dispatch' (#25695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25695

Rename codegen variables to better reflect its semantics.

As we are going to change other parts of the codegen for the mobile build, e.g.
autograd, it is clearer to use more specific names instead of
calling everything 'mobile'.

Test Plan: - will check CI

Differential Revision: D17202732

Pulled By: ljk53

fbshipit-source-id: b2953c0914f25f9a1de00be89a09a6372cc5b614
2019-09-05 15:14:09 -07:00
99cd83ea22 Inserting observers for all methods called in forward (#25503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25503

Previously we only inserted observers in the forward method; this PR
extends the support to all methods called in forward. It will insert
duplicated observers right now; we'll remove them in the next PR.

Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'

Imported from OSS

Differential Revision: D17208886

fbshipit-source-id: 04084c8f42c56cb66a11968987a15752f532ac04
2019-09-05 15:11:22 -07:00
7333a8c679 Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 3acbf05f8b585739c26f865d57cb7be587542073
2019-09-05 12:50:19 -07:00
df043cd49d preserve ignored function return value type (#25262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25262

Preserve the type of `ignore`'d functions across serialization. Currently we compile an `ignore`'d function with its annotated type when first compiling, but do not preserve that type. This is important for being able to compile models that use features not yet supported in JIT.

```
@torch.jit.ignore
def unsupported(x):
    return x

def foo():
   if not torch.jit._is_scripting():
      return torch.linear(...)
   else:
      return unsupported(...)
```

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D17199043

Pulled By: eellison

fbshipit-source-id: 1196fd94c207b9fbee1087e4b2ef7d4656a6647f
2019-09-05 11:21:55 -07:00
61819260f7 Rename FBGEMM quantized operators to generic quantized ops (#25678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25678

As an effort to unify fbgemm and qnnpack at the dispatcher level, we need to have a generic name for the quantized backend ops.
Currently FBGEMM is guarded by the USE_FBGEMM macro and QNNPACK uses USE_QNNPACK.
ghstack-source-id: 89518961

Test Plan: buck test caffe2/test:quantized

Differential Revision: D17194364

fbshipit-source-id: 5960aedff6b8cb89eb3872c39b74caf54c0fbf20
2019-09-05 10:13:08 -07:00
50cb48643d Fix named tensor build (#25673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25673

We recently moved new_empty into ATen. new_empty doesn't support named
tensors (in fact, it was hackily supporting named tensors before). This
fixes the named tensor test by changing all uses of `new_empty` to
`empty`.

Named tensor support for `new_empty` will come eventually, but it might
be a little tricky.

Test Plan: - [namedtensor ci]

Differential Revision: D17206043

Pulled By: zou3519

fbshipit-source-id: 1697bd1d63e7cb344f3d459a29af0fcb9696ea49
2019-09-05 09:18:24 -07:00
3556bea5aa Build torch.distributed with Gloo backend on macOS (#25260)
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.

A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260

Reviewed By: mrshenli

Differential Revision: D17202381

Pulled By: pietern

fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
2019-09-05 07:09:50 -07:00
a3d0abf729 move GetDimFromOrderString to caffe2/core/types.h (#25671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671

To decouple string_utils.h from types.h and protobuf headers.
Logically GetDimFromOrderString seems to be more similiar to
StringToStorageOrder comparing to other string_utils functions.

Test Plan: - Will check all internal/external CI jobs.

Reviewed By: yinghai

Differential Revision: D17191912

Pulled By: ljk53

fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
2019-09-05 04:32:04 -07:00
a35a63b8bd move legacy deserialization code into jit/import_legacy.cpp (#25649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25649

Continue the work of PR #25493 to remove dependencies of generated
protobuf headers from jit/import.cpp.

Instead of adding intrusive #if/#else to gate the legacy functions, we
move them into a separate file. We keep the ScriptModuleDeserializer
structure as otherwise it would require a lot of interface changes.

There is not much state to copy from ScriptModuleDeserializer as it only
extracts extra_files before calling into LEGACY_deserialize. There is
no state to copy back into ScriptModuleDeserializer either as it directly
returns script::Module.

Test Plan:
- builds;
- with stacked PR to remove protobuf from cmake;
- load and run ResNet-18 in model.json format with non-mobile build;
- load and run ResNet-18 in pickle format with mobile build;

Differential Revision: D17183549

Pulled By: ljk53

fbshipit-source-id: 2947b95659cd16046d9595fb118d22acc179b3ad
2019-09-05 03:16:10 -07:00
3363ec9283 clean up binaries/cmake for mobile (#25651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25651

Most of the binaries are not useful/compilable for mobile. Consolidate the gating
logic and move it to the beginning of the file.

Test Plan: - make sure BUILD_BINARY=ON works for both mobile and non-mobile builds;

Differential Revision: D17183550

Pulled By: ljk53

fbshipit-source-id: a8179f4e80999271bf43b5d97798abc713c59843
2019-09-04 22:32:45 -07:00
d4226392bd change shape for some ops to reduce variance (#25686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25686

From the new runs, we found some ops for which we can increase the shape size to reduce the variance

Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
last few lines of the output P108624830

Reviewed By: mingzhe09088

Differential Revision: D17199623

fbshipit-source-id: a9277509f6d3e6503d3086b3b02f87eebd953239
2019-09-04 21:17:43 -07:00
ef6ea545e8 Add Python/C++ API parity tracker for torch.nn (#25289)
Summary:
This PR adds Python/C++ API parity tracker at `test/cpp_api_parity/parity-tracker.md`, which currently shows parity status for `torch.nn` modules.

A good amount of line changes here is moving `new_criterion_tests` from `test_nn.py` to `common_nn.py`, so that it can be used in `test_cpp_api_parity.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25289

Differential Revision: D17188085

Pulled By: yf225

fbshipit-source-id: 33d12fb1a4de2d9147ed09380973f361a3981fdf
2019-09-04 19:46:33 -07:00
0806203d54 Remove accidentally re-added file (#25677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25677

(from a merge conflict resolution gone bad)

Test Plan: Imported from OSS

Differential Revision: D17195369

Pulled By: zdevito

fbshipit-source-id: 9e40a2fbf2f58c952642147086e537bbbb049d97
2019-09-04 19:43:44 -07:00
4d415bff2b Add requests as a legit dependency (#25596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25596

Giving up on trying to limit this to just a py2 dependency

Test Plan: Imported from OSS

Differential Revision: D17171063

Pulled By: jamesr66a

fbshipit-source-id: 5df35fd128f3051dd9c6709f7d45323fedc12e65
2019-09-04 17:43:37 -07:00
76b6b1b1a6 move no_deadline to hypothesis_utils.py (#25598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25598

att

Test Plan:
CI

Imported from OSS

Differential Revision: D17192467

fbshipit-source-id: 9ee93b02cc293bb71ed114534d92eedda3ddee88
2019-09-04 17:06:33 -07:00
80820b2610 Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 67a4aef6ea96db636e1779b51a776c2d238a81d6
2019-09-04 15:38:34 -07:00
55da02a86d Revert D17097735: [quantization] Rename fbgemm quantized operators to generic quantized ops
Test Plan: revert-hammer

Differential Revision:
D17097735

Original commit changeset: 447112a7a421

fbshipit-source-id: 78368b6f84d96cea70692fb000cebe99602a08c1
2019-09-04 15:02:32 -07:00
2e1a5cb80e Port new_full to ATen. (#25583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25583

Following the game plan from https://github.com/pytorch/pytorch/pull/25475

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17183438

Pulled By: ezyang

fbshipit-source-id: 67bd98206f349ddf5ffdd7be0c16e45418c1b1cd
2019-09-04 14:34:43 -07:00
3d9c419648 Port new_empty to ATen. (#25475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25475

I got sucked into this rabbit hole when I was trying to understand
what I should do with TensorTypeId occurrences in
torch/csrc/utils/tensor_new.cpp.  I eventually concluded that all of my problems
were because Tensor.new_empty was hand implemented and not actually a native
function.  So I made it a native function.

There are a bunch of other new_* functions which should get this
treatment, but I'm sending out this PR just to show how it can
be done.

The general recipe:
1. Implement a concept of TensorOptions merging (TensorOptions::merge_in).
   This represents the notion of taking a tensor, but "overriding" some
   of its values with specific overrides.  One subtlety here is how
   devices get merged; see the comments for what our existing behavior is,
   and how I preserve it.
2. Implement new_empty as a native function, using options merging.
3. Add another special case to Python binding generation to treat new_*
   similar to *_like (i.e., handle TensorOptions correctly).  The logic
   here is probably wrong, actually; we should codegen TensorOptions
   correctly no matter what happens, but new_empty follows the same
   pattern as empty_like so I opted not to touch this code too much.
4. Delete the now defunct manual binding code.
5. Delete manual type annotations that are no longer necessary since
   we're going through native.

I didn't handle memory format correctly here.  I don't know if this function
should accept memory format; prior memory format patches didn't add support
for memory format to new_like.  If we had put memory format in TensorOptions
this wouldn't have been a question.
ghstack-source-id: 89294185

Test Plan: sandcastle & ossci

Differential Revision: D17133000

fbshipit-source-id: 00f4e98bd5174f6fd54e8aba2910ea91824771d9
2019-09-04 14:34:39 -07:00
0cc8ac75c9 Alphabetize Package Reference section in Docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25666

Differential Revision: D17190766

Pulled By: soumith

fbshipit-source-id: 836305062b0195b2f11be069447e05008c128d21
2019-09-04 14:31:16 -07:00
c9ba5186d3 Rename fbgemm quantized operators to generic quantized ops (#25338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25338

As an effort to unify fbgemm and qnnpack at the dispatcher level, we need to have a generic name for the quantized backend ops.
Currently FBGEMM is guarded by the USE_FBGEMM macro and QNNPACK uses USE_QNNPACK.

TBD: Use compile time macro or run_time to switch between fbgemm and qnnpack.
ghstack-source-id: 89454244

Test Plan: buck test caffe2/test:quantized

Differential Revision: D17097735

fbshipit-source-id: 447112a7a421387724d3e29b8fd8412dfb1c373a
2019-09-04 14:27:27 -07:00
efc5306ad2 Make NoneType <: Optional[T] (#25361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25361

Previously we had a different None object for each type T so that
unwrap optional could still recover the type T from it. After a few
months of having this conversion behavior, it has become clear that
only the unwrap optional operators cause this problem. Furthermore, it
is beneficial to have NoneType <: Optional[T] because this is how IValues
work (in particular the None IValue is not tagged). This patch makes the
necessary changes to do this. In particular it special cases unwrap optional
in export so that it annotates the None to make sure we can recover the type.

This also changes how matching and evaluating type values work so that we
can consider None matchable to type Optional[T], even though we cannot
derive T from that match.
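
A small example of the subtyping this makes uniform:

```python
import torch
from typing import Optional

@torch.jit.script
def first_or(x: Optional[int]) -> int:
    # None is now a plain subtype of Optional[int]; no per-T None needed
    if x is None:
        return -1
    return x

print(first_or(None), first_or(7))  # -1 7
```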

Test Plan: Imported from OSS

Differential Revision: D17103072

Pulled By: zdevito

fbshipit-source-id: 37678ed3e5ce54f2eb3ee4dff2734a39f0bee028
2019-09-04 13:52:40 -07:00
738303ba43 Add set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "-Wl,--no-as-needed") to CMakeLists.txt (#25445)
Summary:
This is a fix for a rare build issue on Ubuntu:
`symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk`
https://software.intel.com/en-us/articles/symbol-lookup-error-when-linking-intel-mkl-with-gcc-on-ubuntu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25445

Differential Revision: D17151458

Pulled By: pbelevich

fbshipit-source-id: a0f3e86a05ac408b95446560f42fc16fbff2d7af
2019-09-04 13:40:10 -07:00
817f4502fb Dynamic dispatch for optimized quantized op kernels (#25545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25545

This re-uses the infrastructure from ATen/native/cpu, which compiles kernels multiple times for different instruction sets and dispatches dynamically based on the CPU's capability flags at runtime. This ensures we use the most optimal quantized kernel for the given machine

Test Plan: Imported from OSS

Differential Revision: D17166369

Pulled By: jamesr66a

fbshipit-source-id: 8c8393f99365e1408819bbaf254c1b5734a34b70
2019-09-04 13:26:40 -07:00
849c32f8e9 Cpu-strided-complex support for binary-ops (#25534)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Note: These changes do not support AVX/SSE operations on complex tensors.
Changes so far:

- [x]  Added complex support of torch.empty.
- [x]  Added complex support of CopyKernels
- [x]  Added complex support of BinaryOp kernels

Once these changes are applied the rest of the kernels are pretty easy.

ezyang
I have fixed the issues in the original [PR: 25373](https://github.com/pytorch/pytorch/pull/25373).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25534

Differential Revision: D17188390

Pulled By: ezyang

fbshipit-source-id: ade9fb00b2caa89b0f66a4de70a662b62db13a8c
2019-09-04 13:20:52 -07:00
e3afe6a4e1 Update Transformer.py comments to include a full example (#25411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25411

We provide a full example in the comments section of Transformer.py.
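
The shape conventions follow this minimal sketch (sizes are illustrative):

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(d_model=512, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6)
src = torch.rand(10, 32, 512)  # (source length, batch, d_model)
tgt = torch.rand(20, 32, 512)  # (target length, batch, d_model)
out = transformer(src, tgt)    # -> (20, 32, 512)
```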

Test Plan: N/A

Reviewed By: zhangguanheng66

Differential Revision: D17116514

fbshipit-source-id: b8fd331bef7a626e52f3347c88adba21b1f43ec5
2019-09-04 12:53:35 -07:00
5330b7392d Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 23397af84c6b6654d2fa5af9175035b4a1e60b17
2019-09-04 12:29:28 -07:00
0c6ee947b6 Remove forward compat code for serialization format (#25440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25440

See the comments deleted for what this PR is all about

Test Plan: Imported from OSS

Differential Revision: D17125690

Pulled By: suo

fbshipit-source-id: a4a2f541a3e161f9c15b51df475130e7bf683cf8
2019-09-04 12:22:31 -07:00
bb969d5ac8 Remove friend dependency on ClassType in InterfaceType (#25617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25617

This was causing some build issues if you included c10 but not torch

Test Plan: Imported from OSS

Differential Revision: D17173352

Pulled By: suo

fbshipit-source-id: 8b6f65b6cdefea716598dec2909bbeb511f881b5
2019-09-04 12:07:40 -07:00
2eebf08427 Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 6dceee13c8dbd2f20d464a432c1100456e4f2892
2019-09-04 12:04:08 -07:00
b266a079f0 Enable PiecewiseLinearTransform test on ROCm (#25632)
Summary:
thrust segfaults should have been fixed in ROCm rocThrust
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25632

Differential Revision: D17179503

Pulled By: bddppq

fbshipit-source-id: 4d3854eacb30c945119d58250bccf399ccbc6105
2019-09-04 10:51:28 -07:00
478440a061 Kill discover_sparse_tensor_operations. (#25589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25589

It's not used anymore.

Test Plan: Imported from OSS

Differential Revision: D17172501

Pulled By: gchanan

fbshipit-source-id: 4fff9e48358015bcf886294b8db359c3cc7acafa
2019-09-04 10:47:57 -07:00
9f1a817742 Kill unused enumerate_options_due_to_default.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25588

Test Plan: Imported from OSS

Differential Revision: D17172502

Pulled By: gchanan

fbshipit-source-id: b3156a52ed5b4b108a1668714fe5cb26a3d3f575
2019-09-04 10:47:53 -07:00
5407241b4f Run clang-format on torch/csrc/distributed (#25647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25647

TSIA

Test Plan: N/A

Differential Revision: D17182909

fbshipit-source-id: 22a6554693def0032a051cef5fe788f49de1d740
2019-09-04 10:08:09 -07:00
ee087a6a47 Fix clang-tidy script (#25652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25652

The clang-tidy driver script generates a chunk whitelist per file so
that it only shows errors for lines that were actually changed. If a
change removes a chunk, its count is equal to 0. If such a chunk happens
to be at the start of the file, with a start position equal to 0,
clang-tidy fails to run. This change filters out those chunks.

Test Plan: Imported from OSS

Differential Revision: D17184188

Pulled By: pietern

fbshipit-source-id: b6c2d9dca4d52cd6bf4b186603545312726fb00b
2019-09-04 09:46:26 -07:00
14c2492fb5 Fix iOS simulator build (#25633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25633

The iOS simulator build (x86_64) is broken right now. To fix it:

1. Fix the bug in iOS.cmake
2. Disable avx2 for mobile x86_64 build

Test Plan:
1. The `build_ios.sh` can be run successfully for the iOS x86 build. The build script I'm using:

```shell
./scripts/build_ios.sh \
  -DBUILD_CAFFE2_MOBILE=OFF \
  -DIOS_PLATFORM=SIMULATOR \
  -DUSE_NNPACK=OFF \
  -DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
  -DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

2. All generated static libs are x86 libs as shown below:

```
> lipo -i *.a
Non-fat file: libasmjit.a is architecture: x86_64
Non-fat file: libc10.a is architecture: x86_64
Non-fat file: libcaffe2_protos.a is architecture: x86_64
Non-fat file: libclog.a is architecture: x86_64
Non-fat file: libcpuinfo.a is architecture: x86_64
Non-fat file: libfbgemm.a is architecture: x86_64
Non-fat file: libtorch.a is architecture: x86_64
```
Differential Revision: D17183803

Pulled By: xta0

fbshipit-source-id: 870d5433a3616b8e7ed9fb7dfab6aebbda26f723
2019-09-04 08:58:25 -07:00
47cee2dd22 Implement initial version of autograd with named tensors (#25604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25604

In this initial version:
- autograd ignores all names.
- tensor.grad is unnamed, unless the user manually assigns to it.
- if a grad tensor has any names, perhaps the user was hoping for some
alignment-checking behavior that named tensors offer for other ops. We
raise a warning in this case.

Future: do some more extensive checking to see if this actually works in
all cases.
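
An illustrative sketch of the behavior described above (not part of the PR; assumes the named-tensor API of this period):

```python
import torch

x = torch.randn(3, names=('N',), requires_grad=True)
x.sigmoid().sum().backward()  # autograd itself ignores the names
print(x.grad.names)           # expected (None,): .grad stays unnamed
```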

Test Plan:
- [namedtensor ci]
- Check a warning is raised if a grad tensor has names.
- Check tensor.grad field is unnamed.
- Check that we can perform backward on an op that doesn't explicitly
support names in backward. `sigmoid` is one such op.

Differential Revision: D17171788

Pulled By: zou3519

fbshipit-source-id: 64837fde94d8269610b6d3539ac025516dbe1df4
2019-09-04 06:36:54 -07:00
791347642b Allow TensorMethods.h to include Dispatcher.h (alternative) (#23888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888

This is an alternative to https://github.com/pytorch/pytorch/pull/23684.

Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687

Test Plan: waitforsandcastle

Differential Revision: D16673569

fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
2019-09-04 01:35:19 -07:00
885da48d22 remove protobuf usage from mobile build (#25493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25493

remove protobuf usage from mobile build

Test Plan:
buck build //caffe2:torch

buck build -c 'protobuf.use_v3=true' -c 'project.ignore=true' fbsource//fbandroid/mode/dev_clang_asan //xplat/experimental/pytorch/predictor:predictor

Reviewed By: ljk53

Differential Revision: D17116846

fbshipit-source-id: d75e5f48e7eae960c0b5c7b8ef7f3359eb6ca4ec
2019-09-03 22:55:34 -07:00
4fe857187c switch to rocThrust for thrust/cub APIs (#25620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602

Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option; as of 2.6 they will be the only available option.

Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.

Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.

Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.

Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864

Reviewed By: xw285cornell

Differential Revision: D16940768

Pulled By: bddppq

fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
2019-09-03 22:16:30 -07:00
68b9920c7c Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 7d6c34ed015a5016e59413e2e02224a6a46e2b03
2019-09-03 21:17:17 -07:00
0ebbcd9541 Name inference rules for relu/relu_/threshold/threshold_ (#25569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25569

Test Plan
- new tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17159121

Pulled By: zou3519

fbshipit-source-id: c68bdb543155488aa3634f908bd576e5c30c8d77
2019-09-03 20:10:24 -07:00
9ea6238b07 Fix named tensor printing (#25564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25564

There are a number of ops that get called while printing tensors
depending on how large the tensors are. This PR makes it so that before
we attempt to format tensor data for printing, we drop the names of the
tensor (if there are any). This is easier than supporting named tensors
for all of those ops (which should happen eventually).
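
A minimal sketch of the idea (the helper name and the name-dropping call are assumptions, not the actual implementation):

```python
def _prepare_for_print(tensor):
    # drop names so the formatting ops that follow need no named-tensor support
    if any(name is not None for name in tensor.names):
        tensor = tensor.rename(None)
    return tensor
```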

Test Plan: - new test [namedtensor ci]

Differential Revision: D17158872

Pulled By: zou3519

fbshipit-source-id: 282023837645b8cb16a4d93896a843dd598fc738
2019-09-03 19:58:00 -07:00
0483d537ab Add the dynamic quantized LSTM module (#25157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25157

Add the dynamic quantized LSTM module.

TODO (separate PRs):
- Serialization.
- Bias can be Null.

ghstack-source-id: 89443731

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
```
[jianyuhuang@devvm2816.prn3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_q
uantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.4 sec
Building: finished in 4.0 sec (100%) 8122/8122 jobs, 2 updated
  Total time: 5.5 sec
Trace available for this run at /tmp/testpilot.20190902-164918.1275502.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision b61bc0e3b71033578eddfe0a28b0739bc685663f fbpkg 3b1c1aed1c534c0cb161a981eca6e2f0 at Sun Sep  1 20:58:52 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/690/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
      ✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 1.048 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 1.049s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799823877227
Summary (total time 5.53s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16955662

fbshipit-source-id: 61cf1a74913105fa02e44b3941813eabac0006b5
2019-09-03 19:18:28 -07:00
4edf77b6c0 Fuse individual operators into GatherFuse8BitRowwiseQuantFloatMulLengthElim (#25519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25519

Fuse Gather-Fused8BitRowwiseQuantizedToFloat-Mul-LengthsSum opportunistically.

Test Plan:
```
buck test caffe2/caffe2/opt/custom:concat_elim_test
```

Reviewed By: dreamingleo

Differential Revision: D17125045

fbshipit-source-id: 8ee50410eb13a82e1e5c8180f392fce2fe9cd728
2019-09-03 19:08:49 -07:00
cd4a7cdaa6 change shape for some ops to reduce variance
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25619

Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```

last few lines of output P108286305

Reviewed By: mingzhe09088

Differential Revision: D17175802

fbshipit-source-id: 46b69fc1895444b15b6dfcec0625b6b9b006712a
2019-09-03 18:52:25 -07:00
67d64ea910 Fix binary op name inference to happen before shape checks (#25563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25563

Before, for binary ops, name inference occurred after shape checks. This
defeats the purpose of names, because the names are supposed to tell
the user that, e.g., their tensors are misaligned or that they are adding
incompatible tensors.

This PR changes TensorIterator so that names are computed before shape checks and
propagated after the binary ops are finished. In order to support this,
this PR makes the following changes:
- adds a `names_` field to TensorIterator, similar to `shape_`. This is
necessary to hold the output names, that are computed in
`compute_names`, until they are used in `propagate_names_to_outputs()`.
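
An illustrative example of the user-visible effect (not from the PR; exact error text omitted):

```python
import torch

x = torch.randn(3, 2, names=('N', 'C'))
y = torch.randn(5, names=('N',))
# Both the names and the shapes are incompatible here. After this change the
# error should mention the misaligned names rather than only the shapes.
x + y
```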

Test Plan: Imported from OSS

Differential Revision: D17158869

Pulled By: zou3519

fbshipit-source-id: 0caa90f7a93e4d9bdb2549cd330cc3abd2258868
2019-09-03 18:49:09 -07:00
9922e09436 Name inference rule for torch.cat (#25568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25568

Test Plan
- new test [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17159069

Pulled By: zou3519

fbshipit-source-id: fbc185ea5865b128508451096b742ac18e467670
2019-09-03 18:43:10 -07:00
49baeb9d4c Eliminate magic numbers in BatchLinearAlgebra.cu (#25524)
Summary:
Changelog:
- We had 65535 as a common magic number for several linalg routines as a batch size limit. This PR explicitly assigns them to a variable to minimize possible errors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25524

Test Plan:
- All existing tests should pass to confirm that the modification is correct

This is a follow-up of the suggestion in https://github.com/pytorch/pytorch/issues/24438.

Differential Revision: D17171842

Pulled By: zou3519

fbshipit-source-id: a9ed5000f47614b8aa792c577f30b30475e0ac4b
2019-09-03 17:53:27 -07:00
8edf149f7f Don't save self in index backward (#25594)
Summary:
`self` isn't necessary for `index` backward; we only need the shape of
`self`. Changing derivatives.yaml to use `zeros_like(self)` triggers a
codepath in the codegen to only save the shape.

Fixes https://github.com/pytorch/pytorch/issues/24853.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25594

Test Plan:
- I added a new test that is adapted from the code in
https://github.com/pytorch/pytorch/issues/24853. I'm not sure what a
more minimal example would look like, because the bug is hard to trigger
due to how autograd handles differentiable views.

Differential Revision: D17168645

Pulled By: zou3519

fbshipit-source-id: 11f270fed7370730984a93e4316dd937baa351a7
2019-09-03 17:47:40 -07:00
a6ba4f64ac Name inference for masked_fill_ / masked_fill
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25567

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17159070

Pulled By: zou3519

fbshipit-source-id: d177a0847fc592b6b15e3ae59fcea847d4975e12
2019-09-03 17:45:14 -07:00
2aef60660f Name inference rule for masked select (#25566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25566

masked_select returns a tensor with None names. However, it broadcasts
its inputs so we need to perform a check that they are broadcastable.
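
Illustrative sketch (assumes the named-tensor API of this period):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
out = t.masked_select(t > 0)  # mask names broadcast against t's names
print(out.names)              # expected (None,)
```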

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17159071

Pulled By: zou3519

fbshipit-source-id: ad201f3f73bc54163ede1ba3d906d2409ebef475
2019-09-03 17:45:09 -07:00
d1e079e2e0 Enable torch.cholesky for batches > 262140 (#24438)
Summary:
Changelog:
- Iterate over mini batches of 262140 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24438

Test Plan:
- Added slow tests to test the behavior in test_torch and test_cuda

Fixes https://github.com/pytorch/pytorch/issues/24403

Differential Revision: D17175603

Pulled By: soumith

fbshipit-source-id: 1abb0a1e92494cf43ef4ba9efb54a919cd18bfef
2019-09-03 17:35:37 -07:00
0621e2ce94 Get rid of _th_reciprocal_. (#25507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25507

It doesn't seem to be used.

Test Plan: Imported from OSS

Differential Revision: D17163584

Pulled By: gchanan

fbshipit-source-id: 7409cc06bf84863bd14aea060c755d0f162d2aec
2019-09-03 15:45:36 -07:00
1e4832ffad Enable broadcasting of batch dimensions RHS and LHS tensors for lu_solve (#24333)
Summary:
Changelog:
- Enable broadcasting of RHS and LHS tensors for lu_solve. This means that you can now have RHS with size `3 x 2` and LHS with size `4 x 3 x 3` for instance
- Remove deprecated behavior of having 2D tensors for RHS. Now all tensors have to have a last dimension which equals the number of right hand sides
- Modified docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24333
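
A sketch of the new broadcasting behavior using the shapes from the changelog (assumes the `torch.lu`/`torch.lu_solve` API of the time):

```python
import torch

A = torch.randn(4, 3, 3)           # batched LHS
b = torch.randn(3, 2)              # single RHS with 2 right-hand sides
LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)  # b broadcasts over the batch: x is (4, 3, 2)
```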

Test Plan: - Add tests for new behavior in test_torch.py with a port to test_cuda.py

Differential Revision: D17165463

Pulled By: zou3519

fbshipit-source-id: cda5d5496ddb29ed0182bab250b5d90f8f454aa6
2019-09-03 15:14:48 -07:00
914a6051f9 Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 2a2db3cc9bde896c49121450a7853047a56c3154
2019-09-03 15:11:51 -07:00
896cd1c510 Documentation for cdist (#25221)
Summary:
https://github.com/pytorch/pytorch/issues/21730
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25221

Differential Revision: D17073908

Pulled By: ifedan

fbshipit-source-id: 19e2534183d6a2a7e9cdfcee4734cff1b124e05a
2019-09-03 14:16:07 -07:00
9cb9f15989 Remove index calculation in quantized max_pool2d (#25526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25526

This is not used, adds unnecessary operations in the tight inner loop, and makes vectorization extremely difficult.

Benchmark script
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_linear(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.max_pool2d(x, kernel_size=3, stride=None, padding=0, dilation=1)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.max_pool2d(q_x, kernel_size=3, stride=None, padding=0, dilation=1)
    time_per_iter_quant = (time.time() - s) / NITER

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    numel = x.numel() + float_out.numel()

    float_bw_gbps = (numel * 4) / time_per_iter_float / 1e9
    quant_bw_gbps = numel / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

Before this change (AVX2)
```
$ OMP_NUM_THREADS=1 python pool_bench.py
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.6582303047180176      2.891871929168701       0.7905111729677203
GB/s float      GB/s quant
0.9685120139731342      0.30629295546107427
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.6472487449645996      2.889857292175293       0.7923389640383144
GB/s float      GB/s quant
0.9714281223323551      0.3065064847313822
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.7154507637023926      3.0337929725646973      0.8165342957045585
GB/s float      GB/s quant
0.9535962727896339      0.291964549990766
```

After this change (AVX2)
```
$ OMP_NUM_THREADS=1 python pool_bench.py
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.869810104370117       1.928541660308838       0.4983556320065849
GB/s float      GB/s quant
0.9155591371263668      0.45929005228653125
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
4.014170169830322       1.846764087677002       0.460061235459548
GB/s float      GB/s quant
0.8826332342930452      0.47962812679240213
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
3.983309268951416       1.848154067993164       0.4639745355448337
GB/s float      GB/s quant
0.8894714823217043      0.4792674027235246
```

Test Plan: Imported from OSS

Differential Revision: D17166342

Pulled By: jamesr66a

fbshipit-source-id: ce6b29349ceb4912a0dba4d085ef9a3cc1a2e965
2019-09-03 13:08:58 -07:00
500e72aaa5 Make scatter/gather arguments optional (#25575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25575

For both scatter and gather, only the source and destination rank,
respectively, need to supply a list of tensors. The `scatter_list` and
`gather_list` arguments were mandatory, however, and this has resulted
in some confusion. This commit makes both the `scatter_list` and
`gather_list`, and the `src` and `dst` arguments optional.

Closes #25463.
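
A minimal sketch of the relaxed API (single-process gloo group purely for illustration; the point is that non-src ranks may now omit `scatter_list` and `src`):

```python
import torch
import torch.distributed as dist

dist.init_process_group("gloo", init_method="tcp://127.0.0.1:23456",
                        rank=0, world_size=1)
out = torch.zeros(1)
dist.scatter(out, scatter_list=[torch.ones(1)])  # src defaults to 0
print(out)  # tensor([1.])
```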

Test Plan: Imported from OSS

Differential Revision: D17164253

fbshipit-source-id: a16bc208c87a1c96163c1a86d4a7ca8634a26f95
2019-09-03 12:27:05 -07:00
493f7bd817 Error phrasing in torch.distributed helper functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25574

Test Plan: Imported from OSS

Differential Revision: D17164254

fbshipit-source-id: 13dbcffd67c2b5425c722b2b21765345a85a3872
2019-09-03 12:27:01 -07:00
938e740241 Name inference rule for mean, std, var, std_mean, var_mean (#25431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25431

I put the name propagation logic in a central place, `make_reduction`,
that creates a TensorIterator for the reduction. This lets us implement
name inference rules for mean, std, var, std_mean, and var_mean.
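
An illustrative example of the propagated names (not from the PR):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
print(t.mean(0).names)  # expected ('C',): the reduced dim is dropped
print(t.var(1).names)   # expected ('N',)
```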

Test Plan
- new tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17123577

Pulled By: zou3519

fbshipit-source-id: 2d47080a40da0c4bcabbb3df71ffa8fbeb7a14c6
2019-09-03 11:54:13 -07:00
f3f83ccb23 Added invert bitwise operation to JIT (#22324)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25360
Fixes https://github.com/pytorch/pytorch/issues/22124
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22324

Differential Revision: D17140477

Pulled By: yf225

fbshipit-source-id: f42aec5e688fe079d9e79726b7a6c345da94ae2e
2019-09-03 11:16:30 -07:00
5c4cc1e8f3 Prepare to add some Dimname/DimnameList overloads (#25405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25405

This PR adds schemas to native_functions.yaml, core/Tensor.h, and
core/TensorMethods.h for Dimname/DimnameList overloads for the following
functions:
- min, max, max_values, min_values
- mean, median
- logsumexp, std, var, norm

The actual implementations will come in a later PR. I am accumulating
all the additional schemas and changes to core/{Tensor|TensorMethods}.h
in this PR so that there is only one point of failure for potential
merge conflicts.
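
Intended usage once the implementations land (illustrative only; this PR adds schemas, not kernels):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
t.mean(dim='C')                     # a Dimname instead of an integer dim
t.logsumexp(dim='N', keepdim=True)
```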

Test Plan: - Check that all pytorch builds still build. [namedtensor ci]

Differential Revision: D17116333

Pulled By: zou3519

fbshipit-source-id: fd666d60109a311767169261afbec0fd85cc00c8
2019-09-03 10:55:47 -07:00
c89301a625 Migrate multinomial from the TH to ATen (CPU) (#25274)
Summary:
https://github.com/pytorch/pytorch/issues/24738

I updated the way to define n_categories and n_dist to fix https://github.com/pytorch/pytorch/issues/12309
Previously:
n_dist = prob_dist.size(0)
n_categories = prob_dist.size(1)
Changed to:
n_dist = prob_dist.size(-2)
n_categories = prob_dist.size(-1)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25274

Differential Revision: D17137157

Pulled By: ifedan

fbshipit-source-id: 0320eafdaa7c272e169101b436b6c2ea4ba4736b
2019-09-03 09:38:27 -07:00
f793a7c57e Implement indexing methods for sparse tensors (#24937)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/7416 .

This PR implements the following indexing methods for sparse tensors:
-  [x] `select`
-  [x] `index_select`

Note that this PR also modifies [gen.py](https://github.com/pytorch/pytorch/pull/24937/files#diff-76aa8cb3d0fad99c5f761d08cbcb4d19), which is not strictly required to resolve the original issue but works around a CI build issue reported in https://github.com/pytorch/pytorch/issues/24931.
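
A quick sketch of the new sparse indexing methods:

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
s = torch.sparse_coo_tensor(i, v, (2, 3))
row = s.select(0, 1)                            # sparse slice of row 1
cols = s.index_select(1, torch.tensor([0, 2]))  # sparse column subset
```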
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24937

Differential Revision: D17163796

Pulled By: ezyang

fbshipit-source-id: 06613301ec456d9ed3491b9ce48e804048600f09
2019-09-03 09:31:03 -07:00
832c72a2d6 Update index.rst (#24245)
Summary:
Adds links to torchaudio and torchtext to docs index. We should eventually evolve this to bring the audio and text docs builds in like torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24245

Differential Revision: D17163539

Pulled By: soumith

fbshipit-source-id: 5754bdf7579208e291e53970b40f73ef119b758f
2019-09-03 09:28:19 -07:00
b46bc79f2f Create helpers for implementing unary ops whose CUDA implementation is ATen. (#24879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24879

Test Plan: Imported from OSS

Differential Revision: D17073557

Pulled By: VitalyFedyunin

fbshipit-source-id: 0d876627d500601ecd2a6aa6501e880842f2e98b
2019-09-03 09:02:13 -07:00
4864403fb4 Delete torch/csrc/nn/type_checks, which aren't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25506

Test Plan: Imported from OSS

Differential Revision: D17141960

Pulled By: gchanan

fbshipit-source-id: 460d6a83c796f0d1ca576a709c298e90204f6b06
2019-09-03 08:32:30 -07:00
09ef107e59 Add copy logic for LibTorch to avoid issues on Windows (#25556)
Summary:
This should work both on VS and Ninja.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25556

Differential Revision: D17162045

Pulled By: ezyang

fbshipit-source-id: 18c3d62e9ba93bf603f3a5310087fac77be4a974
2019-09-03 06:33:38 -07:00
ba9f13448b Updating submodules
Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: f4673e8e37d73ece6d4e1f2a03460293d8484715
2019-09-03 05:56:57 -07:00
8199bb3dd3 add options to flush cache in SLS benchmarks (#25530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25530

Add an option to flush cache for more consistent benchmarking.

Test Plan:
buck run mode/opt caffe2/caffe2/fb/python/benchmarks:sparse_lengths_sum_4bit_benchmark -- --flush-cache
buck run mode/opt caffe2/caffe2/python/operator_test:sparse_lengths_sum_benchmark -- --flush-cache

Reviewed By: hyuen

Differential Revision: D17148087

fbshipit-source-id: 7eb782986676620254c1619a9a48c656cb1a6856
2019-09-03 05:09:03 -07:00
f1059d4e6a format sparse_lengths_sum_benchmark (#25529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25529

To prepare D17148087

Test Plan: Just formatting

Reviewed By: hyuen

Differential Revision: D17148085

fbshipit-source-id: faff90ee7dfec543d47037d20ce00f251144bc06
2019-09-03 05:08:59 -07:00
53cacb6a59 test_allreduce_coalesced_stress message passed in as kwarg (#25557)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/25427; see the issue discussion for more context.

Message conversion to unicode is a potential source of flakiness; passing the message in as a kwarg instead of letting it land in `prec` is both clearer and more resilient to being broken in the future.
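
The same principle with stock `unittest` (the in-tree helper's parameter name may differ; treat this purely as illustration):

```python
import unittest

class Demo(unittest.TestCase):
    def test_kwarg_message(self):
        # passing the message as a keyword keeps it from being swallowed
        # by a preceding positional parameter such as a precision argument
        self.assertEqual([1.0, 1.0], [1.0, 1.0], msg="tensors did not match")

if __name__ == "__main__":
    unittest.main()
```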

cc mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25557

Differential Revision: D17160343

Pulled By: pietern

fbshipit-source-id: af071fecc04c7e0a6658694dc0d76472193f8e78
2019-09-03 00:54:11 -07:00
631c34d876 checks requiring GPU moved to their own test (#25555)
Summary:
`test_allreduce_coalesced_checks` is skipped if there is no GPU or PyTorch is not compiled with `CUDA` support. This PR moves the checks involving `.cuda()` to their own tests, since the others are still valid with or without CUDA.

cc pietern mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25555

Differential Revision: D17160337

Pulled By: pietern

fbshipit-source-id: 4c5e6db44d2728ca43784b85131e890d3d003bcd
2019-09-03 00:50:32 -07:00
71c97d3747 Fixed flatten docs (I think) (#25544)
Summary:
I think...

I'm having issues building the site, but it appears to get rid of the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25544

Differential Revision: D17157327

Pulled By: ezyang

fbshipit-source-id: 170235c52008ca78ff0d8740b2d7f5b67397b614
2019-09-02 11:34:56 -07:00
7d3564fc2c remove MULTI_GPU (#25509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25509

Trying to reduce the number of build parameters to simplify the config.
This one is purely derived from the build environment, so we can have
the CI scripts just compute it.

Test Plan: Imported from OSS

Differential Revision: D17143343

Pulled By: suo

fbshipit-source-id: 7837607b7b18a9233fd8657dc9c63539c0194110
2019-09-02 10:22:30 -07:00
9d37179061 Fix CUDA distributions test on Windows (#25539)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25304.

The likely cause of the failure is that `at::empty` was creating a tensor filled with very small values or zeros, which led to `cumdist` not summing to a positive number.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25539

Differential Revision: D17156212

Pulled By: ezyang

fbshipit-source-id: ee8039e576bf76a2266aeb7e9537337d635e0f8f
2019-09-02 08:19:47 -07:00
c36b77fcda Run clang-format on torch/lib/c10d (#25382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25382

The formatted code swapped the inclusion order around in
ProcessGroupNCCLTest.cpp, causing a compilation failure in
`ATen/cuda/CUDAMultiStreamGuard.h`.

To fix this, this commit also includes a fix to the include list in
`ATen/cuda/CUDAMultiStreamGuard.h`.

Test Plan: Imported from OSS

Differential Revision: D17152634

Pulled By: pietern

fbshipit-source-id: c7b74d65a10dce5d602a98dc23fe2810235f932d
2019-09-02 02:59:47 -07:00
40cb5182e9 Attach 'send' autograd function to the autograd graph as part of RPC. (#24876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24876

This contains very basic functionality of adding 'send' autograd
function to our autograd graph. The purpose of this change is to validate the
basic structure proposed here makes sense. Once this makes sense, we can build
upon this to address more complicated scenarios. At a high level we've added
the following functionality:

1) Define a very simple 'SendRpcBackwards' autograd function.
2) Attach this function to appropriate tensors when we call an RPC.
3) Store the send function in our distributed autograd context.
ghstack-source-id: 89359708

Test Plan: unit tests.

Differential Revision: D16903255

fbshipit-source-id: 6c04794a8e58b199795404225fd9da0c1440460e
2019-09-01 23:54:01 -07:00
a024e1e091 Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130)
Summary:
Resubmission of https://github.com/pytorch/pytorch/issues/23424 because previous PR was borked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25130

Test Plan: Two tests were added to cuda_stream_test for this functionality.

Differential Revision: D17145538

Pulled By: mruberry

fbshipit-source-id: 2546c5907c038412e03aa0d3328a972b0164c455
2019-09-01 12:37:52 -07:00
6a458512c2 Fix pow precision (#25476)
Summary:
Found in gpytorch:

```
test_computes_cubic_kernel (test.kernels.test_polynomial_kernel.TestPolynomialKernel) ... FAIL

======================================================================
FAIL: test_computes_cubic_kernel (test.kernels.test_polynomial_kernel.TestPolynomialKernel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/xarfuse/uid-30041/0efc4638-seed-sandcastle-2ddc31a66f82cDbd-ns-4026533029/test/kernels/test_polynomial_kernel.py", line 70, in test_computes_cubic_kernel
    self.assertLess(torch.norm(res - actual), 1e-5)
AssertionError: tensor(1.0790e-05, grad_fn=<NormBackward0>) not less than 1e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25476

Differential Revision: D17147518

Pulled By: pbelevich

fbshipit-source-id: 60b619f5166d2bfaed7aa4803672e6be17d32b76
2019-09-01 06:38:16 -07:00
c881136215 Move worker name collection code from Python to C++ (#24260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24260

This also simplifies ProcessGroupAgent constructor signature.

Test Plan: Imported from OSS

Differential Revision: D16789219

Pulled By: mrshenli

fbshipit-source-id: bbb69022435467fbb1c28da21dd03d3ab52fc521
2019-08-31 19:02:45 -07:00
ac7996ccd3 Removes SymbolicVariable (#25077)
Summary:
This PR excises the last of SymbolicVariable. There should be no change in functionality. One new test for addmm fusion was added. A case where the peephole optimizer might convert a scalar argument remains untested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25077

Test Plan: Refactors existing code so mostly covered by current tests. One test for addmm fusion was added.

Differential Revision: D17145334

Pulled By: mruberry

fbshipit-source-id: 6b68faf764f9ee8398b55c43110228ed9faf81eb
2019-08-31 11:19:50 -07:00
60c4e74e49 Migrate CPU_tensor_apply to TensorIterator in aten/src/ATen/native/TensorCompare.cpp (#25402)
Summary:
https://github.com/pytorch/pytorch/issues/24497
https://github.com/pytorch/pytorch/issues/24498
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25402

Differential Revision: D17135825

Pulled By: ifedan

fbshipit-source-id: fba07e4b59453db1a98bfdebfe21f0827cc952e5
2019-08-30 20:59:57 -07:00
e316b7d548 Multiple fixes to test_c10d.py. (#25441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25441

1) There was a bug in https://github.com/pytorch/pytorch/pull/25012, where the
set of tests that needed to be skipped for return code checking was incorrect.
2) Added proper setup and teardown for the nccl_error tests.
3) Ensure AssertionError is not ignored for tests that skip return code
checking.
ghstack-source-id: 89317660

Test Plan: unit tests

Differential Revision: D17125824

fbshipit-source-id: 317ec39942b93e40ab847246b3a5129919ba2ac4
2019-08-30 18:22:58 -07:00
6e4eeb1d17 Gradle tasks for publishing to bintray, jcenter, mavencentral etc. (#25351)
Summary:
Gradle tasks for publishing to bintray, jcenter and mavencentral; snapshot builds go to oss.sonatype.org

These gradle changes add tasks:

bintrayUpload - publishing on bintray, in 'facebook' org
uploadArchives - uploading to maven repos

Gradle tasks are copied from facebook open sourced libraries like https://github.com/facebook/litho, https://github.com/facebookincubator/spectrum

To do the publishing we need to provide credentials somehow (e.g. in ~/.gradle/gradle.properties):
```
signing.keyId=
signing.password=
signing.secretKeyRingFile=

bintrayUsername=
bintrayApiKey=
bintrayGpgPassword=

SONATYPE_NEXUS_USERNAME=
SONATYPE_NEXUS_PASSWORD=
```

android/libs/fbjni is a submodule; to be able to add publishing tasks to it (it needs to be published as a separate maven dependency), I created `android/libs/fbjni_local`, which has only a `build.gradle` with release tasks.

The pytorch_android dependency on ':fbjni' changed from implementation to api, because implementation is treated as a 'private' dependency (translated to scope=runtime in the maven pom file), while api works like 'compile'.

Testing:
it's already published on bintray with version 0.0.4 and can be used in gradle files as

```
repositories {
    maven {
        url  "https://dl.bintray.com/facebook/maven"
    }
}

dependencies {
    implementation 'com.facebook:pytorch_android:0.0.4'
    implementation 'com.facebook:pytorch_android_torchvision:0.0.4'
}
```

It was published in the com.facebook group.

I requested a sync from bintray to jcenter; that usually takes 2-3 days.

Versioning added version suffixes to the aar output files, and the circleCI android jobs started failing because they expected just pytorch_android.aar and pytorch_android_torchvision.aar, without any version suffix.

To avoid this, I changed the circleCI android jobs to zip the *.aar files and publish them as a single artifact named artifacts.zip. I will add kostmo to check this part; if the circleCI jobs finish ok, everything works :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25351

Reviewed By: kostmo

Differential Revision: D17135886

Pulled By: IvanKobzarev

fbshipit-source-id: 64eebac670bbccaaafa1b04eeab15760dd5ecdf9
2019-08-30 17:52:34 -07:00
a27fdfd38c Vectorized quantized relu/relu6 (#25496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25496

Benchmark Script

```
import torch, time

sizes = [
    (1, 56, 56, 256),
    (1, 28, 28, 512),
    (1, 14, 14, 1024),
    (1, 7, 7, 2048),
]

NITER = 1000

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('*****', str(dtype), '*****')

    print('\t*****relu*****')
    print('\tsize',
          'time (float ms)',
          'time (quant ms)',
          'quant / float',
          sep='\t')
    for size in sizes:
        # NHWC
        x = torch.rand(*size)

        # NCHW
        x = x.permute([0, 2, 3, 1])

        # Test float
        s = time.time()
        for i in range(NITER):
            torch.relu(x)
        time_per_iter_float = (time.time() - s) / NITER

        # Test quantized
        q_x = torch.quantize_linear(x, 0.5, 1, dtype)

        s = time.time()
        for i in range(NITER):
            torch.relu(q_x)
        time_per_iter_quant = (time.time() - s) / NITER

        print('\t',
              size,
              time_per_iter_float * 1000,
              time_per_iter_quant * 1000,
              time_per_iter_quant / time_per_iter_float,
              sep='\t')

    print('\t*****relu6*****')
    print('\tsize',
          'time (float ms)',
          'time (quant ms)',
          'quant / float',
          sep='\t')
    for size in sizes:
        # NHWC
        x = torch.rand(*size)

        # NCHW
        x = x.permute([0, 2, 3, 1])

        # Test float relu6
        s = time.time()
        for i in range(NITER):
            torch._C._nn.hardtanh(x, 0., 6.)
        time_per_iter_float_6 = (time.time() - s) / NITER

        # Test quantized relu6
        q_x = torch.quantize_linear(x, 0.5, 1, dtype)

        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.relu6(q_x)
        time_per_iter_quant_6 = (time.time() - s) / NITER

        print('\t',
              size,
              time_per_iter_float_6 * 1000,
              time_per_iter_quant_6 * 1000,
              time_per_iter_quant_6 / time_per_iter_float_6,
              sep='\t')

```

Before this change (AVX2)

```
$ OMP_NUM_THREADS=1 python relu_bench.py
***** torch.qint8 *****
	*****relu*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	0.28845906257629395	0.32473158836364746	1.1257458353479874
		(1, 28, 28, 512)	0.12658190727233887	0.1621997356414795	1.2813816692816096
		(1, 14, 14, 1024)	0.060466766357421875	0.08151435852050781	1.3480852943031985
		(1, 7, 7, 2048)	0.021933555603027344	0.04172706604003906	1.9024305404582809
	*****relu6*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	1.0264298915863037	0.4686436653137207	0.45657640054641424
		(1, 28, 28, 512)	0.4577608108520508	0.23253798484802246	0.5079901541051298
		(1, 14, 14, 1024)	0.22967290878295898	0.11695981025695801	0.509245129853278
		(1, 7, 7, 2048)	0.12731575965881348	0.060141801834106445	0.4723830105187069
***** torch.quint8 *****
	*****relu*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	0.28515172004699707	0.32268643379211426	1.1316306762551913
		(1, 28, 28, 512)	0.1268613338470459	0.1618938446044922	1.2761480562681475
		(1, 14, 14, 1024)	0.06022787094116211	0.08164644241333008	1.355625578946535
		(1, 7, 7, 2048)	0.018331527709960938	0.04460000991821289	2.432967433149516
	*****relu6*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	1.027123212814331	0.5206699371337891	0.50692062124382
		(1, 28, 28, 512)	0.4589383602142334	0.25958728790283203	0.565625605542444
		(1, 14, 14, 1024)	0.23261427879333496	0.13058066368103027	0.561361341867771
		(1, 7, 7, 2048)	0.13072657585144043	0.06684517860412598	0.5113358027528374
***** torch.qint32 *****
	*****relu*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	0.285900354385376	0.44794583320617676	1.5667900593168678
		(1, 28, 28, 512)	0.12691712379455566	0.21081137657165527	1.6610160258035915
		(1, 14, 14, 1024)	0.05957603454589844	0.10731720924377441	1.8013486473507283
		(1, 7, 7, 2048)	0.01675701141357422	0.05678510665893555	3.388737123669683
	*****relu6*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	1.0314903259277344	0.6447939872741699	0.6251090980366052
		(1, 28, 28, 512)	0.4572310447692871	0.3106963634490967	0.6795172090859886
		(1, 14, 14, 1024)	0.2294166088104248	0.1586904525756836	0.6917130080447454
		(1, 7, 7, 2048)	0.12760710716247559	0.07992196083068848	0.6263127705647926

```

After this change (AVX2)

```
$ OMP_NUM_THREADS=1 python relu_bench.py

***** torch.qint8 *****
	*****relu*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	0.2889232635498047	0.06460881233215332	0.22361928056034167
		(1, 28, 28, 512)	0.13853216171264648	0.013955354690551758	0.10073729102343015
		(1, 14, 14, 1024)	0.0721442699432373	0.007253408432006836	0.10054032617855548
		(1, 7, 7, 2048)	0.015225648880004883	0.004289150238037109	0.28170557930505313
	*****relu6*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	1.042311191558838	0.06422209739685059	0.061615089540392104
		(1, 28, 28, 512)	0.46384429931640625	0.01335287094116211	0.028787399049295198
		(1, 14, 14, 1024)	0.2301616668701172	0.007760286331176758	0.033716675920477994
		(1, 7, 7, 2048)	0.12573981285095215	0.004631757736206055	0.03683604763827976
***** torch.quint8 *****
	*****relu*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	0.2877991199493408	0.0571134090423584	0.1984488661828141
		(1, 28, 28, 512)	0.12664175033569336	0.013076543807983398	0.10325618347283565
		(1, 14, 14, 1024)	0.06389951705932617	0.005294084548950195	0.08285014961904974
		(1, 7, 7, 2048)	0.016280174255371094	0.003660917282104492	0.22486966199988284
	*****relu6*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	1.0244698524475098	0.05978655815124512	0.05835853344870231
		(1, 28, 28, 512)	0.454937219619751	0.013289213180541992	0.02921109244842504
		(1, 14, 14, 1024)	0.22972846031188965	0.0077877044677734375	0.03389960676705229
		(1, 7, 7, 2048)	0.125657320022583	0.0045795440673828125	0.03644470586003093
***** torch.qint32 *****
	*****relu*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	0.28399205207824707	0.2665698528289795	0.9386525111468004
		(1, 28, 28, 512)	0.12665152549743652	0.12166023254394531	0.9605903447756557
		(1, 14, 14, 1024)	0.0598299503326416	0.059305429458618164	0.9912331387355795
		(1, 7, 7, 2048)	0.014290809631347656	0.012906551361083984	0.9031364698031366
	*****relu6*****
	size	time (float ms)	time (quant ms)	quant / float
		(1, 56, 56, 256)	1.020923376083374	0.27229976654052734	0.2667191024513184
		(1, 28, 28, 512)	0.4564201831817627	0.12390279769897462	0.2714665176181136
		(1, 14, 14, 1024)	0.23244047164916992	0.05935955047607422	0.25537527976482316
		(1, 7, 7, 2048)	0.1271505355834961	0.014976024627685547	0.11778184463762029

```

Test Plan: Imported from OSS

Differential Revision: D17141891

Pulled By: jamesr66a

fbshipit-source-id: 14b8c3330017c518a6b385780a449ca51efef0ce
2019-08-30 17:22:24 -07:00
5455ba634c remove PYTHON_VERSION (#25494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25494

As far as I can tell, we don't use this anywhere in our CI scripts?

[Diff of processed config](https://gist.github.com/suo/be4ca818afdcb3184f5c61c92f4a4c81)

Test Plan: Imported from OSS

Differential Revision: D17139125

Pulled By: suo

fbshipit-source-id: ff2d025c220a420cda08502eda9fc7d41477e103
2019-08-30 16:50:15 -07:00
7a921ba17d Manually implement is_zipfile (#25279)
Summary:
The default implementation is lenient in that it recognizes a zipfile if the magic number appears anywhere in the archive. So if someone has the bytes `PK\x03\x04` in a tensor, it gets recognized as a zipfile. See https://bugs.python.org/issue28494

This implementation only checks the first 4 bytes of the file for the zip magic number. We could also copy https://github.com/python/cpython/pull/5053's fix, but that seems like overkill.

Fixes #25214
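
A minimal sketch of the stricter check (the helper name is an assumption):

```python
def _is_zipfile(f):
    start = f.read(4)  # inspect only the first four bytes
    f.seek(0)
    return start == b"PK\x03\x04"  # zip local-file-header magic
```
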
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25279

Pulled By: driazati

Differential Revision: D17102516

fbshipit-source-id: 4d09645bd97e9ff7136a2229fba1d9a1bce5665a
2019-08-30 16:47:50 -07:00
fcab254d05 Minor fixes in per channel support for qconv kernel (#25182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25182

Removing an empty line and an unused variable, and adding a check for supported quantized data types.
ghstack-source-id: 89293181

Test Plan: buck test mode/dev caffe2/test:quantized -- --print-passing-details

Reviewed By: jianyuh

Differential Revision: D17052234

fbshipit-source-id: dbe470f0cd73fa4fca44bd15424adbaf7ceca469
2019-08-30 16:47:46 -07:00
0f928dc0d9 Revert "Memory layout for pooling ops" (#25495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25495

This reverts commit 8dcd256201a16827bae3610fe05f6566cce787d0.

Test Plan: Imported from OSS

Differential Revision: D17139716

Pulled By: jamesr66a

fbshipit-source-id: 5f4a12e4048e8a50f8400fcde7de1fbce1495d37
2019-08-30 15:46:55 -07:00
77e8dba620 Disable Int8Transpose test
Summary: It's failing in the FB internal build because we don't enable that op.

Test Plan: buck test //xplat/caffe2:caffe2_testAndroid

Reviewed By: supriyar

Differential Revision: D17139694

fbshipit-source-id: 8091b71ff826466f3e2e1b4d6f87b9b50d1def20
2019-08-30 15:21:23 -07:00
890d0f88ae update speed benchmark binary to work in USE_STATIC_DISPATCH mode (#25449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25449

Currently Variable and Tensor are still not 100% merged. There are
various places in the ATen/TH codebase that assert the input type to be
Variable/Tensor.

Usually when the input type is Variable, function calls are dispatched to the
corresponding generated VariableType methods, which convert the input
Variable type to Tensor type with "unpack()" before calling into LegacyTHFunctions,
and then convert the result from Tensor type back to Variable type with "as_variable()".

However, when USE_STATIC_DISPATCH mode is enabled, it no longer dispatches function
calls to VariableType methods. This way, Variable inputs will remain as
Variable instances when they reach LegacyTHFunctions and fail the "checked_tensor_unwrap"
asserts. A couple of other asserts fail for similar reasons.

There are several options to address this problem with USE_STATIC_DISPATCH:
1. Wait until Variable and Tensor are fully merged as planned in https://github.com/pytorch/pytorch/issues/23032;
2. Create Tensors instead of Variables upfront on caller side (JIT);
3. Fix downstream asserts in ATen/TH to tolerate Variable inputs when AutoGrad is disabled;

Option 1 will still take some time; Option 2 was tried before and caused
a lot of problems; Option 3 needs to be conducted case by case as it can be
dangerous to remove asserts before 100% merge happens.

After digging into it a bit more, it turns out NonVariableTypeMode not only controls
how dispatch happens, but also controls the TensorImpl.is_variable() result. So the
problem can be addressed by:
1. Set AutoNonVariableTypeMode right before calling forward();
2. Make sure all inputs/params are created as Variable, e.g.:
 A. should use torch::ones() to create test input tensor instead of at::ones();
 B. should not set AutoNonVariableTypeMode before torch::jit::load() call;

This diff applies these changes to the speed benchmark to demonstrate how the approach works.

Test Plan:
- Build speed benchmark binary for Android:
```
./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DUSE_STATIC_DISPATCH=ON \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

- Push binaries and model to Android device:
```
adb push build_android/bin/speed_benchmark_torch /data/local/tmp
adb push resnet.pb /data/local/tmp
```

- Run inference on device:
```
/data/local/tmp # ./speed_benchmark_torch --model=resnet.pb \
--input_dims="1,3,224,224" --input_type=float --print_output=true
```

Differential Revision: D17128567

Pulled By: ljk53

fbshipit-source-id: 58cc49ff35d21fefc906172cc3271f984eeb29f0
2019-08-30 15:16:45 -07:00
a779263501 add speed benchmark binary for torch jit (#25486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25486

Reland PR #25230.

Test Plan: Imported from OSS

Differential Revision: D17137095

Pulled By: ljk53

fbshipit-source-id: 3b258e29e16d03ef24b5b49d9b67b72257f0f3a8
2019-08-30 15:16:41 -07:00
7ec6b74a35 turn off BUILD_BINARY for android CI jobs (#25485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25485

I recently enabled binary build macro for android CI in PR #25368 as I started
adding new binaries for android. But seems it's fragile, e.g.: PR #25230 failed
android-armv8 CI but passed armv7/x86-32/x86-64. Currently it only runs
x86-32 for PR so the armv8 failure was not captured before landing.

Similar problem might happen for other PRs so I think we should just
disable it for now to avoid breaking master CI. The android binaries are
for local testing purpose anyway. We can re-enable it when it becomes
more stable.

Test Plan:
- will check CI;

Imported from OSS

Differential Revision: D17137006

fbshipit-source-id: 2b7901f79e83c77ff82c14a0da3500b9416314b6
2019-08-30 15:16:37 -07:00
6d35579910 Fix implicit fallthrough warnings in FeatureLPPooling.cu (#25451)
Summary:
`-Wimplicit-fallthrough` is enabled for recent GCC versions, and there's about 1000 lines of warnings in the build output with GCC 9.1 like:

```
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu: In function ‘bool runFeatureLPPoolingUpdateOutput(THCState*, const THCDeviceTensor<T, 4>&, THCDeviceTensor<T, 4>&, float, int, int) [with T = c10::Half]’:
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:474:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
  474 |       L2_WIDTH_CASE(2);
      |          ^~~~~~
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:475:1: note: here
  475 |       L2_WIDTH_CASE(3);
      | ^

...

/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:639:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
  639 |       LP_WIDTH_CASE(15);
      |           ^~~~~~
/home/rgommers/code/pytorch/aten/src/THCUNN/FeatureLPPooling.cu:640:1: note: here
  640 |       LP_WIDTH_CASE(16);
      | ^
```

Fix by ending each case statement with `break;`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25451

Differential Revision: D17131254

Pulled By: ezyang

fbshipit-source-id: 55b513620438cbbf86052f22d799d790b0633fa2
2019-08-30 14:37:14 -07:00
861194e3f8 Fix windows build error when TBB enabled and Windows SDK installed (#25398)
Summary:
Fixed https://github.com/pytorch/pytorch/issues/25320
See the issue for more infomation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25398

Differential Revision: D17131116

Pulled By: ezyang

fbshipit-source-id: cc3ebfe746abb33e24b4c884b08d9e57a1ea3476
2019-08-30 14:34:56 -07:00
03f67e4b16 Remove BUILD_ATEN_ONLY build option (#24441)
Summary:
This build option no longer works.

Close https://github.com/pytorch/pytorch/issues/21703
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24441

Differential Revision: D17138131

Pulled By: ezyang

fbshipit-source-id: 67adac990645a5df1f7c2e2dbef3689b2c30fcf8
2019-08-30 13:44:38 -07:00
9bdcc499d1 Delete a few cases where we directly use Backend/TensorTypeId. (#25467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25467

Use Layout/Device more directly in these cases.
ghstack-source-id: 89289651

Test Plan: sandcastle and ossci

Differential Revision: D17131883

fbshipit-source-id: ab3c6d1c879b7f26f20a2378364c852dc37508fc
2019-08-30 13:00:20 -07:00
d159104d1f Kill non-shared cwrap tools. (#25358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25358

They aren't used anymore.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D17101575

Pulled By: gchanan

fbshipit-source-id: 47c89c71951b49a22e3b1912fc7db40d982ad2fb
2019-08-30 12:36:44 -07:00
28d4e2e9a9 Update derivatives.yaml docs to refer to Declarations.yaml rather than Declarations.cwrap. (#25357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25357

Declarations.cwrap is no longer hooked up derivatives.yaml directly.

Test Plan: Imported from OSS

Differential Revision: D17101570

Pulled By: gchanan

fbshipit-source-id: 838ea6a89c7403e73676292b93d864ecfdd6251b
2019-08-30 12:36:41 -07:00
d2a8435c08 add tuple keyword (#25474)
Summary:
Doesn't really add much functionality, since the inputs to `tuple()` for which we can statically infer the output size are pretty much just tuples. Does improve the error message though.

Fix for https://github.com/pytorch/pytorch/issues/24000
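
Illustrative TorchScript usage (a sketch; only inputs whose length is statically known are accepted):

```python
import torch

@torch.jit.script
def f(x: torch.Tensor):
    pair = (x, x + 1)
    return tuple(pair)  # accepted: the input's length is statically known
```
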
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25474

Differential Revision: D17133800

Pulled By: eellison

fbshipit-source-id: 41a052895e6ed24a384ec6f8aef0a6769ac094e6
2019-08-30 11:33:49 -07:00
7e61136c3b Get rid of extract_cwarp. (#25356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25356

It's no longer used.

Test Plan: Imported from OSS

Differential Revision: D17101572

Pulled By: gchanan

fbshipit-source-id: 7afb15b2f870601a773c946b8b3029bbe0a774ea
2019-08-30 11:18:01 -07:00
fea4225b8a Parameterize CircleCI config (#25446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25446

Parameterize the CircleCI config. So now instead of ~1 zillion job specs, there are only a handful, like `pytorch_linux_build` and such. The workflow definition feeds in the appropriate parameters that actually control job behavior.

[Diff](https://gist.github.com/suo/12a48efd36948fc71bdb5c719682a64c) of the `circleci config process` output shows that the actual jobs generated are identical, except for some empty env vars being set.

Differential Revision: D17133395

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: e6d79268b05c91d5079670992bdf4a99e6dc2807
2019-08-30 10:35:04 -07:00
fe055f2dfb Get rid of more unused plugins.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25355

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D17101578

Pulled By: gchanan

fbshipit-source-id: aef433771b6420c09394276b8f73396dd4f305fb
2019-08-30 10:32:17 -07:00
a4fad42a09 Get rid of torch._thnn. (#25354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25354

It doesn't seem to be used anymore.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D17101577

Pulled By: gchanan

fbshipit-source-id: b7c00de8c05bff1336d2012fd7b6f97709391e17
2019-08-30 10:32:13 -07:00
be0f803798 torch/jit/passes/quantization.{h,cpp} and torch/jit/init.cpp (#25403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25403

att

Test Plan:
build

Imported from OSS

Differential Revision: D17125691

fbshipit-source-id: a7e944ea4b45a9f2b3078fcb9e830d8406dd6a86
2019-08-30 10:28:20 -07:00
9d89c9a30f change shape for conv and unary ops (#25477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25477

We want to increase `in_c, out_c` so that the metrics reported back are more stable

Test Plan:
```
[huaminli@devvm2388.ftw3 ~/fbsource/fbcode] buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators None --iterations 3
```
runs fine on my devserver, last couple lines of output P107448746

Reviewed By: mingzhe09088

Differential Revision: D17133043

fbshipit-source-id: 0b989a530cbfe3d608471a30ae4bbda10e5216ea
2019-08-30 10:02:30 -07:00
1a92b225db Migrate clamp and clamp_ from the TH to Aten (CPU) (#25290)
Summary:
https://github.com/pytorch/pytorch/issues/24686
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25290

Differential Revision: D17122999

Pulled By: ifedan

fbshipit-source-id: d6aec62ded0f01618f8b8c0d8057207df3fd329b
2019-08-30 09:26:53 -07:00
e26305ed60 cuda devices should have same dtype (#25470)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/25465.

The test was passing in two tensors of different dtypes to a check that makes sure the two tensors are on the same device.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25470

Differential Revision: D17132404

Pulled By: mrshenli

fbshipit-source-id: b03093b8324c98ec4c51be05b3abf9250c680c23
2019-08-30 09:00:21 -07:00
329757a907 Torch.flatten() returns a 1-dim tensor on a 0-dim tensor (#25406)
Summary:
PR for `torch.flatten()` to return a 1-dim tensor on a 0-dim tensor

> torch.tensor(123).shape -> torch.Size([])
> torch.tensor(123).flatten() -> torch.tensor([123])
> torch.tensor(123).flatten().shape -> torch.Size([1])

resolve https://github.com/pytorch/pytorch/issues/22963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25406

Differential Revision: D17120464

Pulled By: CamiWilliams

fbshipit-source-id: efbecd61f0aefd82f2ab417ca6bb467488ff99de
2019-08-30 08:53:08 -07:00
4fb28e5df9 Fixes #25454
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25456

Differential Revision: D17132144

Pulled By: ezyang

fbshipit-source-id: 68d11cbbb80f783959110b626594373ee41981d7
2019-08-30 07:59:26 -07:00
25e6a52e2e Stop doing nn wrap. (#25353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25353

It doesn't seem necessary anymore.

Test Plan: Imported from OSS

Differential Revision: D17101569

Pulled By: gchanan

fbshipit-source-id: 67a198ae594dcd64dbd7cf6a73e2160e26e3513e
2019-08-30 07:42:20 -07:00
716815e3de Stop initializing THNN backend. (#25352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25352

It doesn't appear to be necessary anymore; assuming this works I'll kill the codegen in a follow-up PR.

Test Plan: Imported from OSS

Differential Revision: D17101573

Pulled By: gchanan

fbshipit-source-id: bd3d1724ee5c659185a161b1e291e30af52f0a8a
2019-08-30 07:42:17 -07:00
05bf74a890 Compare shapes of outputs and grad_outputs in autograd.grad (#25349)
Summary:
PR to compare shapes of `outputs` and `grad_outputs` in `torch.autograd.grad()`.

> grad_outputs should be a sequence of length matching output containing the pre-computed gradients w.r.t. each of the outputs.

resolve https://github.com/pytorch/pytorch/issues/17893
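
A minimal sketch of the new check:

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2
g = torch.autograd.grad(y, x, grad_outputs=(torch.ones(2),))  # ok
# grad_outputs=(torch.ones(3),) would now raise: shape differs from output
```
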
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25349

Differential Revision: D17119931

Pulled By: CamiWilliams

fbshipit-source-id: 86c9089e240ca0cea5f4ea8ec7bcff95f9d8cf53
2019-08-30 07:31:15 -07:00
c56464d13e Turn off warnings on Windows CI. (#24331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24331

Currently our logs are something like 40M a pop.  Turning off warnings and turning on verbose makefiles (to see the compile commands) reduces this to more like 8M. We could probably reduce log size more but verbose makefile is really useful and we'll keep it turned on for Windows.

Some findings:

1. Setting `CMAKE_VERBOSE_MAKEFILE` inside CMakelists.txt itself as suggested in https://github.com/ninja-build/ninja/issues/900#issuecomment-417917630 does not work on Windows. Setting `-DCMAKE_VERBOSE_MAKEFILE=1` does work (and we respect this environment variable.)
2. The high (`/W3`) warning level on MSVC is due to cmake inserting this flag in the default set. On recent versions of cmake, CMP0092 can be used to disable this flag in the default set. The string replace trick sort of works, but the standard snippet you'll find on the internet won't disable the flag from nvcc. I inspected the CUDA cmake code and verified it does respect CMP0092.
3. `EHsc` is also in the default flags; this one cannot be suppressed via a policy. The string replace trick seems to work...
4. ... however, it seems nvcc implicitly inserts an `/EHs` after `-Xcompiler` specified flags, which means that if we add `/EHa` to our set of flags, you'll get a warning from nvcc. So we probably have to figure out how to exclude EHa from the nvcc flags set (EHs does seem to work fine.)
5. To suppress warnings in nvcc, you must BOTH pass `-w` and `-Xcompiler /w`. Individually these are not enough.

The patch applies these things; it also fixes a bug where nvcc verbose command printing doesn't work with `-GNinja`.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17131746

Pulled By: ezyang

fbshipit-source-id: fb142f8677072a5430664b28155373088f074c4b
2019-08-30 07:11:07 -07:00
f0c6021846 fix bug in assertNotEqual for int tensors (#25412)
Summary:
re-apply: https://github.com/pytorch/pytorch/pull/25199
but without a failing quantized test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25412

Differential Revision: D17131303

Pulled By: nairbv

fbshipit-source-id: edf7736af3ede5e809eded72be9514e922e70db4
2019-08-30 06:52:30 -07:00
c76dacba84 Add windows docs for the binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23150

Differential Revision: D17131252

Pulled By: ezyang

fbshipit-source-id: 5d80a41fc8779b93a3f157dfdacc21cf4b809d5a
2019-08-30 06:45:32 -07:00
061f2d1683 Skip useless macros from Windows.h (#25444)
Summary:
Applying https://github.com/pytorch/pytorch/issues/25398 to the whole project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25444

Differential Revision: D17131251

Pulled By: ezyang

fbshipit-source-id: 7a8817f3444aebd6028bf1056514355e2c4cc748
2019-08-30 06:42:44 -07:00
c2b710c3bd Revert D17067216: [pytorch][perf] add speed benchmark binary for torch jit
Test Plan: revert-hammer

Differential Revision:
D17067216

Original commit changeset: 0cf4c46f1c90

fbshipit-source-id: a8fa2a72042da817d199d325c30624a451b24582
2019-08-30 06:22:21 -07:00
60f6cc9d59 Emit script function calls during tracing. (#25089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25089

Previously, when the tracer encountered a scripted function (or method), it
inlined the function into the graph. Now, we emit a CallFunction or
CallMethod node instead.
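A minimal sketch of the user-visible effect (behavior as of this change; later releases may print the graph differently):

```
import torch

@torch.jit.script
def scale(x):
    return x * 2.0

def f(x):
    return scale(x) + 1.0

traced = torch.jit.trace(f, torch.randn(3))
# The traced graph now contains a call node for `scale` instead of its
# inlined body.
print(traced.graph)
```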

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D16987936

Pulled By: suo

fbshipit-source-id: a3e38a4621f3504909ec0542865dc7e381c243d6
2019-08-30 01:30:03 -07:00
bbf84c1a9f Fix dead link and syntax in ONNX landing page
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25126

Differential Revision: D17129237

Pulled By: dzhulgakov

fbshipit-source-id: 80fab457387d357ddcfc23710cb4493ce94cab5e
2019-08-29 23:58:34 -07:00
0c222555ce Attempt to fix windows build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25450

Test Plan: Imported from OSS

Differential Revision: D17128958

Pulled By: jamesr66a

fbshipit-source-id: 721a939d77f3c848bb728544ce5c4715094c3f91
2019-08-29 23:49:26 -07:00
d49f1349e9 Updating submodules
Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: ebf75b38d338867af7d01545de29450c0cc70635
2019-08-29 23:46:30 -07:00
194acd023a Some alias analysis fixes (#25425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25425

1. Properly invalidate memory locations when we change the points-to
set.
2. Don't build a new indexToElementMap in toString(), just use
`MemoryDag::fromIndex`
3. Fix transitive wildcard assignment

Test Plan: Imported from OSS

Differential Revision: D17126402

Pulled By: suo

fbshipit-source-id: cbd99027d2e78fd333dbf030172d3b7ac4df8349
2019-08-29 23:32:07 -07:00
d291935377 Export Unique
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25050

Differential Revision: D17085391

Pulled By: dzhulgakov

fbshipit-source-id: a17d54cf634650d3874d02c2bfacd906572ccf5f
2019-08-29 23:27:29 -07:00
8986b9e38d Momentum setting in SyncBatchNorm forward (inference) pass. (#24995)
Summary:
This is a fix for a potential ONNX export issue with SyncBatchNorm where, irrespective of the configured momentum, the momentum value in the exported ONNX BN node is always 0. The details are captured in https://github.com/pytorch/pytorch/issues/18525.

The fix in this PR for `SyncBatchNorm` is very similar to the fix that went in https://github.com/pytorch/pytorch/pull/18764 for `BatchNorm` (I think this site was just missed).

Please note that there are no ONNX test points added for this, because SyncBatchNorm works exclusively with tensors on GPU and the ONNX test passes are CPU only. If there's a way to add a test point, please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24995

Differential Revision: D17085570

Pulled By: dzhulgakov

fbshipit-source-id: 162d428673c269efca4360fb103854b7319ec204
2019-08-29 23:16:46 -07:00
17831648dd Quantized vec256 + vectorized quantized::add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25202

Test Plan: Imported from OSS

Differential Revision: D17061047

Pulled By: jamesr66a

fbshipit-source-id: b08a61a9b4a258a4c1b6a97a6da1db05c3a6b0f7
2019-08-29 21:21:12 -07:00
8cd45b4c46 relax roi_width/roi_height check to non-negative
Summary: Pull Request resolved: https://github.com/fairinternal/detectron2/pull/260

Test Plan: sandcastle.

Reviewed By: ppwwyyxx

Differential Revision: D17127067

fbshipit-source-id: ddca51fa0dab1e683f8c3709e105b6cbdf8b78b0
2019-08-29 21:18:40 -07:00
93b653bba3 Attempt to enable CrossMapLRN2d, as it no longer uses Module._backend.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25343

Test Plan: Imported from OSS

Differential Revision: D17101574

Pulled By: gchanan

fbshipit-source-id: 71d40f5c2a9c94a71abc52e61f6f7be449a2b41a
2019-08-29 20:15:14 -07:00
c59540b7b1 Change exception to warning (#25408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25408

Change the exception to a warning so that the observer can be called with no data and still provide a scale and zero-point.
ghstack-source-id: 89267768
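A short sketch of the new behavior, assuming the observer API of this era (`MinMaxObserver` exported from `torch.quantization`):

```
import torch
from torch.quantization import MinMaxObserver

obs = MinMaxObserver()
# No tensors have been observed yet. Previously this raised an exception;
# now it warns and still returns a usable scale and zero-point.
scale, zero_point = obs.calculate_qparams()
```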

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_minmax_observer'

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'

Differential Revision: D17116524

fbshipit-source-id: db4d76e882b57f23161dced846df3a0760194a41
2019-08-29 20:12:57 -07:00
86b1d5f271 add speed benchmark binary for torch jit (#25230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25230

Add similar benchmark binary as caffe2 speed benchmark for torch JIT.

Test Plan:
- Test with ResNet-18 torch script model:
```
./speed_benchmark_torch --model=res18.pb --input_dims="1,3,224,224" --input_type=float --print_output=true
./speed_benchmark_torch --model=res18.pb --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter=20
```

- Verified building as desktop/server binary works:
```
BUILD_BINARY=ON python setup.py develop
```

- Verified building as android binary works:
```
./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

Differential Revision: D17067216

Pulled By: ljk53

fbshipit-source-id: 0cf4c46f1c90b89bd8dca2a14bb0e5fb70b233a1
2019-08-29 19:34:44 -07:00
e370486d80 fix binaries build for BUILD_CAFFE2_MOBILE=OFF (#25229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25229

The binaries don't build when BUILD_CAFFE2_MOBILE=OFF (libtorch mode)
in which case we don't include caffe2/predictor which is needed by
predictor_verifier.cc.

Add BUILD_BINARY=ON to libtorch android CI script to make sure binaries
can be compiled for libtorch android as we will add speed benchmark
binary for it.

Test Plan:
- Verified BUILD_BINARY=ON works with BUILD_CAFFE2_MOBILE=OFF and ON.
- Will check CI builds.

Differential Revision: D17067217

Pulled By: ljk53

fbshipit-source-id: 2a28139d9d25ff738be7b49b24849c9d300ef9a9
2019-08-29 19:34:40 -07:00
1294e55c15 Assign each RpcAgent a unique ID, and use ID for sending RPC messages. (#24195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24195

It is not efficient to use a string destination name in every send. Moreover, when we add RRef later, RpcAgent will frequently check RRef ownership, which will also be slow if we have to go through a string comparison every time. This commit assigns each RpcAgent a unique integer ID. In the Python send API, applications can provide either a destination name or an id. If it is a string name, it will be converted to an id by calling the get_id(workerName) API.

Test Plan: Imported from OSS

Differential Revision: D16770241

Pulled By: mrshenli

fbshipit-source-id: fa56128a77a02a402dc6682474bc301dc1b7f43d
2019-08-29 19:19:11 -07:00
629a2b3615 Remove unnecessary checks in InsertQuantDeQuantImpl (#25370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25370

Removes some checking code that is copied from insert_observers pass

Test Plan:
python test/test_jit.py 'TestJit.test_insert_quant_dequant'

Imported from OSS

Differential Revision: D17106633

fbshipit-source-id: 3c39be89dbf58dc6ffd63e1ee1283eba65243ea6
2019-08-29 18:59:18 -07:00
f495a3abac Skip inserting observers for Tensors inside fused op (#25281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25281

We want to skip inserting observers for the Tensors that sit between two ops that will be fused, e.g. Conv -> ReLU. This PR adds just this pattern, but new patterns can easily be added in the future.
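An illustrative module exhibiting the Conv -> ReLU pattern (a sketch, not code from the PR): the tensor produced by `conv` gets no observer because the pair will be fused into a single quantized op.

```
import torch.nn as nn

class ConvReLU(nn.Module):
    def __init__(self):
        super(ConvReLU, self).__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        # No observer is inserted between conv and relu.
        return self.relu(self.conv(x))
```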

Test Plan:
python test/test_jit.py 'TestJit.test_insert_observers_skip_values'

Imported from OSS

Differential Revision: D17106037

fbshipit-source-id: 49697f4d9598a461edc62a2b4148495764a99574
2019-08-29 18:19:26 -07:00
88a27ebb00 Per Channel Quantization Support for Quantized Linear Operator (#25276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25276

We add the per channel quantization support for the quantized linear operator, based on the recent added per channel quantization APIs in https://github.com/pytorch/pytorch/pull/24935 and https://github.com/pytorch/pytorch/pull/24934.
ghstack-source-id: 89267515
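For reference, a sketch of per-channel weight quantization using the current API name `torch.quantize_per_channel` (the op name was still evolving at the time of this commit):

```
import torch

w = torch.randn(4, 8)                            # e.g. a linear weight
scales = torch.rand(4)                           # one scale per output channel
zero_points = torch.zeros(4, dtype=torch.long)   # one zero-point per channel
qw = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.qint8)
```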

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_unpack \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear_unpack \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.3 sec
Building: finished in 5.7 sec (100%) 8114/8114 jobs, 0 updated
  Total time: 7.0 sec
Trace available for this run at /tmp/testpilot.20190827-141824.842847.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c4cde854bae419be71282b0f92bf2d57a9203003 fbpkg f45bf410f1694a6882727cf03961702b at Mon Aug 26 22:10:29 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/686/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499540372523
      ✓ caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQuantizedLinear) 0.996 1/1 (passed)
Test output:
> test_qlinear_unpack (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.997s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5629499540372523
Summary (total time 5.05s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 0.9 sec
Building: finished in 6.4 sec (100%) 8114/8114 jobs, 2 updated
  Total time: 7.3 sec
Trace available for this run at /tmp/testpilot.20190827-141631.836596.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c4cde854bae419be71282b0f92bf2d57a9203003 fbpkg f45bf410f1694a6882727cf03961702b at Mon Aug 26 22:10:29 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/686/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900049005601
      ✓ caffe2/test:quantized - test_qlinear (test_quantized.TestQuantizedLinear) 2.893 1/1 (passed)
Test output:
> test_qlinear (test_quantized.TestQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 2.893s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900049005601
Summary (total time 6.78s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestDynamicQuantizedLinear\)'  --print-passing-details
```
[jianyuhuang@devvm6560.prn2.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_qlinear \(test_quantized\.TestDynamicQuantizedLinear\)'  --print-passing-details
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.7 sec
Building: finished in 4.9 sec (100%) 8118/8118 jobs, 2 updated
  Total time: 6.6 sec
Trace available for this run at /tmp/testpilot.20190829-153630.613647.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision f39465ac7f6b26840c8cbd0ae5e367fb8a60ec24 fbpkg cf4e6efcd2fa4642b6f8c26a9bd98d67 at Tue Aug 27 21:58:47 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/687/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124657066806
      ✓ caffe2/test:quantized - test_qlinear (test_quantized.TestDynamicQuantizedLinear) 3.377 1/1 (passed)
Test output:
> test_qlinear (test_quantized.TestDynamicQuantizedLinear) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 3.378s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124657066806
Summary (total time 8.18s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D17057818

fbshipit-source-id: 9ad8b9120fd0d9933ca81c132da61b53e2c91b9e
2019-08-29 17:53:08 -07:00
3805be62c1 Skip test_compare_tensor_scalar due to overflow error (#25432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25432

Test fails without width argument (it was dropped from hypothesis).
Temporarily skipping until fixed.
ghstack-source-id: 89260995

Test Plan: N/A

Differential Revision: D17123571

fbshipit-source-id: 2fc934a005959a300c6a962d8507cf0aaa137be5
2019-08-29 17:47:03 -07:00
a9bb68d436 Update QNNPACK submodule to 7d2a4e9 (#25400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25400

Bring in fixes for clamp operator and tests

Test Plan: CI

Reviewed By: dreiss

Differential Revision: D17100464

fbshipit-source-id: b071a8266dbdef19aa7d58a66c43bfa97d59ce02
2019-08-29 16:37:22 -07:00
7b4eddede9 Delete toType(const DeprecatedTypeProperties&, ...) (#25332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25332

This method makes reference to a deprecated class, we now delete it.
This deletion was somewhat involved.  Pre-existing use sites of
toType:

- Tensor::cpu()/cuda()/hip()
- native::type_as
- SummaryOps: toType(CPU(kDouble)) translated into to(kDouble) as weights
  is an input argument and therefore assumed to be on CPU already.  Similar
  for CUDA.
- TensorTransformations: toType(CUDA(kLong)) translated into cuda(), as
  the inputs are actually already the correct dtype, and this translation is just to move them to CUDA
- Adjusted native_test to take TensorOptions instead of
  DeprecatedTypeProperties, killing toType along the way in favor of to
- Some tests for toType with UndefinedType which I just deleted
- CopyBackwards stores TensorOptions now instead of
  DeprecatedTypeProperties
ghstack-source-id: 89177526

Test Plan: sandcastle and ossci

Differential Revision: D17096824

fbshipit-source-id: 964e5a073b9d37594e911d8bca98c9eab5766826
2019-08-29 16:20:18 -07:00
d704097d33 Add Int8Transpose operator (#16382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16382

Adding an Int8TransposeOp that inherits from TransposeOp.
Small refactoring to normal TransposeOp to move main logic into a TransposeImpl
function.

Test Plan: int8_test.cc

Reviewed By: supriyar

Differential Revision: D13822715

fbshipit-source-id: a4d61bdf8e4e1d3f2e30b86d325810ed44c21635
2019-08-29 16:06:25 -07:00
e44c09ecae making quant utilities inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25054

Test Plan: Imported from OSS

Differential Revision: D16974198

Pulled By: zafartahirov

fbshipit-source-id: 54befc8429990adafe746d1255d117fca5f12e11
2019-08-29 16:03:13 -07:00
23fde77d3d Remove Module._backend as it's not used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25342

Test Plan: Imported from OSS

Differential Revision: D17101571

Pulled By: gchanan

fbshipit-source-id: 2cda46fe197e26a1cacb5e912f535809973d306e
2019-08-29 15:43:49 -07:00
f077847a45 Revert D17078081: Invariant typevar matching on callsite checks
Test Plan: revert-hammer

Differential Revision:
D17078081

Original commit changeset: 54476469679a

fbshipit-source-id: 88a25dc1877caae9ba967e747cd0ebdfe996a345
2019-08-29 15:04:16 -07:00
247cac263f Revert D17003555: Multiple fixes to test_c10d.py.
Test Plan: revert-hammer

Differential Revision:
D17003555

Original commit changeset: 0e0429367fb6

fbshipit-source-id: 622b5fc09e5f50dccca9ff295c62999bc8528ead
2019-08-29 14:59:49 -07:00
04764d5751 Fix allreduce_coalesced tests in c10d (#25419)
Summary:
1. `test_allreduce_coalesced_stress` is flaky.
2. `test_allreduce_coalesced_checks` uses GPU but didn't claim so.

cc jfc4050
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25419

Differential Revision: D17119311

Pulled By: mrshenli

fbshipit-source-id: f560b126d6bc01363a14bdf6d697ecd55c4db468
2019-08-29 14:46:07 -07:00
2513ca66ca Add guards for using named tensor with serialization and multiprocessing (#25345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25345

Test Plan
- New tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17101486

Pulled By: zou3519

fbshipit-source-id: 58e803b042056ee6abab8551517f74078f2b81d5
2019-08-29 14:10:33 -07:00
0bb69f6071 Add guard for named tensors in the JIT (#25344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25344

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17101487

Pulled By: zou3519

fbshipit-source-id: d6170a809dfd98e6a4dba8450433c439962991cc
2019-08-29 14:10:28 -07:00
8640aef505 Add support for non-affine batch norm with float stats and half inputs (#22750)
Summary:
This PR adds support for non-affine batch norm with float running estimates and half inputs.
The changes were made similarly to https://github.com/pytorch/pytorch/issues/16735.

I couldn't find a specific test for `SyncBatchNorm`, so I used [this code](https://gist.github.com/ptrblck/ab45bfcde6df55ac28a7be18531f4718) to test it.
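A minimal sketch of the case this enables (requires a CUDA device): half-precision inputs with float32 running stats in a non-affine batch norm.

```
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    x = torch.randn(8, 4, 16, 16, device="cuda", dtype=torch.half)
    running_mean = torch.zeros(4, device="cuda")    # float32 stats
    running_var = torch.ones(4, device="cuda")
    out = F.batch_norm(x, running_mean, running_var,
                       weight=None, bias=None,      # non-affine
                       training=True, momentum=0.1, eps=1e-5)
```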

cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22750

Differential Revision: D17119965

Pulled By: ezyang

fbshipit-source-id: 2e8c5d63fc3c636b8a1338c43c9c101a0f5e9b22
2019-08-29 14:04:37 -07:00
fe922a2e84 Fix item() call in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25404

Pulled By: driazati

Differential Revision: D17116098

fbshipit-source-id: e365f254f38a3134898817d75201dd9ae009ecb4
2019-08-29 13:50:04 -07:00
1ea1d7f095 Fixed masking warnings in tests (#25317)
Summary:
Fixing deprecation warnings in tests related to uint8 masking and indexing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25317
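For context, the supported (non-deprecated) form the tests are being moved to — boolean masks rather than uint8 masks:

```
import torch

t = torch.arange(4)
mask = torch.tensor([True, False, True, False])   # bool, not uint8
print(t[mask])   # tensor([0, 2]) with no deprecation warning
```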

Differential Revision: D17099063

Pulled By: izdeby

fbshipit-source-id: 49f1d85dcd9464d61e3156eebc07390e9f6fa1b4
2019-08-29 12:13:52 -07:00
8dcd256201 Memory layout for pooling ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25374

Test Plan: Imported from OSS

Differential Revision: D17107577

Pulled By: jamesr66a

fbshipit-source-id: e40dacaddf5ee17e6483be9e9302d3afc1a708c7
2019-08-29 11:43:03 -07:00
2339d9f19c Updating submodules
Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 520f0f1681258afd1d09118cf679228160bf8d1a
2019-08-29 11:36:36 -07:00
4cdce0da71 Multiple fixes to test_c10d.py. (#25334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25334

1) There was a bug in https://github.com/pytorch/pytorch/pull/25012, where the
tests which needed to be skipped for return code checking was incorrect.
2) Added proper setup and teardown for the nccl_error tests.
3) Ensure AssertionError is not ignored for tests that skip return code
checking.

Test Plan: unit tests

Differential Revision: D17003555

fbshipit-source-id: 0e0429367fb6dae251b74e9f8b2baa67a48a0d22
2019-08-29 11:33:55 -07:00
0604b45f23 pytorch android circleci integration (#25286)
Summary:
Introducing CircleCI jobs for pytorch_android gradle builds; the ultimate goal at the moment is to run:
```
gradle assembleRelease -p ~/workspace/android/pytorch_android
```

To assemble the android gradle build (aar) we need the build results of the libtorch-android shared library, with headers, for 4 android abis, so pytorch_android_gradle_build requires 4 jobs:
```
  - pytorch_android_gradle_build:
      requires:
        - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
        - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build
        - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build
        - pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build
```
All jobs use the same base docker_image; they are differentiated by committing docker images with different android_abi suffixes (as is done now for xla and namedtensor) in `&pytorch_linux_build_defaults`:
```
      if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
        export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
      elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
        export COMMIT_DOCKER_IMAGE=$output_image-xla
      elif [[ ${BUILD_ENVIRONMENT} == *"-x86"* ]]; then
        export COMMIT_DOCKER_IMAGE=$output_image-android-x86
      elif [[ ${BUILD_ENVIRONMENT} == *"-arm-v7a"* ]]; then
        export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a
      elif [[ ${BUILD_ENVIRONMENT} == *"-arm-v8a"* ]]; then
        export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a
      elif [[ ${BUILD_ENVIRONMENT} == *"-x86_64"* ]]; then
        export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
      else
        export COMMIT_DOCKER_IMAGE=$output_image
      fi
```
The pytorch_android_gradle_build job copies the headers and the libtorch.so/libc10.so results from the libtorch android docker images, first to the workspace and then to the android_abi=x86 docker image, where it runs the final gradle build by calling `.circleci/scripts/build_android_gradle.sh`.

For PR jobs we have only the `pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build` libtorch android build, so it gets a separate gradle build, `pytorch_android_gradle_build-x86_32`, that does not do the docker copying. It calls the same `.circleci/scripts/build_android_gradle.sh`, which applies the x86_32-only logic via a condition on BUILD_ENVIRONMENT:
`[[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]`
It also filters to run only on PRs, since other runs will have the full build. The filtering checks `-z "${CIRCLE_PULL_REQUEST:-}"`:
```
    - run:
        name: filter_run_only_on_pr
        no_output_timeout: "5m"
        command: |
          echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
          if [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
            circleci step halt
          fi
```

Updating docker images to the version with gradle, android_sdk, and openjdk; the jenkins job that builds them: https://ci.pytorch.org/jenkins/job/pytorch-docker-master/339/

pytorch_android_gradle_build successful run: https://circleci.com/gh/pytorch/pytorch/2604797#artifacts/containers/0
pytorch_android_gradle_build-x86_32 successful run: https://circleci.com/gh/pytorch/pytorch/2608945#artifacts/containers/0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25286

Reviewed By: kostmo

Differential Revision: D17115861

Pulled By: IvanKobzarev

fbshipit-source-id: bc88fd38b38ed0d0170d719fffa375772bdea142
2019-08-29 11:29:23 -07:00
cad3abb036 Adding ModuleList to modules.h (#25346)
Summary:
Here is a PR adding ```ModuleList``` to ```modules.h``` so that it can be used by including ```torch/torch.h```.

yf225 edit: Closes https://github.com/pytorch/pytorch/issues/25293.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25346

Differential Revision: D17115013

Pulled By: yf225

fbshipit-source-id: 38a1848b9a8272fa411865dfc83b76d10c5789a0
2019-08-29 10:49:22 -07:00
e231bd16fb Revert D17112656: [pytorch][PR] fix bug in assertNotEqual for int tensors
Test Plan: revert-hammer

Differential Revision:
D17112656

Original commit changeset: 43e0e7da6d58

fbshipit-source-id: 0a0f7b8b125f24a45023ddb46fe144f21499b723
2019-08-29 10:36:56 -07:00
8cdad0ab9f Remove a unused member var (stop_) in process_group_agent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25392

Differential Revision: D17114129

Pulled By: mrshenli

fbshipit-source-id: fe2513f694751a22d47783dd9fafade8ddc3b559
2019-08-29 10:19:40 -07:00
e59bbc82a0 Upgrade to circleci version 2.1 configs (#25336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25336

1. Remove versions from workflows
2. Escape heredoc `<<` used in shells
3. Replace "." with "_" in binary job names (we already do the same for other jobs)
4. (Bonus), fix `should_run_job.py` it so that commits with `[ci]` don't accidentally skip all jobs

Let's see if it works

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25336

Test Plan: Imported from OSS

Differential Revision: D17114619

Pulled By: suo

fbshipit-source-id: 722606ad862af565cd0ba4bb539daeb9d8f5da71
2019-08-29 10:10:38 -07:00
a8ae33ce27 Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (#25339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25339

This is to get rid of backend-specific dispatch in modules; this autograd function is no longer backend specific so
doesn't need to be in a backend specific location.

Test Plan: Imported from OSS

Differential Revision: D17101576

Pulled By: gchanan

fbshipit-source-id: f4f0bd3ecc2d4dbd8cdfedbaabcadb8c603d2507
2019-08-29 09:55:11 -07:00
7a9f37d7af Kill backend-specific lookup in CrossMapLRN2d, as it never succeeds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25331

Test Plan: Imported from OSS

Differential Revision: D17097300

Pulled By: gchanan

fbshipit-source-id: 571e9691da13d34206ff3aabb8cb0cd1e82f6097
2019-08-29 09:55:07 -07:00
66e521edd5 Kill ConvTransposeMixin.forward, which doesn't seem to be used. (#25326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25326

And also uses self._backend, which I'm trying to kill or at least drastically reduce.

Test Plan: Imported from OSS

Differential Revision: D17097303

Pulled By: gchanan

fbshipit-source-id: f55d7df2a668425978499d4a4338b23ba6cf1b90
2019-08-29 09:55:02 -07:00
2e934c78dd Remove THNN sparse autograd Functions. (#25323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25323

They don't seem to be used anymore.

Test Plan: Imported from OSS

Differential Revision: D17097302

Pulled By: gchanan

fbshipit-source-id: dc1133e32586818a9b2e2b7560d259d36c7b36f6
2019-08-29 09:54:58 -07:00
c2e1cb38fd Fix dependency by moving Dimname.{h,cpp} NamedTensor.{h,cpp} to core/ (#25280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25280

`ATen/core/Tensor.h` and `ATen/core/TensorMethods.h` both depend on
Dimname.h and NamedTensor.h. Therefore `Dimname.h` and `NamedTensor.h`
should really be in `ATen/core`. It's not a problem right now because
this dependency chain (core files cannot depend on non-core files) isn't
enforced in our OSS builds, but it is necessary to resolve this before
removing the BUILD_NAMEDTENSOR flag.

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17087195

Pulled By: zou3519

fbshipit-source-id: f06e4268d91fabadb04b41d5b78fb0e530f030fd
2019-08-29 09:46:35 -07:00
cb022d7bec Fix AliasAnalysisKind::PURE on MSVC (#25375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25375

Either MSVC or the Windows headers have a PURE macro defined and will replace
any occurrences of the PURE token in code with an empty string. Replace
AliasAnalysisKind::PURE with AliasAnalysisKind::PURE_FUNCTION.

Note: this is bc breaking.
ghstack-source-id: 89202222

Test Plan: unit tests

Differential Revision: D17107743

fbshipit-source-id: 899a20651ba32d50691956b5424b351586c21cec
2019-08-29 09:42:41 -07:00
07fe66f25e logical_xor doc cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25364

Differential Revision: D17105048

Pulled By: gchanan

fbshipit-source-id: 8bef3e330ef00decb3118a5ae7d17308a58878a2
2019-08-29 09:09:16 -07:00
58a0dee749 Replace open registration TensorTypeId with closed enum. (#25252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252

Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system.  This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.

There are some codemods in this diff:

```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```

The main hand-written changes are in c10/core/TensorTypeId.h

Other manual fixes:

- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
  std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
  wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
  of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration

Differential Revision: D17072001

Test Plan: ossci and sandcastle

Pulled By: ezyang

fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
2019-08-29 08:55:58 -07:00
2e1c37c95c Move the CUDA implementation of ceil to ATen. (#24866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24866

Fix #24542

Test Plan: Imported from OSS

Differential Revision: D16965903

Pulled By: VitalyFedyunin

fbshipit-source-id: b9decaa58bec813a23d369b5e1eec627599f41da
2019-08-29 08:48:31 -07:00
1f21c422e4 Add missing call to DistAutogradContainer::init (#25391)
Summary:
cc pritamdamania87
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25391

Differential Revision: D17113014

Pulled By: mrshenli

fbshipit-source-id: 2de37b832be1dd7a68ecd3576d93d72a960648b5
2019-08-29 08:44:52 -07:00
c84dfa8fa3 Issue #24962: Fix cuda method to support "None" arg for device and a … (#25018)
Summary:
…default value

Addresses https://github.com/pytorch/pytorch/issues/24962. A valid (and the default) value for the `device` parameter in the `cuda` method is `None`. The type signature was producing spurious linter errors in PyCharm. Verified the fix in the latest PyCharm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25018
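A small sketch of the accepted call forms (illustrative):

```
import torch

t = torch.randn(2)
if torch.cuda.is_available():
    a = t.cuda(device=None)   # None is valid: uses the current CUDA device
    b = t.cuda()              # equivalent
```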

Differential Revision: D17098520

Pulled By: VitalyFedyunin

fbshipit-source-id: d83eb9976f09c75b4a033cb49c81d972e3fd37c1
2019-08-29 08:16:58 -07:00
2e3a37e630 Kill THNN function auto generation. (#25322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25322

As far as I can tell, none of these are actually used anymore.

Test Plan: Imported from OSS

Differential Revision: D17097301

Pulled By: gchanan

fbshipit-source-id: 649ee0fd549f6e2a875faef7c32b19c70bb969b6
2019-08-29 07:54:07 -07:00
8145dd35ef Describe the relation between fold and unfold operations. (#24840)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/21817.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24840
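A short sketch of the relation being documented: with non-overlapping patches, fold is an exact inverse of unfold (with overlap, fold sums the overlapping values).

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
patches = F.unfold(x, kernel_size=2, stride=2)                    # (1, 3*2*2, 16)
y = F.fold(patches, output_size=(8, 8), kernel_size=2, stride=2)
assert torch.allclose(x, y)   # exact round trip for non-overlapping patches
```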

Differential Revision: D17113060

Pulled By: ezyang

fbshipit-source-id: 1f1a010d84582a943de7b1173c09e91fb0bd22ce
2019-08-29 07:48:25 -07:00
c845984271 CUDA_KERNEL_LOOP: prevent int overflow in loop increment. (#24818)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24309.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24818

Differential Revision: D16927215

Pulled By: ezyang

fbshipit-source-id: aeab5226fec6045941399693479975db4542c79e
2019-08-29 07:38:55 -07:00
6d66902a81 Re-enable libtorch tests on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25377

Differential Revision: D17113007

Pulled By: ezyang

fbshipit-source-id: 5442fe41b971cd7f63244e503103c64ef2c2d816
2019-08-29 07:35:24 -07:00
1e2b19db6d fix bug in assertNotEqual for int tensors (#25199)
Summary:
assertNotEqual was failing to detect differences in int tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25199

Differential Revision: D17112656

Pulled By: nairbv

fbshipit-source-id: 43e0e7da6d58eb1c837a508d462a748b2065bdd9
2019-08-29 07:32:50 -07:00
e8acc2ebb1 Removing future imports from the test fixtures.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25296

Test Plan: Imported from OSS

Differential Revision: D17090201

Pulled By: zafartahirov

fbshipit-source-id: 5a4f6ac0ea475b55d2c610e2f9f4f0cef8690e8f
2019-08-29 01:39:59 -07:00
07db41bb07 Remove spurious print
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25378

Pulled By: driazati

Differential Revision: D17109684

fbshipit-source-id: 0d437b81c5d765427d129eeb217ea2a951c426d3
2019-08-29 00:49:22 -07:00
b7d992eb46 Integration tests for qconfig_dict (#25217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25217

att

Test Plan:
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17065129

fbshipit-source-id: 05c5a31d5768a46521bbdbac4df79d40fe06f8fc
2019-08-28 22:20:17 -07:00
910d2f18fc Implement FoldConvBatchnorm2d pass. (#25282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25282

For now it will be used in quantization, but it can be used as a
standalone pass too.

Couple of things are not finished at this moment:
- Batchnorm.eps value is hardcoded. This is bad and wrong, but we cannot
access fields listed in __constants__ from the IR yet. Once we fix this, we
should remove the hardcoded value.
- We do not remove Batchnorm submodules from the parent module even when
they were merged into a Conv. Once we figure out API for removing
attributes and modules, we should fix this.
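For reference, the standard algebra such a pass implements, as a plain-Python sketch (not the pass's actual implementation): W' = W * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta.

```
import torch

def fold_conv_bn(conv_w, conv_b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    scale = bn_gamma / torch.sqrt(bn_var + eps)
    folded_w = conv_w * scale.reshape(-1, 1, 1, 1)   # scale each output channel
    folded_b = (conv_b - bn_mean) * scale + bn_beta
    return folded_w, folded_b
```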

Test Plan: Imported from OSS

Differential Revision: D17086611

Pulled By: ZolotukhinM

fbshipit-source-id: d58a947a3b2205d8f3629d693b70b9ad2b5a9102
2019-08-28 21:56:05 -07:00
96db3ad413 insert_quant_dequant work with qconfig_dict (#25127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25127

Extend insert_quant_dequant pass to go through forward function graphs

Test Plan:
```
python test/test_jit.py 'TestJit.test_insert_quant_dequant'
python test/test_quantizer.py
```

Imported from OSS

Differential Revision: D17001137

fbshipit-source-id: 41b029906fe5c8bc0de01956059388a7d552a380
2019-08-28 21:43:29 -07:00
11b4d57711 insert_observers use qconfig_dict (#25069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25069

This PR changes the API of insert_observers to use qconfig_dict; full functionality support will come in later PRs.

Test Plan:
```
python test/test_quantizer.py
python test/test_jit.py
```

Imported from OSS

Differential Revision: D17001135

fbshipit-source-id: 16df6fa521fcc0c9e268a375be8e1a630e77011a
2019-08-28 21:07:31 -07:00
efe808b326 Fix old annotate() error (#25261)
Summary:
Fixes #25067

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25261

Pulled By: driazati

Differential Revision: D17103889

fbshipit-source-id: bd94cb36cf4829e63ad39ae169047b9b9e857679
2019-08-28 20:50:24 -07:00
490eb7fed9 Add GET_ATTR instruction (#25151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25151

The prim::GetAttr operator depends on Node. However, in the lite interpreter there will be no Node dependency. Promote the operator to a first-class instruction.

Test Plan: Imported from OSS

Differential Revision: D17076412

fbshipit-source-id: 8de20978445bb598634c5462e66e4459dcd567be
2019-08-28 20:45:55 -07:00
5dd01a7eea Pull instruction definitions out of interpreter.cpp. (#25148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25148

Instructions will be used in the lite interpreter as well. Pull them out of interpreter.cpp so that the lite interpreter doesn't have to compile with interpreter.cpp.

Test Plan: Imported from OSS

Differential Revision: D17076413

fbshipit-source-id: 99b3d8d27a96823a4a4dde6b2337ee44635e34cb
2019-08-28 20:17:36 -07:00
8456c96967 Make quantized relu ops inherit the memory format from input
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25271

Test Plan: Imported from OSS

Differential Revision: D17107576

Pulled By: jamesr66a

fbshipit-source-id: 43cdf5d9a9321113cbb28f365a761b0bdd390926
2019-08-28 19:58:25 -07:00
f88f9e1331 Ensure quantized::add stride matches inputs (#25265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25265

This ensures that the output strides match the input strides. Previously, we would degenerate down to slow scalar code because the call to _empty_affine_quantized would produce a tensor with different strides than the operands. When this mismatch occurs, TensorIterator uses the scalar code. This fixes that.

Benchmark script:
```
import torch, time
x = torch.rand(1, 56, 56, 256)
y = torch.rand(1, 56, 56, 256)

qX = torch.quantize_linear(x, 0.1, 128, torch.quint8)
qY = torch.quantize_linear(y, 0.1, 128, torch.quint8)

s = time.time()
for i in range(1000):
    x + y
print('float contig', time.time() - s)

s = time.time()
for i in range(1000):
    torch.ops.quantized.add(qX, qY, 0.5, 1)
print('quantized contig', time.time() - s)

x = torch.rand(1, 56, 56, 256)
y = torch.rand(1, 56, 56, 256)
qX = torch.quantize_linear(x, 0.1, 128, torch.quint8).permute([0, 3, 1, 2])
qY = torch.quantize_linear(y, 0.1, 128, torch.quint8).permute([0, 3, 1, 2])
x = x.permute([0, 3, 1, 2])
y = y.permute([0, 3, 1, 2])

s = time.time()
for i in range(1000):
    x + y
print('float strided', time.time() - s)

s = time.time()
for i in range(1000):
    torch.ops.quantized.add(qX, qY, 0.5, 1)
print('quantized strided', time.time() - s)
```

Before this change

```
$ OMP_NUM_THREADS=1  python cmp.py

float contig 0.4625673294067383
quantized contig 1.8083674907684326
float strided 0.46366071701049805
quantized strided *8.30056643486023*
```

After this change

```
$ OMP_NUM_THREADS=1  python cmp.py

float contig 0.48703694343566895
quantized contig 2.0587124824523926
float strided 0.4711723327636719
quantized strided *2.0382332801818848*
```

Test Plan: Imported from OSS

Differential Revision: D17077811

Pulled By: jamesr66a

fbshipit-source-id: 25f52743081162122dfc9eb4bc39185d4cc4ba3b
2019-08-28 19:58:21 -07:00
fa902c58ee fix inliner bug (#25052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25052

Previously we would not inline nested functions; now we do.

Test Plan: Imported from OSS

Differential Revision: D16973848

Pulled By: suo

fbshipit-source-id: 94aa0b6f84a2577a663f4e219f930d2c6396d585
2019-08-28 19:45:47 -07:00
18c77dd243 Quantized comparators (#24387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24387

Differential Revision: D16824421

Test Plan: Imported from OSS

Pulled By: zafartahirov

fbshipit-source-id: aae5495fd2d50095c9bac6424d77343e3d09876f
2019-08-28 19:22:08 -07:00
7818e7e5d4 Basic framework for Distributed Autograd context. (#24875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24875

As per https://github.com/pytorch/pytorch/issues/23110, each autograd pass
would be assigned a unique autograd_context_id. In this change we introduce a
DistAutogradContainer per worker which holds information for each autograd pass
currently running.

DistAutogradContainer has a map from the autograd_context_id to
DistAutogradContext (which holds all the relevant information for the autograd
pass). DistAutogradContext currently only stores the autograd_context_id and
more information would be added to it later as we build out the rest of the
framework.

The autograd_context_id is a 64 bit globally unique integer where the first 16
bits are the worker_id and next 48 bits are auto-incrementing for uniqueness.

Sample python code on how this would be used for distributed autograd:

```
import torch.distributed.autograd as dist_autograd
worker_id = 0
dist_autograd.init(worker_id)
with dist_autograd.context() as context_id:
     # forward pass...
     # backward pass...
     # optimizer step...
```
ghstack-source-id: 89119248

Test Plan: unit tests.

Differential Revision: D16356694

fbshipit-source-id: d1a8678da0c2af611758dbb5d624d554212330ce
2019-08-28 18:51:56 -07:00
8e189a327c Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25371

Pulled By: driazati

Differential Revision: D17106672

fbshipit-source-id: eab87a22798da40dd10487dc2f4b1528bd1f703e
2019-08-28 18:25:19 -07:00
91db62a8bb Invariant typevar matching on callsite checks (#25136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25136

Previously we were calling unifyType to match typevars at callsites.
unifyType actually does merging (e.g. to handle control flow joins)
so its effect at callsites was bivariance, allowing typevar bindings
to widen as new concrete types were encountered in arguments.

Fixes issue #24856

Strip refinements when doing invariant matching on type vars.

Previous change (bivariance to invariance) makes type matching
sensitive to the addition of type refinements. Use unshapedType
to avoid considering refinements when doing matching.

Test Plan: Imported from OSS

Differential Revision: D17078081

Pulled By: bhosmer

fbshipit-source-id: 54476469679af698cfe9bd020a39de31271f52cc
2019-08-28 17:37:09 -07:00
43c4b9f2a5 Add source location to class instantiation error (#24990)
Summary:
Fixes #24987
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24990

Pulled By: driazati

Differential Revision: D17099779

fbshipit-source-id: 296e2b4ccc3fddabd4998497d0753e99680ba92d
2019-08-28 17:14:00 -07:00
05f1fed693 Add OneCycleLR (#25324)
Summary:
Squash rebase of https://github.com/pytorch/pytorch/issues/21258

ghstack-source-id: 7d3ce522ac4dd3050bc6c6bbda1eaaeb8bc4b2c1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25325

Differential Revision: D17095722

Pulled By: vincentqb

fbshipit-source-id: 7fe69b210924ee3b39223dd78122aea61267234a
2019-08-28 16:59:40 -07:00
1b7f7aa12a change LBFGS's default tolerance_grad to 1e-7 (#25240)
Summary:
Hi,

I noticed that after v1.2.0 the implementation of the LBFGS optimizer changed. In the new implementation, the termination condition changed from the sum of the gradients to the max value in the gradients (see: b15d91490a/torch/optim/lbfgs.py (L313)). But the default tolerance_grad parameter was not changed (and it is too large for the max of the gradients), so this results in a lot of my old code not optimizing, or optimizing for only one or two steps.

So I am opening this pull request to suggest changing tolerance_grad to a smaller value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25240
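A usage sketch: until the default changes, the workaround is to pass the proposed value explicitly.

```
import torch

params = [torch.randn(10, requires_grad=True)]
opt = torch.optim.LBFGS(params, tolerance_grad=1e-7)  # the proposed default

def closure():
    opt.zero_grad()
    loss = (params[0] ** 2).sum()
    loss.backward()
    return loss

opt.step(closure)
```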

Differential Revision: D17102713

Pulled By: vincentqb

fbshipit-source-id: d46acacdca1c319c1db669f75da3405a7db4a7cb
2019-08-28 16:46:04 -07:00
f362a5a04b Revert "Let logical_xor support non-bool tensors." (#25269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25269

This reverts commit 5ca612b55ec1205f98e6bc5d5e64b1bf35f3b3cd.

Test Plan: Imported from OSS

Differential Revision: D17080088

fbshipit-source-id: e6b6215b713910c448e9a6b831b08f28b849c64a
2019-08-28 15:41:51 -07:00
44bd63c7a1 don't throw in constant prop (#25270)
Summary:
Don't throw in constant propagation, since the op we're running may not be reached. Previously we would only catch `c10::Error`; however, it's hard to maintain that the entire codebase doesn't throw any other types of errors, and some errors map nicely to python errors, like `std::index_error` to IndexError.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25270

Differential Revision: D17102545

Pulled By: eellison

fbshipit-source-id: 9fd485821743ad882e5c6fc912ca47b0b001b0e9
2019-08-28 15:34:01 -07:00
5dd915cd1a @albanD's #15219 augmented with SavedVariable::weak_grad_fn_ (#23502)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10532 via https://github.com/pytorch/pytorch/issues/15219 and lots of analysis by albanD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23502

Differential Revision: D16881340

Pulled By: ezyang

fbshipit-source-id: b483fe6c89ed9d27674c3347c043fe509ba80007
2019-08-28 14:58:53 -07:00
eb2c5930b2 contrib-tensorboard: removed external tensorboardX dependency (#25259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25259

Switching to tensorboard instead of tensorflow

Test Plan: went through instructions in [fbsource/fbcode/caffe2/caffe2/contrib/tensorboard/tensorboard.md] to make sure everything is working (using/not using tensorboard/tensorflow)

Reviewed By: orionr

Differential Revision: D17059111

fbshipit-source-id: aaa26dec840fb517b3bc7dc988f3a8c54566d356
2019-08-28 13:55:25 -07:00
df51cbe397 Include the correct header for make_unique in named tensor headers (#25178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25178

Previously, we were using torch/csrc/utils/memory.h. This switches those
headers to be c10/util/C++17.h.

Context: ATen and torch are the same library now, so one can call code
in torch from ATen. However, I haven't seen an example of that yet
(aside from the named tensor code that uses make_unique from torch). In
this PR I try to maintain the ATen / torch separation just in case it
matters.

Test Plan
- Check that code compiles [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17051453

Pulled By: zou3519

fbshipit-source-id: 44b6393a748bdb1e671ecb1e9a615c33202e8515
2019-08-28 13:51:08 -07:00
6f5fe96c80 Implement name inference for torch.matmul (#25177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25177
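A hedged sketch of the name propagation this implements (named tensors were still behind a build flag at this point; in later releases this runs as-is):

```
import torch

x = torch.randn(3, 4, names=("N", "C"))
y = torch.randn(4, 5, names=("C", "D"))
print(torch.matmul(x, y).names)   # ('N', 'D'): the contracted dim drops out
```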

Test Plan
- new tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17051452

Pulled By: zou3519

fbshipit-source-id: 7259cdb7ba7f480035528cf3c60ef6d051e42db5
2019-08-28 13:51:04 -07:00
d2719b549d Implement name inference for torch.bmm (#25123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25123

The approach is different for CPU and CUDA. In particular:
- in CPU, I added a name inference rule to bmm_out
- in CUDA, bmm calls THCTensor_(baddbmm) so I added a name inference
rule to that.

When one calls baddbmm on CPU or CUDA, it'll error out with NYI due to
named_guard: True on it in native_functions.yaml. I'm not planning on
implementing baddbmm soon because it's a little tricky to add it to CPU
and bmm is more commonly used function.

Test Plan
- new tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16998073

Pulled By: zou3519

fbshipit-source-id: 8dc01898964318717911f28eebd6cdfffc7dfcf2
2019-08-28 13:51:00 -07:00
eb756746ab Fix possible deadlock in SharedCache inside a forked child proc (#25158)
Summary:
Related: https://github.com/pytorch/pytorch/issues/24927#issuecomment-524608021

`fork` inherits lock state, so if we unfortunately happen to fork while the `SharedCache` lock is held, we could deadlock in the child process when some code tries to acquire it.

Following the pytorch multiprocessing library design, this patch resets the lock to a new object after fork. A similar example from the python core library, for `multiprocessing.Queue`, is:

```py
class Queue(object):
    def __init__(self, ...):
        ...
        self._after_fork()
        if sys.platform != 'win32':
            register_after_fork(self, Queue._after_fork)

    def _after_fork(self):
        debug('Queue._after_fork()')
        self._notempty = threading.Condition(threading.Lock())
        self._buffer = collections.deque()
        self._thread = None
        self._jointhread = None
        self._joincancelled = False
        self._closed = False
        self._close = None
        self._send_bytes = self._writer.send_bytes
        self._recv_bytes = self._reader.recv_bytes
        self._poll = self._reader.poll
```

d4d60134b2/Lib/multiprocessing/queues.py (L54-L78)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25158

Differential Revision: D17091227

Pulled By: soumith

fbshipit-source-id: ee7130f47d7bbd42fc34a2598f1f6974d8d7cdb7
2019-08-28 13:34:03 -07:00
805cf983b9 Fixes test_equal
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25275

Test Plan: Imported from OSS

Differential Revision: D17083204

Pulled By: zafartahirov

fbshipit-source-id: ecad9761dbf6cb27ae570485ee00eb8bffef60f5
2019-08-28 13:18:24 -07:00
06757acb30 Refactor MinMax observer (#23902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23902

Copied from Daya's diff in pytorch/pytorch #23191

Refactor MinMax observer and create the base observer class to prepare for future observers such as histogram observer.
ghstack-source-id: 89146014

Test Plan:
Added a test test_minmax_observer

buck test mode/dev caffe2/test:quantization -- 'test_minmax_observer'

```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2533274797931635
      ✓ caffe2/test:quantization - test_minmax_observer (test_quantization.ObserverTest) 0.055 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2533274797931635
Summary (total time 4.26s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5348024563344195
      ✓ caffe2/test:quantization - test_observer_scriptable (test_quantization.ObserverTest) 1.762 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5348024563344195
Summary (total time 6.02s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16663221

fbshipit-source-id: 3d0e1aa9e4d27808e61b10604782606de067a34a
2019-08-28 13:12:38 -07:00
e335cc3a95 Fix named tensor test (#25313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25313

`sign` was recently ported from TH to ATen, undoing some named tensor
changes and breaking the CI named tensor test. This PR re-enables named tensor
for `sign`.

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D17093439

Pulled By: zou3519

fbshipit-source-id: 11185ad88a0eaf56078b94e9547bbbd6d02d0aab
2019-08-28 12:53:06 -07:00
0cc92de447 Extend nn.Transformer to support BERT (gelu) (#24181)
Summary:
To use the transformer for BERT, we need the `gelu` activation. https://github.com/pytorch/pytorch/issues/24177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24181
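Usage sketch of the new option:

```
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, activation="gelu")
model = nn.Transformer(d_model=512, nhead=8, activation="gelu")
```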

Differential Revision: D16790327

Pulled By: zhangguanheng66

fbshipit-source-id: b4eed21ad1a4d753bb090fa7fd78886714a6d761
2019-08-28 12:39:47 -07:00
6100de9b1b implement bool_tensor.bernoulli_ (#25076)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25072
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25076
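The newly supported call, as a minimal sketch:

```
import torch

mask = torch.empty(3, 3, dtype=torch.bool)
mask.bernoulli_(0.5)   # in-place Bernoulli draw, now works on bool tensors
print(mask)
```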

Differential Revision: D17073453

Pulled By: ezyang

fbshipit-source-id: 42410da8c9911c1d7b3543bde740c7e66ae0cc1c
2019-08-28 12:25:27 -07:00
5248dd1a51 Use C10_DEPRECATED_MESSAGE instead of TORCH_WARN_ONCE for Tensor.data<T>() (#25319)
Summary:
Using `TORCH_WARN_ONCE` for `Tensor.data<T>()` is still causing deadlocks internally. According to Dima: "So the problem seems to be in TORCH_WARN/c10::Warning::warn which produces a warning - we setup a wrapper that sends the message back to python land. But doing so requires acquiring GIL and it somehow deadlocks. In general using TORCH_WARN in so low-level API is dangerous as there's no guarantee whether we're running under GIL or not."

In order to avoid causing accidental deadlocks in other code including external extensions, the use of `TORCH_WARN_ONCE` in `Tensor.data<T>()` is changed to `C10_DEPRECATED_MESSAGE` in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25319

Reviewed By: dzhulgakov

Differential Revision: D17094933

Pulled By: yf225

fbshipit-source-id: e29dc35187f73ca7865cfa5a9ecde708cd237c58
2019-08-28 12:25:23 -07:00
80974dde4c Move new_criterion_tests from test_nn.py to common_nn.py (#25333)
Summary:
Moving so that `new_criterion_tests` can be used from `test_cpp_api_parity.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25333

Differential Revision: D17097188

Pulled By: yf225

fbshipit-source-id: 7f7905cc6799bca8dc6b3c9cc43995313c6bc058
2019-08-28 12:22:15 -07:00
d0a525b592 Remove unused THTensor_(add) and similar functions code. (#24864)
Summary:
Remove unused:

TH_API void THTensor_(add)(THTensor *r_, THTensor *t, scalar_t value);
TH_API void THTensor_(sub)(THTensor *r_, THTensor *t, scalar_t value);
TH_API void THTensor_(add_scaled)(THTensor *r_, THTensor *t, scalar_t value, scalar_t alpha);
TH_API void THTensor_(sub_scaled)(THTensor *r_, THTensor *t, scalar_t value, scalar_t alpha);
THC_API void THCTensor_(add)(THCState *state, THCTensor *self, THCTensor *src, scalar_t value);
THC_API void THCTensor_(sub)(THCState *state, THCTensor *self, THCTensor *src, scalar_t value);
THC_API void THCTensor_(add_scaled)(THCState *state, THCTensor *self, THCTensor *src, scalar_t value, scalar_t alpha);
THC_API void THCTensor_(sub_scaled)(THCState *state, THCTensor *self, THCTensor *src, scalar_t value, scalar_t alpha);
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24864

Differential Revision: D16916608

Pulled By: VitalyFedyunin

fbshipit-source-id: d6182638bde196031e2cdbee898b117e08634ddd
2019-08-28 12:05:32 -07:00
509abd9a81 Fix typo "takes takes" -> "takes"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24785

Differential Revision: D16881150

Pulled By: ezyang

fbshipit-source-id: ffdd9b8df4c8de84f8bba33bfc4d7aba114022ce
2019-08-28 11:38:38 -07:00
53ac931af2 Disable cuda_distributions_test and converter_nomigraph_test on Windows. (#25305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25305

See https://github.com/pytorch/pytorch/issues/25304 and https://github.com/pytorch/pytorch/issues/25312

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D17092777

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 5c02a7ca6ead62bed214bc0bdcc0398f3aff2484
2019-08-28 11:06:10 -07:00
590619ab8c Support all_reduce a list of same-device tensors #21640 (#24949)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/21640 for CPU tensors and the Gloo backend.

Questions:
- ~~currently takes `AllreduceOptions`, since all of the options are the same. Would it be better to make a new `AllreduceCoalescedOptions` class?~~
- ~~I decided to inherit from `ProcessGroupGloo::AsyncWork` instead of `AsyncAllreduceWork` to shorten the inheritance chain a bit and for consistency with existing classes. However, this means that the two `getFunction` methods are copy-pasted. Would inheriting from `AsyncAllreduceWork` be preferable?~~
- ~~should the work class be named `AsyncCoalescedAllreduceWork` or `AsyncAllreduceCoalescedWork`?~~

thank you!
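
A minimal usage sketch of the coalesced allreduce described above (assuming the Python entry point landed as `torch.distributed.all_reduce_coalesced`, and that a Gloo process group has already been initialized on each rank):

```python
import torch
import torch.distributed as dist

def reduce_params(tensors):
    # All tensors must live on the same (CPU) device; the backend flattens
    # them into one buffer, runs a single allreduce, and scatters back.
    dist.all_reduce_coalesced(tensors, op=dist.ReduceOp.SUM)

# With dist.init_process_group("gloo", ...) already called on each rank:
# reduce_params([torch.ones(3), torch.full((5,), 2.0)])
# Each tensor then holds the elementwise sum across all ranks.
```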
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24949

Differential Revision: D17055580

Pulled By: mrshenli

fbshipit-source-id: e63b5fcaec6021053ea960776a09ee8cf11d1ec2
2019-08-28 10:57:37 -07:00
afb7a162fb Migrate erfinv and erfinv_ from the TH to Aten (CUDA) (#24943)
Summary:
https://github.com/pytorch/pytorch/issues/24560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24943

Differential Revision: D16996434

Pulled By: ifedan

fbshipit-source-id: 77111a4e47bb2b20f65225d48e7213cd77ddae19
2019-08-28 09:30:08 -07:00
112f249446 Port pow operator from the TH code to Aten (#23492)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/24750
```
DEBUG = 0
OMP_NUM_THREADS = 1

import torch

base = torch.randn(1000000)
exp  = torch.randn(1000000)
out  = torch.empty_like(base)

timeit base.pow(0)							+30x
old 6.26 ms ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 213 µs ± 3.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(1/3)						+6x
old 56 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.41 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(-1/3)						+6x
old 57 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.49 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(1/2)						+6x
old 4.04 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 620 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-1/2)						+5x
old 6.56 ms ± 43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 1.24 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(1)							no diff
old 322 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 331 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-1)							+3.5x
old 2.48 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 717 µs ± 130 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(2)							no diff
old 328 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 324 µs ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-2)							+3.5x
old 2.45 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 662 µs ± 3.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(3)							+7x
old 2.39 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 334 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit base.pow(-3)							+9x
old 93.7 ms ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.3 ms ± 666 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(123456.789)					+5x
old 46.5 ms ± 418 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.68 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(-123456.789)				+5x
old 46.5 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit base.pow(exp)						+6x
old 60.6 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(0, exp)					no diff
old 18.3 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 21.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

timeit torch.pow(1, exp)					+30x
old 6.01 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit torch.pow(-1, exp)					+3x
old 30.8 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.67 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(42, exp)					+8x
old 80.1 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.51 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(-42, exp)					+2x
old 21.8 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.5 ms ± 89.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(0, exp, out=out)			no diff
old 20.2 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 22.1 ms ± 648 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

timeit torch.pow(1, exp, out=out)			+30x
old 6.7 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

timeit torch.pow(-1, exp, out=out)			+3x
old 32.5 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.4 ms ± 99.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(42, exp, out=out)			+10x
old 91 ms ± 7.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.64 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

timeit torch.pow(-42, exp, out=out)			+2.5x
old 25.9 ms ± 5.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.1 ms ± 698 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```

BC: enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow the output tensor to be resized if it is also used as one of the inputs.
BC: enforce a stronger requirement for integer tensor bases with integer exponents on CPU and CUDA: `Integers to negative integer powers are not allowed.`
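
A quick sketch of the second BC note, with the error text quoted from above:

```python
import torch

base = torch.arange(1, 5)  # integer tensor
try:
    base.pow(-2)
except RuntimeError as e:
    print(e)  # "Integers to negative integer powers are not allowed."
```
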
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23492

Differential Revision: D16731583

Pulled By: pbelevich

fbshipit-source-id: 4e5bf689357fe82a19371e42d48abbb7b4c1c3ca
2019-08-28 09:11:50 -07:00
d7cce32303 note location
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25311

Differential Revision: D17093302

Pulled By: soumith

fbshipit-source-id: 14510351cf3f1568cfc415488eb0ba05a8af6cf8
2019-08-28 08:55:00 -07:00
9b1097958e Migrate digamma\digamma_\polygamma\polygamma_ from the TH to Aten (CPU) (#25048)
Summary:
https://github.com/pytorch/pytorch/issues/24612
https://github.com/pytorch/pytorch/issues/24550
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25048

Differential Revision: D16996440

Pulled By: ifedan

fbshipit-source-id: 0d76588d179d4c932e3fc284cb399dcfc77bc622
2019-08-28 08:29:13 -07:00
529bb859b2 Revert D17052534: [pytorch][PR] Creates Torch-friendly Event class and adds Stream tracking to autograd
Test Plan: revert-hammer

Differential Revision:
D17052534

Original commit changeset: d91b308ad0f7

fbshipit-source-id: dacc7e70a835a8fa6ae71246999b4eff3383f3f3
2019-08-28 08:24:43 -07:00
e123e24e7e Implementation of cpu_serial_kernel for TensorIterator (#25125)
Summary:
https://github.com/pytorch/pytorch/issues/24472
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25125

Differential Revision: D17073868

Pulled By: ifedan

fbshipit-source-id: 9df1a64669c97854ba436aafef59e79c22a21a7f
2019-08-28 08:24:40 -07:00
883628cb5c Added documentation for nn.functional.bilinear (#24951)
Summary:
Adds documentation for `nn.functional.bilinear`, as requested in https://github.com/pytorch/pytorch/issues/9886.

The format follows that of `nn.functional.linear`, and borrows from `nn.bilinear` in its description of `Tensor` shapes.

I am happy to add more extensive documentation (e.g. "Args," "Example(s)"). From what I gather, the format of comments is inconsistent across functions in `nn.functional.py` and between modules (e.g. `nn.functional` and `nn`). It's my first PR, so guidance for contributing documentation and other code would be greatly appreciated!
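
For reference, a small usage sketch of `F.bilinear` with the shapes from the `nn.Bilinear` docs:

```python
import torch
import torch.nn.functional as F

x1 = torch.randn(8, 20)        # (N, in1_features)
x2 = torch.randn(8, 30)        # (N, in2_features)
W = torch.randn(40, 20, 30)    # (out_features, in1_features, in2_features)
b = torch.randn(40)            # (out_features,)
out = F.bilinear(x1, x2, W, b)
print(out.shape)               # torch.Size([8, 40])
```
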
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24951

Differential Revision: D17091261

Pulled By: soumith

fbshipit-source-id: efe2ad764700dfd6f30eedc03de4e1cd0d10ac72
2019-08-28 08:19:25 -07:00
fe541aab5f Align AT_FORALL macros with DISPATCH macros wrt Half. (#25268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25268

The AT_FORALL "AND" macros mistakenly already include Half, which differs from the Dispatch macros.

This change shouldn't have any effect.

Test Plan: Imported from OSS

Differential Revision: D17079747

Pulled By: gchanan

fbshipit-source-id: 635eb167722ce850d6c1949fac652de4dddf32ee
2019-08-28 08:15:40 -07:00
6c9410ffd1 Fix infer np scalar dtype mem leak (#24267)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24200 . I'm a bit worried that the test might be flaky...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24267

Differential Revision: D17079762

Pulled By: gchanan

fbshipit-source-id: a120688b9583ca4b74bdfb295914298f22540ffd
2019-08-28 07:51:54 -07:00
dfa48f9942 Disable the copy constructor and = operator of DispatchStub (#24932)
Summary:
They are not supposed to be copied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24932

Differential Revision: D16940997

Pulled By: gchanan

fbshipit-source-id: 6f16211ec57f8db6baec86e17288c8050c89cab5
2019-08-28 07:49:08 -07:00
45943bd611 Remove some unused plugins.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25201

Test Plan: Imported from OSS

Differential Revision: D17060444

Pulled By: gchanan

fbshipit-source-id: 94722533fecc6d4eb11940eaf4f71aeea41502fb
2019-08-28 07:43:52 -07:00
687aa781df Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25238

Differential Revision: D17076308

Pulled By: mrshenli

fbshipit-source-id: 2827150be1d15af63088db21051ab0e3476992e6
2019-08-28 07:39:11 -07:00
718feb6d76 upgrade MKL-DNN to v0.20.3 (#22910)
Summary:
1. upgrade MKL-DNN to v0.20.3
2. allow user to change the capability of primitive cache in mkldnn-bridge by environment value LRU_CACHE_CAPACITY
3. support to fill all tensor elements by one scalar
4. fix the link issue if build with private MKLML other than pre-installed MKL
5. add rnn support in mkldnn-bridge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22910

Differential Revision: D16365998

Pulled By: VitalyFedyunin

fbshipit-source-id: b8d2bb454cbfbcd4b8983b1a8fa3b83e55ad01c3
2019-08-28 07:30:14 -07:00
9945c0cea6 Work around for bias quantization for conv and linear operators (#25212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25212

In eager mode, all modules need to work with input tensors that can change qparams dynamically. This issue https://github.com/pytorch/pytorch/issues/23874 will address this via FBGEMM modifications. This is a work around before that.
ghstack-source-id: 89118038

Test Plan:
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
Summary (total time 65.86s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D17064471

fbshipit-source-id: 3c192442b19bf2d9d88d4e52de6c24dc134a846f
2019-08-28 07:24:03 -07:00
a74d702e57 Return a message instead of void from rpc udf (#25283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25283

Return a message instead of void from rpc udf
This is to help thrift-style RPC, where there is no need for an explicit send for a response.
We need to figure out how to solve the non-blocking callback case, but we don't want to block the thrift-backed RPC agent implementation until then.
ghstack-source-id: 89130305

Differential Revision: D16825072

fbshipit-source-id: 75cb1c9aa5a10363b1c6b12cd21c50d7047d2268
2019-08-28 06:43:19 -07:00
98beb9ecd8 Revert D17059087: [quant] Reducing the test size for adaptive avg pool
Test Plan: revert-hammer

Differential Revision:
D17059087

Original commit changeset: 915f46ecae61

fbshipit-source-id: 8498fb0a7cab473babe385bc2ee1c8a9a734395a
2019-08-28 06:26:45 -07:00
febcb3b7b3 int8 static quantization in the numerical debugger
Summary: Fix the static int8 transformation in the numerical debugger

Test Plan:
Example:
buck run mode/opt caffe2/caffe2/fb/fbgemm/numerical_debugger:multithreaded_sparsenn_emulator --  --warmup 0 --iter 1 --threads 1 --runs 1  --local_model_path=/data/models/mobile_cvr_int8/101245796_428 --filler="logfiledb" --benchmark_model_transformation="int8_static" --run_dir=/data/models/mobile_cvr_int8/local_output/ --local_dataset_path=/data/models/mobile_cvr_int8/dataset/test/dataset_cached_reader.db  --output_tensors="Sigmoid/sigmoid" --attach_ne_reporter=true --output_prediction_blob="Sigmoid/sigmoid" --activation-histogram-file=/data/models/mobile_cvr_int8/activation_histograms/101245796_428_train_10000_hist.txt.0x7f1f36133ea0 --dump_nets  --caffe2_logging_print_net_summary=1

Reviewed By: amylittleyang

Differential Revision: D16368515

fbshipit-source-id: b2649cec0fa35b852842a419fea1ea7105e5225c
2019-08-28 01:24:13 -07:00
c7ef50bd14 Upgrade the deprecated data to data_ptr APIs (#25295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25295

As Title says.

Test Plan: CI

Reviewed By: hl475

Differential Revision: D17089457

fbshipit-source-id: b45ca24decd6033e7e207f17540d486df6ef2ddc
2019-08-28 00:18:40 -07:00
8a8844dc83 Add the sparse feature information during logging in sparse lookup layer (#24863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24863

Add the sparse feature name in logging for ease of debugging

Test Plan:
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/sparse_nn/pooling_test#binary.par  -r test_simple_sum_pooling_named_exception

Another test for id_score_list. the original sparse_key is equivalent to get_key(self.input_record)()
P98343716

./buck-out/gen/caffe2/caffe2/python/layers_test-2.7#binary.par -r test_get_key

Reviewed By: chocjy

Differential Revision: D16901964

fbshipit-source-id: 2523de2e290aca20afd0b909111541d3d152a588
2019-08-27 23:25:26 -07:00
b1f7e13d5f Revert D17063240: [fix] Specify width for st.floats in hypothesis_utils.tensor
Test Plan: revert-hammer

Differential Revision:
D17063240

Original commit changeset: 0572fb810d8c

fbshipit-source-id: 62b7cf70388a5484d925805b53edb879247df4da
2019-08-27 23:23:12 -07:00
ca4bc9fc07 improve interface error messages (#25228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25228

This adds a facility to isSubtypeOf that lets it explain why a type is
not a subtype of something else. It is used in situations where it
is not clear from the types' python_str alone why the relationship
does not hold. Because of the subtle interaction between default arguments,
overloads, and virtual methods, the extended version is a separate method,
isSubtypeOfExt, to avoid requiring readers to understand the interaction.

Test Plan: Imported from OSS

Differential Revision: D17066673

Pulled By: zdevito

fbshipit-source-id: 4de7c40fbf7f9eeae045d33a89a038538cf87155
2019-08-27 22:54:50 -07:00
fba107f18e add serialization of interface (#25227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25227

Adds cases to NamedType serialization so that interfaces are written.
The implementation is similar to that of NamedTuples.

Test Plan: Imported from OSS

Differential Revision: D17066674

Pulled By: zdevito

fbshipit-source-id: fda5419260fad29e8c4ddb92de1d3447d621d982
2019-08-27 22:54:46 -07:00
a01358f91d Remove PythonPrint's is_method_ member (#25226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25226

Given the current structure, it is easier to just call different functions
to get the desired behavior.

Test Plan: Imported from OSS

Differential Revision: D17066672

Pulled By: zdevito

fbshipit-source-id: 88e76c5ee870d9d1e9887aebcac5e7873fabe6b1
2019-08-27 22:54:42 -07:00
61818b8986 Add interface declarations to JIT (#25258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25258

This is the first commit in a series adding interfaces to the JIT.
Interfaces allow specifying, via a bare Python class, an abstract
interface that can be used in type annotations for script functions
(see the sketch after the list below). If a TorchScript class implements
all the methods in the interface with the appropriate types, then it is
implicitly considered to implement that interface.

Follow-ups required:
* implementation of serialization
* implementation in the parser frontend
* better error reporting for explaining why a class does not meet an
  interface specification.
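
A hedged sketch of the feature (the decorator name is assumed to be `torch.jit.interface`, as it later appeared in OSS PyTorch):

```python
import torch

@torch.jit.interface
class Callable2x(object):
    def run(self, x: torch.Tensor) -> torch.Tensor:
        pass

@torch.jit.script
class Doubler(object):
    def __init__(self):
        pass

    def run(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

@torch.jit.script
def use(obj: Callable2x, x: torch.Tensor) -> torch.Tensor:
    # Doubler implicitly implements Callable2x: matching method names and
    # signatures are all that is required.
    return obj.run(x)

print(use(Doubler(), torch.ones(2)))  # tensor([2., 2.])
```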

Test Plan: Imported from OSS

Differential Revision: D17079963

Pulled By: zdevito

fbshipit-source-id: a9986eeba2d4fdedd0064ce7d459c0251480a5a0
2019-08-27 22:54:37 -07:00
011db3bcaa fix closures which always throw. (#25278)
Summary:
When a closure that always threw was declared, we would erroneously propagate the ExitThrows status to the block in which it was declared, causing us to remove the subsequent code in the block. [This code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/script/exit_transforms.cpp#L462) was meant to handle this case; however, it didn't handle the case where we were transforming Loops and the prim::Function wasn't a target block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25278

Differential Revision: D17084780

Pulled By: eellison

fbshipit-source-id: ee31a4cc243653f615e4607ece29cdac8ef5710e
2019-08-27 22:16:54 -07:00
085bd15880 Add TORCH_WARN_ONCE, and use it in Tensor.data<T>() (#25207)
Summary:
This PR adds `TORCH_WARN_ONCE` macro, and use it in `Tensor.data<T>()`.

cc. gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25207

Differential Revision: D17066263

Pulled By: yf225

fbshipit-source-id: 411c6ccc8326fb27ff885fee4638df8b5ba4d449
2019-08-27 21:42:44 -07:00
e34ef04301 register HeatmapMaxKeypoint with C10 (#25191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25191

registering as C10.

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:heatmap_max_keypoint_op_test

Reviewed By: newstzpz

Differential Revision: D17056321

fbshipit-source-id: 989b72d7e3c9f23684b10d5fc9b98177ad4ee47b
2019-08-27 20:13:57 -07:00
2c22076342 Moving sign function to ATen (#22861)
Summary:
This PR, linked to https://github.com/pytorch/pytorch/issues/22806, moves the sign function to ATen.

sign(x) supports bool and uses vectorized operations on CPU.
sign(NaN) is defined to return 0.
sign(bool) is a no-op; the resulting tensor holds the same values as the input.

- [x] CPU Backend
- [x] CUDA Backend
- [x] Bring support for bool dtype
- [x] Bring support for Half dtype
- [x] Add test for NaN
- [x] Add test for bool dtype
- [x] Delete legacy implementation in THTensorMoreMath.cpp

Performances:
```python
timeit -s 'import torch; x = torch.randn((1000, 1000))' -n 1000 'torch.sign(x)'
timeit -s 'import torch; x = torch.randn((1000, 1000), device="cuda")' -n 1000 'torch.sign(x); torch.cuda.synchronize()'
```

| device |  before  | after |
| :-------------: | :-------------: | :-----: |
| CPU    | 1.24 msec | 33.9 usec |
| GPU    | 680 usec | 7.13 usec  |
| CPU (1 thread) | 0.82 msec | 0.73 msec |
| GPU (1 thread) | 16.1 usec | 15.9 usec |
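
A behavior sketch for the semantics listed above:

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0])
print(torch.sign(x))  # tensor([-1., 0., 1.])
# Per this PR, sign(NaN) is defined to return 0, and sign on a bool tensor
# is a pass-through that leaves the values unchanged.
```
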
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22861

Differential Revision: D16503452

Pulled By: VitalyFedyunin

fbshipit-source-id: a87ce7fff139642ef4ed791f15873074ad0d53af
2019-08-27 19:01:34 -07:00
9d06a984f8 Serialization for nn.quantized.functional modules (#25220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25220

Add load_from_state_dict and save_from_state_dict for quantized functional modules
ghstack-source-id: 89070836

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_scriptability_serialization\ \(test_quantization.ScriptabilityTest\)' --print-passing-details

Differential Revision: D17065243

fbshipit-source-id: 413ce0a95d0c27fedb23894f1483e3da2f60f417
2019-08-27 18:56:10 -07:00
5b4e052904 Add new qnnpack_add and qnnpack_maxpool op to C10 registry (#24103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24103

This change adds a quantized add and maxpool2d operation for pytorch mobile.

These operators follow the structure of qnnpack in terms of create/setup and run calls. The current plan to refactor QNNPACK into a more functional style targets the FC and Conv ops, where the cost of create/setup is high.
For ops like add and maxpool the cost of calling create and setup in each operator invocation is negligible.

Once we migrate FC and Conv QNNPACK ops to be functional in nature, we will consider changing these ops as well to make it consistent.
ghstack-source-id: 88997042

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_qnnpack_add
python test/test_quantized.py TestQNNPackOps.test_qnnpack_maxpool2d

Differential Revision: D16734190

fbshipit-source-id: 5152aed88e8bbe4f701dba4886eac989bdcefe8f
2019-08-27 18:56:06 -07:00
86a35d7b8d Fixing the enforcement of the zero_point
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25193

Test Plan: Imported from OSS

Differential Revision: D17058781

Pulled By: zafartahirov

fbshipit-source-id: 7c665e1a0618c04a44f0c0e72e1bcc741a388e1c
2019-08-27 18:52:09 -07:00
3c3d95cf1d disable deadline checking on test_adaptive_avg_pool2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25255

Test Plan: Imported from OSS

Differential Revision: D17078073

Pulled By: jamesr66a

fbshipit-source-id: fd4c3442e87088a9b2f338a2687c5dddd0d93b81
2019-08-27 18:45:23 -07:00
2e224d62b6 Add USE_CUDNN check to AT_CUDNN_ENABLED definition (#25037)
Summary:
We have an environment variable USE_CUDNN with a self-explanatory name. However, the C++ code is compiled based on the macro AT_CUDNN_ENABLED, which is defined as:

```
  IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND)
    MESSAGE(STATUS "CuDNN not found. Compiling without CuDNN support")
    set(AT_CUDNN_ENABLED 0)
  ELSE()
    include_directories(SYSTEM ${CUDNN_INCLUDE_DIRS})
    set(AT_CUDNN_ENABLED 1)
  ENDIF()
```

So, even if USE_CUDNN is set to 0, the C++ code is compiled with cuDNN if CMake finds cuDNN on the system. I actually tested this and was very surprised when I found myself debugging cuDNN code that I had built with USE_CUDNN=0. I believe the CMake code above should look like this:

`IF (NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND OR NOT USE_CUDNN) ...`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25037

Differential Revision: D17048683

Pulled By: pbelevich

fbshipit-source-id: 48afa19eaae0bba2ffd49c1f68db0b4efd5cf85e
2019-08-27 18:43:11 -07:00
f82c4ce6d6 Add libtorch android build with shared lib for 4 android abis (#25192)
Summary:
In the current pytorch/master we only have a libtorch Android build of static libraries for armv7.

This change adds the same builds with a shared library to CircleCI, for the ABIs x86, x86_64, arm-v7a, and arm-v8a.

In pytorch_build_data.py I added a new AndroidAbiConfigNode:

    class AndroidAbiConfigNode(TreeConfigNode):
        def init2(self, node_name):
            self.props["android_abi"] = node_name

        def child_constructor(self):
            return ImportantConfigNode

It can be a child of ExperimentalFeatureConfigNode, and it results in:

    ("android", [
        ("r19c", [
            ("3.6", [
                ("android_abi", [XImportant("x86")]),
                ("android_abi", [XImportant("x86_64")]),
                ("android_abi", [XImportant("arm-v7a")]),
                ("android_abi", [XImportant("arm-v8a")]),
            ])
        ]),
    ]),

As all parameters are used for docker_image_name generation, but I wanted to use the same docker image for all Android jobs, I introduced Conf.parms_list_ignored_for_docker_image in pytorch_build_definitions.py.

It contains parameters that are not joined into docker_image but are still used for job-name and build_environment generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25192

Reviewed By: kostmo

Differential Revision: D17078465

Pulled By: IvanKobzarev

fbshipit-source-id: c87534a45fb92c395e0dd3471213d42d3613c604
2019-08-27 18:43:07 -07:00
f8852c947b Implement a bunch of pickle serialization features that optimize for size. (#23759)
Summary:
This saves about 10KB of compressed size on FaceBlaze.  https://github.com/pytorch/pytorch/issues/23582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23759

Differential Revision: D16641664

fbshipit-source-id: 5a7cec1a1b5123bb2a3eaa21ea12e041be551561
2019-08-27 18:40:38 -07:00
3af758c077 data -> data_ptr: upgrade the deprecated APIs (#25223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25223

Before this PR, it shows the following warning:
```
> caffe2/aten/src/ATen/core/Tensor.h:297: UserWarning: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.
>   TORCH_WARN("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.");
> caffe2/aten/src/ATen/core/Tensor.h:297: UserWarning: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.
>   TORCH_WARN("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.");
```

After this PR, the warning message should disappear.
ghstack-source-id: 89113498

Test Plan: CI

Differential Revision: D17066471

fbshipit-source-id: e4fec964b5333ff968c8cf218286d4a8ab8dbe54
2019-08-27 18:38:16 -07:00
a4fa167878 Optimize LeftRight and either (#25133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25133

This is driven by benchmarks I did for moving ATen ops to the c10 operator library.
Improvements:
 - tell the compiler that the error cases are unlikely so it can optimize code better
 - optimize cache layout of LeftRight.
ghstack-source-id: 88907294

Test Plan: unit tests

Differential Revision: D16998010

fbshipit-source-id: 0e3cbff0a4983133a4447ec093444f5d85dd61d6
2019-08-27 18:33:29 -07:00
9fd62436b4 get rid of dynamic_cast in Quantizer (#25001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25001

It seems QScheme and the Quantizer class type have a 1-1 mapping, so use it to compare
whether two quantizers are equal instead of using dynamic_cast.
This way the code can remain mobile-friendly, as our internal mobile build doesn't
enable RTTI by default.
ghstack-source-id: 88925243

Test Plan:
- builds;
- will check CI tests;

Differential Revision: D16951501

fbshipit-source-id: 585b354f64e5188fd34f01d456c91cec232ba6b0
2019-08-27 18:33:24 -07:00
858493d168 generic overrideable convolution for backends (#23562)
Summary:
One possible solution based on our discussion yesterday: ezyang gchanan zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23562

Differential Revision: D16998161

Pulled By: ailzhang

fbshipit-source-id: 07fe3a335f43b4205a421b3521aeb5fa4dc80279
2019-08-27 18:33:21 -07:00
ac862e6ddc Reducing the test size for adaptive avg pool (#25195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25195

The test will fail for large samples due to the deadline constraint in the hypothesis framework.

Test Plan: Imported from OSS

Differential Revision: D17059087

Pulled By: zafartahirov

fbshipit-source-id: 915f46ecae61de1b384136c14da25ee875d1c02d
2019-08-27 18:28:42 -07:00
f5a3d59254 Handle empty qconfig for functional Modules (#25215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25215

ghstack-source-id: 89044252

Test Plan: Test implemented in D16879132/

Differential Revision: D17064670

fbshipit-source-id: 08d3d566aa123bedf318ab5a8bc9b71457930ff2
2019-08-27 12:31:26 -07:00
3779893d1d Implementation of cyclical learning rate (#23914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23914

Implementation of cyclical learning rate, see https://arxiv.org/pdf/1506.01186.pdf
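
This diff targets the caffe2/dper side; for reference, a sketch of the same policy through the OSS `torch.optim.lr_scheduler.CyclicLR` API:

```python
import torch

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(opt, base_lr=0.001, max_lr=0.01)

for _ in range(10):
    opt.step()
    sched.step()  # lr cycles between base_lr and max_lr
```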

Test Plan: canary: https://fburl.com/fblearner/siqb34md

Reviewed By: chenshouyuan

Differential Revision: D16632831

fbshipit-source-id: 20bd9d7fb61af5a8b594b039c5d434a0cc96fadc
2019-08-27 10:44:16 -07:00
c351a68f5b Specify width for st.floats in hypothesis_utils.tensor (#25188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25188

circleci complains that the generated numbers are not representable by float32, which pollutes the logs:
https://circleci.com/gh/pytorch/pytorch/2554740?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Test Plan:
circleci

Imported from OSS

Differential Revision: D17063240

fbshipit-source-id: 0572fb810d8ccd8cdf3f3ac7efdf0cfce5aee6ca
2019-08-27 10:21:05 -07:00
44a7879b6e Disable flaky test_adaptive_avg_pool2d test. (#25249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25249

See #25097

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17071632

Pulled By: ezyang

fbshipit-source-id: 1c5ad7204f1d30f5c67d682fbb083608e067cb2a
2019-08-27 09:09:25 -07:00
c142dbf876 Fix scriptability for Observer (#25219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25219

Ensure that observer code remains scriptable after addition of warnings
ghstack-source-id: 89055664

Test Plan: buck test caffe2/test:quantization -- 'test_observer_scriptable \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17065218

fbshipit-source-id: b3599613b4835bf1c5241aff191b40ba5f40d7be
2019-08-27 08:54:40 -07:00
92750acb88 Move the detection of cuDNN to FindCUDNN.cmake (#24938)
Summary:
Currently they sit together with other code in cuda.cmake. This commit is the first step toward cleaning up cuDNN detection in our build system.

Another attempt to https://github.com/pytorch/pytorch/issues/24293,  which breaks manywheels build because it does not handle `USE_STATIC_CUDNN` properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24938

Differential Revision: D17070920

Pulled By: ezyang

fbshipit-source-id: a4d017a3505c102d9c435a73ae62332e4336c52e
2019-08-27 06:51:52 -07:00
2f4f6c2563 Implement name inference for torch.dot (#24474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24474

torch.dot is a little weird. It ignores the names of its inputs to be
consistent with the rest of our matrix multiplication functions.

I've written the implementation using a helper function that is also
used by other matrix multiplication functions so that it is easy to
change the behavior.
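
A sketch of the rule (named-tensor API, experimental at the time):

```python
import torch

a = torch.randn(3, names=('A',))
b = torch.randn(3, names=('B',))
out = torch.dot(a, b)  # names are ignored rather than matched
print(out.names)       # () -- the 0-dim result carries no names
```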

Test Plan
- new tests [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16915802

Pulled By: zou3519

fbshipit-source-id: 628a6de1935357022cc92f4d23222736a70bb070
2019-08-27 06:49:27 -07:00
9340b155bc Revert D15901930: Add interface declarations to JIT
Test Plan: revert-hammer

Differential Revision:
D15901930

Original commit changeset: 22c82d12c9c2

fbshipit-source-id: 4009a3ce7af245d7e0f4924824ece59cdc774180
2019-08-27 06:41:32 -07:00
1f57b8b738 Add myself as a CODEOWNER for better discoverability (#25231)
Summary:
Not meant to be a landing blocker or anything like that. This only lets me setup some more effective email filters, hopefully allowing me to discover the current changes earlier and be more responsive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25231

Differential Revision: D17070735

Pulled By: soumith

fbshipit-source-id: 171c8dcd48edf64a9dc3367015e4166baa860c0a
2019-08-27 06:22:40 -07:00
f622ec8084 Update mapping dictionary to support functionalmodules and pooling operations (#25216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25216

ghstack-source-id: 89045562

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_resnet_base\ \(test_quantization.PostTrainingQuantTest\)' --print-passing-details

Differential Revision: D17065029

fbshipit-source-id: b248abf6de162f38e35e6bace17bde1be9e38c57
2019-08-26 23:00:01 -07:00
4d2bf0b51b Move test QAT tests to double precision to ensure numerics match (#25211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25211

Change dtypes of all tensors in testqat to double precision. Without this change, the backward pass showed small mismatches whose root cause wasn't clear. With this change, the numerics match to a precision of 1e-10 and this test is useful and provides a tight check on numerics.
ghstack-source-id: 89041119

Test Plan:
buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details

Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
      ✓ caffe2/test:quantized - test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) 17.777 1/1 (passed)
Test output:
> test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 17.778s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
Summary (total time 22.03s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D17064183

fbshipit-source-id: 7f6d5d2b71430b6aaf4f6d741b56a2bd1247ac29
2019-08-26 22:55:39 -07:00
b15d91490a Remove InsertQuantDeQuantNode (#25000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25000

Remove deprecated insert_quantdequant pass

Test Plan:
.

Imported from OSS

Differential Revision: D17001139

fbshipit-source-id: 5ecabdff84598fe21f24ea827b615e697081ee53
2019-08-26 20:00:25 -07:00
fbb88f5d71 Remove insert_observers pass (#24999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24999

As described in previous PR, we are doing module level observer rather
than global observer now, so majority of code are deprecated. But we
still keeps some logic that is independent of this decision in the new
code.

Test Plan:
.

Imported from OSS

Differential Revision: D17001138

fbshipit-source-id: b456f80d587a61e368c626e7e8ac2a4f1282268b
2019-08-26 20:00:22 -07:00
0b60f5c0f8 Remove deprecated graph mode quantization tests (#24998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24998

The original graph mode was developed at a time when we didn't yet have a concrete qconfig API,
and it had a global observer module that was passed around during the whole quantization flow.
We have a much clearer picture of the quantization API now, and we are going to use a per-Tensor
observer design, just like in eager mode. This PR removes the deprecated tests; the next PR will
remove the deprecated code.

Test Plan:
```
python test/test_quantizer.py
```

Imported from OSS

Differential Revision: D17001140

fbshipit-source-id: 87f342cfa8ea6b45606372c51dbfc493065a737a
2019-08-26 20:00:18 -07:00
ab0229388c add import for test_quantizer.py (#25222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25222

forgot to add import..

Test Plan:
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17066193

fbshipit-source-id: 638119053724b21151eb6f05adfd39d094e44de7
2019-08-26 19:57:30 -07:00
9e27cf617e Initial commit for android torchvision utils (#25185)
Summary:
Initial commit of pytorch_android_torchvision, which has utility methods for:

- android.media.Image in YUV_420_888 format (camera output) -> Tensor(Float) in torchvision format, normalized by the ImageNet mean/std
- Bitmap -> Tensor(Float) in torchvision format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25185

Reviewed By: dreiss

Differential Revision: D17053008

Pulled By: IvanKobzarev

fbshipit-source-id: 6bf7a39615bf876999982b06925e7444700e284b
2019-08-26 19:40:44 -07:00
c0334015ed add to Tensor symmetric methods getDataAsIntArray, getDataAsByteArray (#25183)
Summary:
Tensor has getDataAsFloatArray(); since we also support Int and Byte Tensors,
this adds symmetric methods for Int and Byte that throw
IllegalStateException if called on a tensor of an inappropriate type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25183

Reviewed By: dreiss

Differential Revision: D17052674

Pulled By: IvanKobzarev

fbshipit-source-id: 1d44944461ad008e202e382152cd0690c61124f4
2019-08-26 19:11:11 -07:00
c2e0383975 skip tests if fbgemm is not supported for test_quantizer.py (#25209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25209

att

Test Plan:
ossci

Imported from OSS

Differential Revision: D17063744

fbshipit-source-id: d4ff860f3cd80c3a90d06c4f13d9ae0e9fe8e125
2019-08-26 17:31:22 -07:00
4b22cf6bd5 Add interface declarations to JIT (#21972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21972
ghimport-source-id: 280f89ca678615f915be2139d1c05cb6bc39eefc

Test Plan: Imported from OSS

Differential Revision: D15901930

Pulled By: zdevito

fbshipit-source-id: 22c82d12c9c2600e569d7083e2771fd6ec3de2b1
2019-08-26 16:57:59 -07:00
6e42580d32 Simplify NamedType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25058

Test Plan: Imported from OSS

Differential Revision: D16974556

Pulled By: zdevito

fbshipit-source-id: f15611df5117abb5b03dfd22fb412421f6385976
2019-08-26 16:57:55 -07:00
26a438d4fb Revert D16852280: Work around for bias quantization for conv and linear operators
Test Plan: revert-hammer

Differential Revision:
D16852280

Original commit changeset: 988f8ff91616

fbshipit-source-id: e2cf03e13dc8dcf0db22d43740d72fd8b069fd74
2019-08-26 16:25:33 -07:00
17f69eff22 Revert D16879133: Handle empty qconfig for functional Modules
Test Plan: revert-hammer

Differential Revision:
D16879133

Original commit changeset: 230f5204cfbd

fbshipit-source-id: 29b4bfe066b173797f3d9f2fcf7cbf5ee21ff8fb
2019-08-26 16:25:29 -07:00
a9fdc1923b Revert D16879132: Update mapping dictionary to support functionalmodules and pooling operations
Test Plan: revert-hammer

Differential Revision:
D16879132

Original commit changeset: cd8c10182aa7

fbshipit-source-id: 9b67ccf73f43d15ef50bf0331d3df4d57835931b
2019-08-26 16:25:25 -07:00
978a964be4 Revert D17053634: Move test QAT tests to double precision to ensure numerics match
Test Plan: revert-hammer

Differential Revision:
D17053634

Original commit changeset: e19d555adee2

fbshipit-source-id: 6ae9be6459b6ac7fe046817f02d12b0f5b6d6ca3
2019-08-26 16:23:03 -07:00
a1bf4d7ee1 Integration tests for initial quantization graph mode (#24428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24428

att

Test Plan:
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17001136

fbshipit-source-id: b0c6cd433efdcbc6b54b429a29677fc509221937
2019-08-26 15:38:29 -07:00
77ee1f5f3c Revert D16923660: Support observer without any data calibration
Test Plan: revert-hammer

Differential Revision:
D16923660

Original commit changeset: 9927ed4e4ee9

fbshipit-source-id: 31a2b28584aae3808df6508b4caedb54de32156d
2019-08-26 15:36:26 -07:00
c3c36a5b68 Revert D16923651: Serialization for nn.quantized.functional modules
Test Plan: revert-hammer

Differential Revision:
D16923651

Original commit changeset: eb1234be1941

fbshipit-source-id: c80d0b50db0f949cc293dbc2f825404bbc8cb86c
2019-08-26 15:36:21 -07:00
ff30201fff Revert D17059486: Fix scriptability for Observer
Test Plan: revert-hammer

Differential Revision:
D17059486

Original commit changeset: 70ea9ee39f0b

fbshipit-source-id: 6f39057b264e4d4213cf07496929274240bce917
2019-08-26 15:32:21 -07:00
85d1ebd26e Fix scriptability for Observer (#25197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25197

Ensure that observer code remains scriptable after addition of warnings
ghstack-source-id: 89022474

Test Plan: buck test caffe2/test:quantization -- 'test_observer_scriptable \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17059486

fbshipit-source-id: 70ea9ee39f0b896c7801e168666f88c156dbf15b
2019-08-26 15:27:27 -07:00
433fe47d95 Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130)
Summary:
Resubmission of https://github.com/pytorch/pytorch/issues/23424 because previous PR was borked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25130

Differential Revision: D17052534

Pulled By: mruberry

fbshipit-source-id: d91b308ad0f730646bb7b3492a601cd9b05c72d8
2019-08-26 15:19:06 -07:00
088201f95d Implement name inference for addmv, addmv_, mv (#24471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24471

mv(Tensor[M, N], Tensor[O]) ignores the names of N and O and returns a
tensor with names [M].
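
As a runnable sketch of the rule:

```python
import torch

M = torch.randn(3, 4, names=('M', 'N'))
v = torch.randn(4, names=('O',))
print(torch.mv(M, v).names)  # ('M',)
```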

Test Plan: - new tests [namedtensor ci]

Differential Revision: D16915805

Pulled By: zou3519

fbshipit-source-id: d7d47903f249f85ef3be8a188d51993834bf5f55
2019-08-26 15:03:26 -07:00
78fa8a8ad0 Implement name inference for expand (#24469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24469

tensor.expand(*sizes) returns a tensor with names equal to tensor.names
plus unnamed padding in the beginning dimensions.

For example, Tensor[H, W].expand(10, 2, 128, 128) -> Tensor[None, None,
H, W].
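
The example above, as runnable code:

```python
import torch

t = torch.randn(128, 128, names=('H', 'W'))
e = t.expand(10, 2, 128, 128)
print(e.names)  # (None, None, 'H', 'W')
```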

Test Plan: - new tests [namedtensor ci]

Differential Revision: D16915804

Pulled By: zou3519

fbshipit-source-id: 77ac97f42e9959d7f6d358c5286e3dc27488e33d
2019-08-26 15:03:22 -07:00
277cd748f9 skip fstrings test if not py36 (#25184)
Summary:
Fixes py35 job on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25184

Differential Revision: D17057957

Pulled By: eellison

fbshipit-source-id: 53decc408680d9436395698cbd4b4ede98933159
2019-08-26 13:58:45 -07:00
121839b2f8 Fix bugs in assignment to optionals (#25059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25059

This fixes the cases where a variable annotated with Optional could not
be conditionally assigned to None:

```
x : Optional[int] = 4
if ...:
 x = None
```
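
A sketch of the now-accepted pattern under `torch.jit.script`:

```python
import torch
from typing import Optional

@torch.jit.script
def pick(flag: bool) -> Optional[int]:
    x: Optional[int] = 4
    if flag:
        x = None  # conditional assignment to None now type-checks
    return x

print(pick(True), pick(False))  # None 4
```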

Test Plan: Imported from OSS

Differential Revision: D16975166

Pulled By: zdevito

fbshipit-source-id: 5a7a81224d08b9447e1f4d957fcd882091e02f32
2019-08-26 13:47:54 -07:00
3b3261ca8e Adding Scalar add/mul. (#24447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24447

Note: This should be landed ONLY after #24259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24447

Differential Revision: D16846006

Test Plan: Imported from OSS

Pulled By: zafartahirov

fbshipit-source-id: 458fd65279d98cb177ef206240d24dfcbc8d1c1b
2019-08-26 13:05:44 -07:00
a3e6e82b6c Adding return for the observer in the functional_modules.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25168

Test Plan: Imported from OSS

Differential Revision: D17048164

Pulled By: zafartahirov

fbshipit-source-id: 40ee1f276ee5421255de5b2fc14194402ded10db
2019-08-26 13:03:08 -07:00
c395f42109 fix to loggin in AA
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25143

Differential Revision: D17004030

Pulled By: Krovatkin

fbshipit-source-id: 5081c8f89238b7eaf72267ec67b714e125378782
2019-08-26 12:24:00 -07:00
0156d02b59 Implement name inference for mm, addmm (#24306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24306

Featuring:
- a new way of writing name inference tests. At some point I'll migrate
the older tests over.
- The out= variants aren't implemented. This is because they are a
little weird: the output gets resized, but I haven't thought through
what semantics that should have.

Test Plan: - new tests [namedtensor ci]

Differential Revision: D16915801

Pulled By: zou3519

fbshipit-source-id: 29ae2ee414c7d98e042965458c5dccef7ddbd4dd
2019-08-26 12:20:26 -07:00
6195aee2c6 Fix binary op name inference between unnamed and named tensors. (#24921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24921

Let `unnamed = torch.randn(1, 1, 1)` and `named = torch.randn(1, 1,
names=('N', 'C'))`.

Previously, there was a bug where `unnamed + named` would error out.
This happened because `unify_from_right(unnamed.opt_names(),
named.opt_names())` would return `named.names()`, which was propagated
to the output tensor. However, the output tensor has dim 3, but
`named.names()` only has 2 elements, so the code would throw an error.

The solution implemented in this PR is to stop trying to do premature
optimization. If none of the inputs to an operation have names, then
don't run name inference. However, if any inputs do, then materialize
the names and run name inference.

It's possible to make this more efficient for the case where some inputs
are named and some aren't, but we should benchmark these cases
and determine if it is necessary for it to be more efficient.
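
A repro sketch for the fixed case:

```python
import torch

unnamed = torch.randn(1, 1, 1)
named = torch.randn(1, 1, names=('N', 'C'))
out = unnamed + named  # previously raised; named's names are now padded
print(out.names)       # (None, 'N', 'C')
```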

Test Plan: - new tests [namedtensor ci]

Differential Revision: D16930710

Pulled By: zou3519

fbshipit-source-id: 0de73c803c8b0f9a1c2d80684b9a47cccba91cbc
2019-08-26 12:20:22 -07:00
5d6b3dfdf4 Move test QAT tests to double precision to ensure numerics match (#25189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25189

Change dtypes of all tensors in testqat to double precision. Without this change, the backward pass showed small mismatches whose root cause wasn't clear. With this change, the numerics match to a precision of 1e-10 and this test is useful and provides a tight check on numerics.
ghstack-source-id: 88999698

Test Plan:
buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details

Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
      ✓ caffe2/test:quantized - test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) 17.777 1/1 (passed)
Test output:
> test_conv_bn_relu (test_qat.IntrinsicQATModuleTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 17.778s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699726578151
Summary (total time 22.03s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D17053634

fbshipit-source-id: e19d555adee29b49bff873fcc01f527e8272f1c6
2019-08-26 12:17:01 -07:00
95a3ffc2f1 Serialization for nn.quantized.functional modules (#24924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24924

Add load_from_state_dict and save_from_state_dict for quantized functional modules
ghstack-source-id: 89001576

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_scriptability_serialization\ \(test_quantization.ScriptabilityTest\)' --print-passing-details

Differential Revision: D16923651

fbshipit-source-id: eb1234be1941ccf268a2fc5b756540ab973f3ffb
2019-08-26 12:16:57 -07:00
a5710e2303 Support observer without any data calibration (#24923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24923

Replace the exception with a warning for uninitialized min/max values to support creation of quantized models without observers.
ghstack-source-id: 89003800

Test Plan: Replace error message with warning for observers

Differential Revision: D16923660

fbshipit-source-id: 9927ed4e4ee977c1388595ddef042204f71076a4
2019-08-26 12:16:53 -07:00
794f63fe92 Update mapping dictionary to support functionalmodules and pooling operations (#24804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24804

ghstack-source-id: 89003799

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_resnet_base\ \(test_quantization.PostTrainingQuantTest\)' --print-passing-details

Differential Revision: D16879132

fbshipit-source-id: cd8c10182aa732ddf655bcda17f72ea08033a633
2019-08-26 12:16:49 -07:00
d7f6ac1dbb Handle empty qconfig for functional Modules (#24803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24803

ghstack-source-id: 89003797

Test Plan: Test implemented in D16879132/

Differential Revision: D16879133

fbshipit-source-id: 230f5204cfbd149fea1c0985578a2572a0e0f2a8
2019-08-26 12:16:46 -07:00
ea601d90d6 Work around for bias quantization for conv and linear operators (#24789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24789

In eager mode, all modules need to work with input tensors that can change qparams dynamically. This issue https://github.com/pytorch/pytorch/issues/23874 will address this via FBGEMM modifications. This is a work around before that.
ghstack-source-id: 89003798

Test Plan:
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
Summary (total time 65.86s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D16852280

fbshipit-source-id: 988f8ff91616eddf511e71926aa7d2d0f1938188
2019-08-26 12:16:42 -07:00
969c918f56 bind autograd.backward and tensor.backward in TorchScript (#23913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23913

This PR binds torch.autograd.backward and tensor.backward in TorchScript,
and makes aliasing conservative for these two ops. This is mainly because
the backward op might write to every input tensor in the graph.
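
A small sketch of the newly bindable call:

```python
import torch

@torch.jit.script
def accumulate_grad(x: torch.Tensor):
    y = (x * x).sum()
    y.backward()  # tensor.backward is now callable inside TorchScript

x = torch.randn(3, requires_grad=True)
accumulate_grad(x)
print(x.grad)  # equals 2 * x
```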

Test Plan: Imported from OSS

Differential Revision: D16923272

fbshipit-source-id: 8a4016c62e00d00e0dee3d8c599d3aca220202f7
2019-08-26 12:11:02 -07:00
9a9ef21bad quant_fusion jit pass (#24427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24427

Added following pass:
- _jit_pass_quant_fusion: Fusion pass that replaces the dequant->conv->quant patterns
to quantized_conv

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D17001142

fbshipit-source-id: 729a6bf291c5268b24f5716ccadfcfb63e039c0b
2019-08-26 11:17:39 -07:00
105fbb9cce insert_quant_dequant jit pass (#24426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24426

Added following pass:
- _jit_pass_insert_quant_dequant: removes observer modules and calls, insert
quantize_linear-int_repr-_dequantize_linear calls for activation, weight and bias,
the scale of bias is calculated from the scale of input activation and weight

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D17001141

fbshipit-source-id: e81faac697a9c0df862adc5aa8ca2aa9e4ae5fd9
2019-08-26 10:52:32 -07:00
d80625754f per channel quantization support (#25134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25134

copy of https://our.intern.facebook.com/intern/diff/D16909378/

it was reverted in https://our.intern.facebook.com/intern/diff/D16997422/

Per channel quantization support in qconv2d + tests
ghstack-source-id: 88992610

Test Plan:
buck test mode/dev caffe2/test:quantized -- --print-passing-details
```
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124656103386
Summary (total time 64.42s):
  PASS: 33
  FAIL: 0
  SKIP: 3
    caffe2/test:quantized - test_qlinear (test_quantized.TestDynamicQuantizedLinear)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16999104

fbshipit-source-id: 008447ffbc0144f0fc76f3cf143a2f69b65922fd
2019-08-26 10:43:09 -07:00
cd14518ee8 hyperparameter plugin (#23134)
Summary:
closes https://github.com/pytorch/pytorch/issues/16838

example usage:
```python
writer.add_hparam(hparam_dict={'lr': 0.1, 'bsize': 12}, metrics={'accuracy': 0.987, 'loss': 10})

```
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23134

Reviewed By: orionr

Differential Revision: D16807300

Pulled By: sanekmelnikov

fbshipit-source-id: 4072c529076f423b34b00b68be2d6eec444423fe
2019-08-26 10:40:34 -07:00
1bf1970fe2 Add Python/C++ torch.nn API parity test harness (#23852)
Summary:
This PR adds test harness for checking Python / C++ API parity for `torch.nn.Module` subclasses. Under the hood, we use JIT tracing to transfer `nn.Module` state from Python to C++, so that we can test initialization / forward / backward on Python / C++ modules with the same parameters and buffers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23852

Differential Revision: D16830204

Pulled By: yf225

fbshipit-source-id: 9b5298c0e8cd30e341a9f026e6f05604a82d6002
2019-08-26 08:02:25 -07:00
573b1cd224 prevent generating caffe2::mkl for multiple times (#25167)
Summary:
fixes issue https://github.com/pytorch/pytorch/issues/25004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25167

Differential Revision: D17051290

Pulled By: ezyang

fbshipit-source-id: 30c2b6d6ffca2ce8dae45a4a706ce45d6386c672
2019-08-26 07:39:56 -07:00
c24314bf0e Ensure tests get passed on Windows (#25145)
Summary:
(1) check error codes after every test command
(2) add missing LibTorch tests mentioned in https://discuss.pytorch.org/t/pre-compiled-tests-failing/54166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25145

Differential Revision: D17050539

Pulled By: ezyang

fbshipit-source-id: 8a01e5f3c97b181cf2cd7641a545551dcb3627b8
2019-08-26 06:02:24 -07:00
30bc65271d torch.from_numpy fix for np.int (#25139)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22615
Because sizeof(long) differs across platforms, we have the following relations between NPY_TYPES and the NPY_INTXX aliases:
```
int value	Enum			Unix		Windows
1		NPY_BYTE		NPY_INT8	NPY_INT8
3		NPY_SHORT		NPY_INT16	NPY_INT16
5		NPY_INT			NPY_INT32	-
7		NPY_LONG		NPY_INT64	NPY_INT32
9		NPY_LONGLONG		-		NPY_INT64
```
I suggest the following fix for the `numpy_dtype_to_aten` method:
```
if (dtype == NPY_INT || dtype == NPY_INT32) {
	return kInt;
} else if (dtype == NPY_LONGLONG || dtype == NPY_INT64) {
	return kLong;
}
```
On Unix this will be replaced with:
```
if (dtype == 5 || dtype == 5) {
	return kInt;
} else if (dtype == 9 || dtype == 7) {
	return kLong;
}
```
and on Windows with:
```
if (dtype == 5 || dtype == 7) {
	return kInt;
} else if (dtype == 9 || dtype == 9) {
	return kLong;
}
```
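
An end-to-end sketch of the fix: `np.int64` (which is `NPY_LONGLONG` on Windows) now maps to `torch.int64` on every platform:

```python
import numpy as np
import torch

t = torch.from_numpy(np.arange(4, dtype=np.int64))
assert t.dtype == torch.int64
print(t)  # tensor([0, 1, 2, 3])
```
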
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25139

Differential Revision: D17048443

Pulled By: pbelevich

fbshipit-source-id: 9f2c27ff2829b893a35d3d57f176a58e7749a468
2019-08-26 05:07:22 -07:00
43a2fd0e24 Support focal loss in MTML
Summary:
[Not in need of review at this time]
Support focal loss in MTML (effectively dper2 in general) as described in https://arxiv.org/pdf/1708.02002.pdf. Adopt approach similar to Yuchen He's WIP diff D14008545

Test Plan:
Passed the following unit tests
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_lr_loss_based_focal_loss
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_mtml_with_lr_loss_based_focal_loss
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_lr_loss_based_focal_loss_with_stop_grad_in_focal_factor

Passed ./fblearner/flow/projects/dper/canary.sh; URL to track workflow runs: https://fburl.com/fblearner/446ix5q6

Model based on V10 of this diff
f133367092
Baseline model
f133297603

Protobuf of train_net_1 https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GEq30QIFW_7HJJoCAAAAAABMgz4Jbr0LAAAz

Reviewed By: hychyc90, ellie-wen

Differential Revision: D16795972

fbshipit-source-id: 7bacae3e2255293d337951c896e9104208235f33
2019-08-25 01:42:25 -07:00
b7b80c6bdd Fix ios_crash:backtrace=FBCameraFramework:caffe2::getClockTimeMilliseconds() (perf_observer.cc (#24813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24813

clock_gettime does not work on some Apple platforms, e.g. OSX < 10.12.

Use a custom implementation, similar to https://opensource.apple.com/source/Libc/Libc-1158.1.2/gen/clock_gettime.c.auto.html

T52655182

Test Plan: sandcastle tests

Differential Revision: D16883407

fbshipit-source-id: a42828bb91bb0c43297e9bdce4b18f7c9ea4274d
2019-08-24 21:16:02 -07:00
f2bcad5ddf Add logging to JIT CSE pass.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25141

Test Plan: Imported from OSS

Differential Revision: D17003448

Pulled By: ZolotukhinM

fbshipit-source-id: ec9f738efc0baf80b3447b12e7c43d24237e8496
2019-08-24 19:52:39 -07:00
f71ddd4292 Switch hub to use requests because of SSL (#25083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25083

I missed this in the last PR

Test Plan: Imported from OSS

Differential Revision: D17005372

Pulled By: jamesr66a

fbshipit-source-id: 1200a6cd88fb9051aed8baf3162a9f8ffbf65189
2019-08-24 12:06:49 -07:00
85bca16a61 SubgraphMatcher: matching modules support. (#25075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25075

This change adds a special behavior to subgraph matcher to allow it to
match calls to modules. Namely, when a node in the pattern graph has a
'match::module' type, it is considered 'match' only when the
corresponding node in the target graph is a 'prim::GetAttr' obtaining a
submodule which type matches the type specified in 'name' attribute of
the 'match::module' node.

Currently, when comparing the expected module type, we check whether the string
specified in 'name' is a prefix of the qualified name of the module the GetAttr
returns. In the future, when the qualified-name format is better defined, we will
probably change this to an exact comparison.

Why do we want this? In some cases we would like to perform fusion on a
module level rather than on a graph level. A popular example of such
fusion would be Conv-BN. It is impractical to match batchnorm on the
graph level because that would mean we would need to specify its full
and exact implementation in the pattern graph. If we match at the
CallMethod level, however, the problem becomes trivial.

The feature added in this PR allows us to detect patterns with 'CallMethod'
nodes, which in turn allows us to use the subgraph rewriter to replace
such patterns with some node (or nodes). I expect the usual approach
would be to use the subgraph rewriter to replace all matches with an
artificial node and then, in an additional pass, replace such nodes with
calls to another module or something else. It is not currently possible
to use the subgraph rewriter to add a call to a method of a new module,
because it cannot add a new submodule, but we would probably add a
higher-level API to do that.

Test Plan: Imported from OSS

Differential Revision: D16978652

Pulled By: ZolotukhinM

fbshipit-source-id: 37307a5ec65cf4618ad8eb595ef5f8ae656e2713
2019-08-23 21:15:46 -07:00
16289c2fdc SubgraphMatcher: add logging.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25074

Test Plan: Imported from OSS

Differential Revision: D16978654

Pulled By: ZolotukhinM

fbshipit-source-id: a59c86c11ea6a6e0acc09d0b1d73fa22e8d1451b
2019-08-23 21:15:41 -07:00
b5096b68d3 SubgraphMatcher: Factor out matchAttributes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25073

Test Plan: Imported from OSS

Differential Revision: D16978653

Pulled By: ZolotukhinM

fbshipit-source-id: 57b5d371fcb74f8dbbb2b64cbd98a92134f3e78a
2019-08-23 21:15:37 -07:00
a54f8f0f21 use avx2 for Add without broadcast and when inputs are uint8_t (#25098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25098

Use the same optimization we used for the Sum operator in Add when broadcast is not used and the inputs are uint8_t.
The optimization uses AVX2 instructions and fp32 (instead of pure fixed-point arithmetic). It does introduce numerical differences, but only in minor cases like tie-breaking when rounding.

Test Plan: buck test caffe2/caffe2/quantization/server:elementwise_add_dnnlowp_op_test

Reviewed By: jianyuh

Differential Revision: D16985776

fbshipit-source-id: 8097503dd55f7d39857b3e4102db0f91327a6f55
2019-08-23 18:20:22 -07:00
363655dc48 Use the EmbeddingLookup API which takes the offsets instead of lengths (#24945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24945

As the title says.
ghstack-source-id: 88903516

Test Plan:
To Check with CI.

```
import torch, time

eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')

input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)

niter = 10000
s = time.time()
for i in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```

Reviewed By: bddppq

Differential Revision: D16930519

fbshipit-source-id: 44d59ca2588deecde1adb096673fc100bcd9bc46
2019-08-23 17:15:44 -07:00
35a00155e3 print padding_mode for Conv modules if not zeros (#23996)
Summary:
padding_mode info is helpful if it's not default.
`Conv1d(3, 4, kernel_size=(2,), stride=(2,), padding=(1,), padding_mode=circular)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23996

Differential Revision: D16766348

Pulled By: ailzhang

fbshipit-source-id: b2511ec0ab6b6cfb32c0915fe9e84f9b96a641f5
2019-08-23 16:30:46 -07:00
add57fd267 Support lowering of fp16 weights
Summary: It's needed by fp16 SLS.

Test Plan: The lowering works but NNPI doesn't seem to support fp16 SLS yet.

Reviewed By: zrphercule

Differential Revision: D16996047

fbshipit-source-id: e830e4926b416cb7770975838baf17a88dde6d91
2019-08-23 16:06:15 -07:00
bc83ed10fa Revert "per channel quantization support (#24936)" (#25131)
Summary:
This reverts commit 9e9965035ce92c7f3eda36fa9b18f4bc0042001b.

Since it is breaking master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25131

Differential Revision: D16997422

Pulled By: zrphercule

fbshipit-source-id: cc467600fad4940e0db7b2387a0a6c938fe50470
2019-08-23 16:01:06 -07:00
9854435588 move some methods into function.cpp (#25119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25119

Make `defaultSchemaFor` an anonymous function and move it + its caller
into function.cpp

Purely mechanical changes

Test Plan: Imported from OSS

Differential Revision: D16994147

Pulled By: suo

fbshipit-source-id: 96da8b3527eea37ad7beae433122384303a010c9
2019-08-23 15:58:49 -07:00
65beee5872 Add a skip_override option to should_run_job.py (#25118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25118

This allows people to temporarily disable a job from running on PRs. We
should use this only if there is a long-running breakage that can't be
fixed in a simple way.

Test Plan: Imported from OSS

Differential Revision: D16994074

Pulled By: suo

fbshipit-source-id: 6aa9c618057c126d16065e53a60204665d8ff0eb
2019-08-23 15:51:28 -07:00
5fd3251c50 add some sparse tensor ops support in TorchScript (#24967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24967

Fixes https://github.com/pytorch/pytorch/issues/24140

Test Plan: Imported from OSS

Differential Revision: D16975865

fbshipit-source-id: 134ecfff6ecb7144079d4eae85b186293aa26dd3
2019-08-23 15:48:14 -07:00
12ea1d74f0 Add missing functions and methods for channelwise quantization (#24934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24934

1) Functions and methods to get scales and zero_points for channelwise quantization were missing. Adding these.
2) Correctly print quantized tensors for channelwise quantization.
ghstack-source-id: 88868339

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qtensor\ \(test_quantized_tensor.TestQuantizedTensor\)'  --print-passing-details

```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324844629541
      ✓ caffe2/test:quantized - test_qtensor (test_quantized_tensor.TestQuantizedTensor) 0.161 1/1 (passed)
Test output:
> test_qtensor (test_quantized_tensor.TestQuantizedTensor) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.161s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324844629541
Summary (total time 6.61s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
To be added in a followup diff.
Current output for printing qtensors:
print(W_q.int_repr())
print(W_q)

```
> tensor([[[[-3,  0,  0],
>           [ 4, -2, -4],
>           [-1, -3, -2]],
>
>          [[-3,  1,  3],
>           [-3, -3,  3],
>           [-3, -5, -1]]],
>
>
>         [[[ 4, -3, -4],
>           [ 4, -3, -3],
>           [ 4, -1, -1]],
>
>          [[ 2, -3,  0],
>           [ 3,  1,  1],
>           [ 2, -4,  0]]]], dtype=torch.int8)
> tensor([[[[-0.9273, -0.2318, -0.2318],
>           [ 0.6955, -0.6955, -1.1592],
>           [-0.4637, -0.9273, -0.6955]],
>
>          [[-0.9273,  0.0000,  0.4637],
>           [-0.9273, -0.9273,  0.4637],
>           [-0.9273, -1.3910, -0.4637]]],
>
>
>         [[[ 0.3938, -0.1575, -0.2363],
>           [ 0.3938, -0.1575, -0.1575],
>           [ 0.3938,  0.0000,  0.0000]],
>
>          [[ 0.2363, -0.1575,  0.0788],
>           [ 0.3150,  0.1575,  0.1575],
>           [ 0.2363, -0.2363,  0.0788]]]], size=(2, 2, 3, 3), dtype=torch.qint8,
>        quantization_scheme=torch.per_channel_affine,
>        scale=tensor([0.2318, 0.0788]), zero_point=tensor([ 1, -1]))
```
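
For illustration, a hedged sketch of creating such a per-channel quantized tensor; this uses the current public name `torch.quantize_per_channel`, which may differ from the creation API at the time of this commit:

```
import torch

w = torch.randn(2, 2, 3, 3)
scales = torch.tensor([0.2318, 0.0788])
zero_points = torch.tensor([1, -1])
# one (scale, zero_point) pair per slice along axis 0
w_q = torch.quantize_per_channel(w, scales, zero_points, axis=0,
                                 dtype=torch.qint8)
print(w_q.int_repr())  # raw int8 values
print(w_q)             # dequantized values plus per-channel scale/zero_point
```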

Differential Revision: D16659715

fbshipit-source-id: f8d3eeaff8f618aa0cca4fd076db73318e6df946
2019-08-23 15:44:16 -07:00
9e9965035c per channel quantization support (#24936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24936

Per channel quantization support in qconv2d + tests
ghstack-source-id: 88897977

Test Plan:
buck test mode/dev caffe2/test:quantized -- --print-passing-details
```
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4222124656103386
Summary (total time 64.42s):
  PASS: 33
  FAIL: 0
  SKIP: 3
    caffe2/test:quantized - test_qlinear (test_quantized.TestDynamicQuantizedLinear)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16909378

fbshipit-source-id: a5dbe00aab220a01557ef03c905dcbe4668432c4
2019-08-23 14:51:31 -07:00
aba15ce904 Per Channel quantization APIs (#24935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24935

Adding per channel qtensor creation APIs

Added two tests:

EmptyPerchannelQuantized in aten/src/ATen/test/quantized_test.cpp

test_perchannel_qtensor_creation in test/test_quantized_tensor.py
ghstack-source-id: 88888140

Test Plan: buck test mode/dev caffe2/test:quantized -- 'test_per_channel_qtensor_creation'  --print-passing-details

Differential Revision: D16696959

fbshipit-source-id: f179247cc1c461bec215e17b51263060003493a5
2019-08-23 14:49:32 -07:00
ad7250d315 Make EmbeddingLookup APIs take offsets rather than lengths to match the PyTorch's EmbeddingBag (#24944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24944

As the title says, we would like to make the EmbeddingLookup APIs take offsets rather than lengths, to match PyTorch's EmbeddingBag; a small sketch of the offsets convention follows.
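
For context, a minimal sketch of the offsets convention used by `torch.nn.EmbeddingBag`:

```
import torch

# lengths = [2, 3] over five indices is equivalent to offsets = [0, 2]:
# bag 0 covers input[0:2], bag 1 covers input[2:5].
input = torch.tensor([1, 2, 4, 5, 4])
offsets = torch.tensor([0, 2])
eb = torch.nn.EmbeddingBag(10, 3, mode='sum')
out = eb(input, offsets)
print(out.shape)  # torch.Size([2, 3])
```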
ghstack-source-id: 88883902

Test Plan:
python hp_emblookup_codegen.py --use-offsets
Check the benchmark in D16990830.

Reviewed By: jspark1105

Differential Revision: D16924271

fbshipit-source-id: 7fac640c8587db59fd2304bb8e8d63c413f27cb8
2019-08-23 14:43:56 -07:00
6981b4e5bb Update QNNPACK submodule to 901e9d4 (#25044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25044

Bring in Windows fixes, new microkernels, and zero-batch support.

Test Plan: CI

Reviewed By: supriyar

Differential Revision: D16946393

fbshipit-source-id: 3047eb73f1980e4178b795a20d53e744f176c2d8
2019-08-23 14:37:10 -07:00
c013c06653 Add helper function Tensor::names() (#24914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24914

There are two helpers, Tensor::names(), and Tensor::opt_names().
- Tensor::names() always returns a DimnameList; if the tensor doesn't have
names, it returns a DimnameList of all `None` names.
- Tensor::opt_names() returns an optional<DimnameList>: it returns
names if the tensor has names allocated, otherwise, nullopt.

Tensor::opt_names() is more of an implementation detail. It is
recommended that devs use Tensor::has_names() and Tensor::names()
because those result in a cleaner API.

This PR also cleans up callsites of Tensor::opt_names() to use
Tensor::names() where applicable.

Finally, this PR also adds impl::get_names(TensorImpl*), which is the
analogous function for TensorImpl*. (Tensor::opt_names() <->
impl::get_opt_names(TensorImpl*)).

Test Plan: - run existing tests. [namedtensor ci]

Differential Revision: D16919767

Pulled By: zou3519

fbshipit-source-id: ef30c9427a3d8e978d2e6d01c7f74f5174ccd52c
2019-08-23 14:32:15 -07:00
530db2c7c2 Rename Tensor::names() to Tensor::opt_names() (#24907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24907

This better reflects the semantics because Tensor::opt_names() returns
an `optional<DimnameList>`, not just a DimnameList.

Also rename `impl::get_names` to `impl::get_opt_names` (that is the
`TensorImpl*` variant of `Tensor::opt_names()`.

Test Plan
- run existing tests [namedtensor ci]

gh-metadata: pytorch pytorch 24907 gh/zou3519/110/head

Test Plan: Imported from OSS

Differential Revision: D16919768

Pulled By: zou3519

fbshipit-source-id: 094d404576b3f4b39629d0204e51c6ef48ee006e
2019-08-23 14:32:11 -07:00
867d8af20f Fix FIXME_default_names by storing static list of 64 none names (#24885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24885

Store a static pre-allocated vector of names. When one calls
`default_names`, it gives a const reference to some amount of these
names.

Also make clearer the maximum number of dimensions we support for named
tensors. Right now it is 64 but that number is easy to change. 64
follows some internal pytorch maximum number of dimensions;
TensorIterator reduce ops have a limit of 64 dims.

Test Plan: - new tests [namedtensor ci]

Differential Revision: D16915803

Pulled By: zou3519

fbshipit-source-id: 931741b199456f8976882b82f25ab5af6dcd108b
2019-08-23 14:32:07 -07:00
6c83424620 Optimize performance for unboxed-only kernels (#25055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25055

An ATen kernel registered with the c10 dispatcher doesn't need a cache,
so let's not call a cache creator function when the kernel is looked up.
ghstack-source-id: 88834902

Test Plan: unit tests

Differential Revision: D16974248

fbshipit-source-id: 5f9e65d706ec5f836804cb6e5f693f5a01f66714
2019-08-23 14:09:50 -07:00
5b84514a9f Fix lint checker breakage caused by #25111 (#25122)
Summary:
fix lint by flake8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25122

Differential Revision: D16995103

Pulled By: zrphercule

fbshipit-source-id: 810be4d8073cae73d4b0f6d82b410fd235a73bbb
2019-08-23 14:07:31 -07:00
199e15faf2 fix clang-tidy failing on master (#25121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25121

Turns out there is a more idiomatic way to use azure variables. This
also fixes clang-tidy failing on master

Test Plan: Imported from OSS

Differential Revision: D16994595

Pulled By: suo

fbshipit-source-id: 5c5b1b47ced57cff85c4302cde43ff8c8c3f54c0
2019-08-23 13:50:24 -07:00
2ec23804e2 dictPop: dereference dict.find() iterator before calling dict.erase() (#25056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25056

For some combinations of key and entry ordering (and only on an OSX
build) dict.pop() would return a value other than the popped one,
failing test_pop in test_jit.py. Caused by erase() mutating the
iterator returned from find(), fixed by dereferencing it first.

Test Plan: Imported from OSS

Differential Revision: D16975020

Pulled By: bhosmer

fbshipit-source-id: ce84e9aed6b90010121c0ef5d6c9ed8d2d1356b8
2019-08-23 13:16:46 -07:00
ab38059bc7 fix annotated assignment (#25094)
Summary:
Fixing parsing for the annotated assignment
`a : List[int] = []`.

See https://github.com/pytorch/pytorch/pull/24989/files?file-filters%5B%5D=.py for changes to the test_jit_py3 & run_test files.

follow up to https://github.com/pytorch/pytorch/pull/24477 and fix for https://github.com/pytorch/pytorch/issues/25086
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25094

Differential Revision: D16985016

Pulled By: eellison

fbshipit-source-id: 6be1363f2503303b96bd2e6a9f188ad72441f4eb
2019-08-23 13:14:38 -07:00
1c4495d8ac Clean up after running doc tests (#25036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25036

Pulled By: driazati

Differential Revision: D16965612

fbshipit-source-id: 494a734c27c1330ea0917397efbad6bc4f40be73
2019-08-23 12:52:48 -07:00
d1f0823d23 fix clang-tidy failing all the time on random lines (#25078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25078

Our script is set up to only run on lines generated by diffing your branch against the base branch.

But we were using `$TRAVIS_BRANCH` to refer to the target branch, which was causing the script to diff against master, generating many spurious lines of diff output to be clang-tidy'd

Test Plan: Imported from OSS

Differential Revision: D16993054

Pulled By: suo

fbshipit-source-id: 7bffa890f6a1a2d5566ef01b9798c4eb86d8169f
2019-08-23 12:50:06 -07:00
2cccad2c56 Turn off fbgemm for libtorch android build (#25113)
Summary:
https://github.com/pytorch/FBGEMM (USE_FBGEMM is ON by default for x86, x86_64)

Building libtorch for android_abi x86_64 fails because of this.

Turning it off for Android builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25113

Reviewed By: dreiss

Differential Revision: D16992459

Pulled By: IvanKobzarev

fbshipit-source-id: 3cf35a67043288cb591cc3b23c261258c28cf304
2019-08-23 12:47:53 -07:00
e42b238f7f pin_memory thread now uses 1 thread only (#25111)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25010
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25111

Differential Revision: D16992718

Pulled By: soumith

fbshipit-source-id: fe23721d4cc293fa245c84c656241730335077dd
2019-08-23 12:42:11 -07:00
9a793a49e7 Add thread-local-state NamesMode and NoNamesGuard (#24942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24942

NamesMode determines whether or not to ignore the names field of
TensorImpl. In particular, when it is disabled, all tensors are treated
as unnamed.

Test Plan: - New tests [namedtensor ci]

Differential Revision: D16930708

Pulled By: zou3519

fbshipit-source-id: 867b31c4daff4e1eabafea45ed489efda4471efb
2019-08-23 11:46:54 -07:00
56245ffe05 Fix python lints for generate_test_torchscripts.py (#25107)
Summary:
Fix lints, checked with flake8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25107

Reviewed By: zrphercule

Differential Revision: D16991296

Pulled By: IvanKobzarev

fbshipit-source-id: 5b69d716e3c458dc2cfe5b668a390c7272b1c74f
2019-08-23 11:37:23 -07:00
649c9cd1ca Enable UBSAN test for FBGEMM in dynamic quant test (#25099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25099

As title says
ghstack-source-id: 88875870

Test Plan: CI

Differential Revision: D16986248

fbshipit-source-id: 2a0de41e89e413a32957b12308e5e6f48715477f
2019-08-23 10:44:38 -07:00
d62bca9792 jni-java wrapper for pytorchScript api (#25084)
Summary:
TLDR; initial commit of the Android Java/JNI wrapper for the TorchScript C++ API.

The main idea is to provide a Java interface for Android developers to use TorchScript modules.
The Java API tries to mirror the semantics of the C++ and Python TorchScript APIs.

org.pytorch.Module (wrapper of torch::jit::script::Module)
 - static Module load(String path)
 - IValue forward(IValue... inputs)
 - IValue runMethod(String methodName, IValue... inputs)

org.pytorch.Tensor (semantic of at::Tensor)
 - newFloatTensor(long[] dims, float[] data)
 - newFloatTensor(long[] dims, FloatBuffer data)

 - newIntTensor(long[] dims, int[] data)
 - newIntTensor(long[] dims, IntBuffer data)

 - newByteTensor(long[] dims, byte[] data)
 - newByteTensor(long[] dims, ByteBuffer data)

org.pytorch.IValue (semantic of at::IValue)
 - static factory methods to create pytorchscript supported types

Examples of usage api could be found in PytorchInstrumentedTests.java:

Module module = Module.load(path);
IValue input = IValue.tensor(Tensor.newByteTensor(new long[]{1}, Tensor.allocateByteBuffer(1)));
IValue output = module.forward(input);
Tensor outputTensor = output.getTensor();

Thread safety:
The API is not thread safe; all synchronization must be done on the caller side.

Mutability:
An org.pytorch.Tensor buffer is a DirectBuffer with native byte order; tensors can be created with static factory methods that take a DirectBuffer.
At the moment org.pytorch.Tensor does not hold an at::Tensor on the JNI side; it holds long[] dimensions, the type, and a DirectByteBuffer blobData.

Input tensors are mutable (they can be modified and reused for the next inference);
the buffer values are read at the moment of the Module#forward or Module#runMethod call,
and an input tensor's buffer is used directly by the input at::Tensor.

The output is copied from the output at::Tensor and is immutable.

Dependencies:
The JNI level is implemented using the fbjni library, which was developed at Facebook
and has already been used and open-sourced in several open-source projects.
It is added to the repo as a submodule from a personal account so that the submodule
can be switched when fbjni is open-sourced separately.

ghstack-source-id: b39c848359a70d717f2830a15265e4aa122279c0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25084
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25105

Reviewed By: dreiss

Differential Revision: D16988107

Pulled By: IvanKobzarev

fbshipit-source-id: 41ca7c9869f8370b8504c2ef8a96047cc16516d4
2019-08-23 10:42:44 -07:00
3a59a9b36c Implement name inference for t(), transpose(...) (#24941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24941

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16930707

Pulled By: zou3519

fbshipit-source-id: 833a2bfd27f3bb3b7bc4327ac62a1d02ec526127
2019-08-23 09:01:53 -07:00
f583f2e657 Fixed test_numba_integration (#25017)
Summary:
The semantics of the _auto-convert GPU arrays that support the __cuda_array_interface__ protocol_ feature have changed a bit.

It used to throw an exception when using `torch.as_tensor(..., device=D)` where `D` is a CUDA device other than the one given in `__cuda_array_interface__`. Now this is supported and results in an implicit copy.

I do not know what changed, but `from_blob()` now supports the input and output devices differing.
I have updated the tests to reflect this, which fixes https://github.com/pytorch/pytorch/issues/24968
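
A minimal sketch of the new behavior, assuming numba and a CUDA device are available:

```
import numba.cuda
import numpy as np
import torch

d_arr = numba.cuda.to_device(np.arange(4.0))  # exposes __cuda_array_interface__
# Requesting a device other than the array's own (e.g. 'cuda:1' on a
# multi-GPU machine) used to raise; it now triggers an implicit copy.
t = torch.as_tensor(d_arr, device='cuda:0')
print(t)  # tensor([0., 1., 2., 3.], device='cuda:0')
```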
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25017

Differential Revision: D16986240

Pulled By: soumith

fbshipit-source-id: e6f7e2472365f924ca155ce006c8a9213f0743a7
2019-08-23 08:58:08 -07:00
5254b12002 cleanup tmp name generation (#25065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25065

Using global atomic variables is bad because sending the same AST through
the compiler twice will produce different graphs. This makes it a
member of the translation struct.

Test Plan: Imported from OSS

Differential Revision: D16975355

Pulled By: zdevito

fbshipit-source-id: 23e15ffd58937a207898a4c4bed82628237e3c2e
2019-08-22 22:49:16 -07:00
0ae030f87e Typo correction in cuda_deterministic_backward.rst (#25011)
Summary:
I presume this is what was intended.
cc t-vi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25011

Differential Revision: D16980939

Pulled By: soumith

fbshipit-source-id: c55b22e119f3894bd124eb1dce4f92a719ac047a
2019-08-22 21:19:39 -07:00
192a26249d Temporarily fix hub SSL cert issue (#25042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25042

[ci pytorch_linux_xenial_cuda9_cudnn7_py2_test]

Test Plan: Imported from OSS

Differential Revision: D16974162

Pulled By: jamesr66a

fbshipit-source-id: 52b00dec748b2704941f634b7a9a3671a2627b89
2019-08-22 18:08:45 -07:00
5c78e0c470 Fix a bug in creating a prefix string in jit log. (#25051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25051

In #24355 I factored out a function for creating a prefix in jit_log,
but I made a copypasta error there: the prefix stringstream was
initialized from the input string instead of an empty string.

Test Plan: Imported from OSS

Differential Revision: D16974156

Pulled By: ZolotukhinM

fbshipit-source-id: 014fe0e3366e85e984a6936ec9bb17f571107f6e
2019-08-22 17:44:42 -07:00
e92506a258 BlackBoxPredictor OSS part N + 1 : strip fb/predictor/Transforms.h dependency (#23350) (#23350)
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the plus side, we will be able to compare a production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments in transformation
logic beyond the existing ones (like TVM and Glow), and those also
relate to open-source technologies, I came to the conclusion of moving
the whole thing to OSS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23350
ghstack-source-id: 87121538
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24928

Test Plan: waitforsandcastle

Differential Revision: D16445133

Pulled By: salexspb

fbshipit-source-id: a93106489611dfe427b0f144717bc720d04e47f3
2019-08-22 17:11:00 -07:00
9764c2e6f0 Adding quantized mul kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24444

Test Plan: Imported from OSS

Differential Revision: D16844824

Pulled By: zafartahirov

fbshipit-source-id: 626c40e1cad8329c3d8517156f2d36d7a7472890
2019-08-22 16:54:15 -07:00
f9f5af0ed7 Revert D16949314: [jit] Fix bugs in assignment to optionals
Test Plan: revert-hammer

Differential Revision:
D16949314

Original commit changeset: 7f63d88b30a3

fbshipit-source-id: d1f00de2ad9c3484b731ad1b24205ca60024355d
2019-08-22 16:50:48 -07:00
bb79b61ce7 Fix bugs in assignment to optionals (#24989)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24989

This fixes the cases where a variable annotated as Optional could not
be conditionally assigned None:

```
x : Optional[int] = 4
if ...:
 x = None
```

Test Plan: Imported from OSS

Differential Revision: D16949314

Pulled By: zdevito

fbshipit-source-id: 7f63d88b30a3f5b024c2a539aa74967c9202af00
2019-08-22 16:27:46 -07:00
f8611eaa7e Disable tsan for test_dataloader.py. (#25005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25005

Seeing a bunch of failures in TSAN mostly with the following error:

```
ThreadSanitizer: starting new threads after multi-threaded fork is not
supported. Dying (set die_after_fork=0 to override)
```

TSAN is unsafe to use in a multi-threaded program after fork() and setting
die_after_fork can lead to deadlocks. As a result, I'm disabling tsan.
ghstack-source-id: 88765698

Differential Revision: D16954347

fbshipit-source-id: 18895cd82b5052938284b46479d8470af2d74a06
2019-08-22 16:20:54 -07:00
149c646b74 Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (#25012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25012

Resubmitting https://github.com/pytorch/pytorch/pull/22907 with build fix.

This change adds the following functionality:
1) WorkNCCL isCompleted, isSuccess methods check for NCCL errors and set the
appropriate exception.
2) Added a watchdog thread to ProcessGroupNCCL which checks for errors in the
cached communicators and removes them from the cache.
3) Use ncclCommAbort in NCCLComm destructor since ncclCommDestroy can block
forever waiting for work.
4) Added a simulate_nccl_errors.py script to simulate NCCL errors.

https://github.com/pytorch/pytorch/issues/17882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22907

Test Plan: 1) Run the simulate_nccl_errors.py to verify NCCL errors are caught.

Differential Revision: D16958078

fbshipit-source-id: 662b0b8b8ee250e2b6d15bdfc9306d71c4f66219
2019-08-22 16:12:41 -07:00
1037652224 disable custom class logic for mobile build to avoid rtti (#24994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24994

Use C10_MOBILE to gate the CustomClass lookup logic for the mobile build; the lookup uses
typeid() and requires "-frtti", which is off by default for the internal mobile build.

Not sure whether we ever need CustomClass for internal use cases. I feel the change
is not too intrusive, but I'm willing to hear others' thoughts.
ghstack-source-id: 88754932

Reviewed By: dreiss

Differential Revision: D16951430

fbshipit-source-id: 445f47ee4e9c16260e2fd2c43f5684cea602e3d9
2019-08-22 14:44:30 -07:00
a805a0d3ca Remove deprecated TH(topk) code. #24778 (#24857)
Summary:
Remove deprecated TH(topk) code. https://github.com/pytorch/pytorch/issues/24778
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24857

Differential Revision: D16916614

Pulled By: VitalyFedyunin

fbshipit-source-id: 00299fb85614b87f69b77d9672a4ace33d6cdfaa
2019-08-22 13:19:46 -07:00
664555c757 Fix fbcode weak ordering (#25026)
Summary:
Same FBCode-only weak ordering issue as previously encountered :(  Internal assert fails a test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25026

Differential Revision: D16966994

Pulled By: eellison

fbshipit-source-id: 649331ae1317df870f26a968e3f40f2b7a3a072a
2019-08-22 12:00:19 -07:00
e2ccccee9a Load tensors directly from pickle archive
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23281

Test Plan: Imported from OSS

Differential Revision: D16452815

Pulled By: zdevito

fbshipit-source-id: 918eef3ad444b598ab655c39037e4baafdcb51e1
2019-08-22 11:48:09 -07:00
c33adf539c Fix for cdist backward for non-batch tensors (#22915)
Summary:
Fix for: https://github.com/pytorch/pytorch/issues/22353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22915

Differential Revision: D16291406

Pulled By: ifedan

fbshipit-source-id: 2decbe94c95165f7ddb2c8e2f4c4747c19069a4c
2019-08-22 11:36:37 -07:00
4b77cae360 Add qconv_test to benchmarking tests (#24913)
Summary:
Adds the tests defined in `qconv_tests.py` to `benchmark_all_tests.py` so that they are run by `benchmark_all_tests`.

The next diff will create another `ai_benchmark_test` specifying the qconv operations similar to D16768680. Since AI-PEP integrates with benchmark_all_tests, this should add these qconv benchmarks to AI-PEP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24913

Test Plan:
`buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test` (runs only tests whose `tag` is `short`)

`buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --tag_filter resnext101_32x4d` (runs tests whose `tag` is `resnext101_32x4d`).

This runs the tests for all the imported modules in `benchmark_all_test.py` (e.g. add_test, batchnorm_test, qconv_test, etc.)

```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operators QConv2d,QLinear
```
tests the QConv and QLinear operators

Relevant output for `qconv_test.py` (for short tag):

```
# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 957.848

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC256_OC256_H56_W56_G32_kernel3_stride1_pad1
# Input: N: 1, IC: 256, OC: 256, H: 56, W: 56, G: 32, kernel: 3, stride: 1, pad: 1
Forward Execution Time (us) : 3638.806

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC256_OC256_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 256, OC: 256, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 3870.311

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H56_W56_G32_kernel3_stride2_pad1
# Input: N: 1, IC: 512, OC: 512, H: 56, W: 56, G: 32, kernel: 3, stride: 2, pad: 1
Forward Execution Time (us) : 10052.192
```

For resnext tag:

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : resnext101_32x4d

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H14_W14_G32_kernel3_stride1_pad1
# Input: N: 1, IC: 512, OC: 512, H: 14, W: 14, G: 32, kernel: 3, stride: 1, pad: 1
Forward Execution Time (us) : 543.171

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC1024_H28_W28_G1_kernel1_stride2_pad0
# Input: N: 1, IC: 512, OC: 1024, H: 28, W: 28, G: 1, kernel: 1, stride: 2, pad: 0
Forward Execution Time (us) : 1914.301

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC256_H28_W28_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 512, OC: 256, H: 28, W: 28, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 1809.069

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H28_W28_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 512, OC: 512, H: 28, W: 28, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 3100.579

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC512_OC512_H28_W28_G32_kernel3_stride2_pad1
# Input: N: 1, IC: 512, OC: 512, H: 28, W: 28, G: 32, kernel: 3, stride: 2, pad: 1
Forward Execution Time (us) : 2247.540

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC128_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 128, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 1001.731

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_N1_IC64_OC256_H56_W56_G1_kernel1_stride1_pad0
# Input: N: 1, IC: 64, OC: 256, H: 56, W: 56, G: 1, kernel: 1, stride: 1, pad: 0
Forward Execution Time (us) : 1571.620
```

Differential Revision: D16908445

Pulled By: rohan-varma

fbshipit-source-id: b711bc3591ce5dcd3ab2521134cff2b12188e3ac
2019-08-22 11:28:49 -07:00
049284e14d Make observer scriptable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24996

Test Plan: Imported from OSS

Differential Revision: D16952938

Pulled By: jamesr66a

fbshipit-source-id: 3d08e0c746603d0fe090fb3dbf13c5fc9dc022f4
2019-08-22 11:28:45 -07:00
956a347e68 Fix the lint error in transformer doc. (#25027)
Summary:
Fix the lint error in transformer doc.
https://github.com/pytorch/pytorch/pull/24837
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25027

Differential Revision: D16963508

Pulled By: zhangguanheng66

fbshipit-source-id: 3f70e32c74d2319ffb8e2143d3181ed38e62067d
2019-08-22 11:28:41 -07:00
3385693edd gradient clipping by norm
Summary: as titled

Reviewed By: hbjerry, alyssawangqq

Differential Revision: D16797498

fbshipit-source-id: 4ea05ab9f06b309d32faa3218e79899c9f8d9cf2
2019-08-22 11:20:40 -07:00
1a2a9fab31 Remove Symmetric Quantizer in backend (#24964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24964

To reduce complications in the quantized kernel implementations, we decided not to
have a symmetric quantizer in the backend, since it can be expressed by the affine
quantizer. We will still have the symmetric quantization qscheme in the frontend,
and users can still specify tensors to be symmetrically quantized, but the actual
quantized Tensor representation will only use affine quantization.

Differential Revision: D16965114

fbshipit-source-id: 0e9a5a00131878a302e211fda65a1aa427204eea
2019-08-22 11:18:09 -07:00
e8ea44796e add support for multiple assignment statements (#24477)
Summary:
add support for: `a = b, c = (1, 2)` (see the sketch below)
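
A minimal sketch of what this enables in TorchScript (illustrative):

```
import torch

@torch.jit.script
def f():
    # chained assignment: 'a' is bound to (1, 2) and the tuple (b, c) unpacks it
    a = b, c = (1, 2)
    return a, b, c

print(f())  # ((1, 2), 1, 2)
```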

partial fix for https://github.com/pytorch/pytorch/issues/24256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24477

Differential Revision: D16963413

Pulled By: eellison

fbshipit-source-id: 0433a1e759b3aa719ef1b766bb5160f2ca814205
2019-08-22 10:17:14 -07:00
901f9eaa89 Migrate erfinv and erfinv_ from the TH to Aten(CPU) (#24908)
Summary:
https://github.com/pytorch/pytorch/issues/24698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24908

Differential Revision: D16962714

Pulled By: ifedan

fbshipit-source-id: 51309b601fbd29e09f5a0efa67eb24a184fee81a
2019-08-22 10:13:28 -07:00
632aeb034d Fix log_prob() in torch.distributions.Uniform, HalfCauchy and Gamma (#23017)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/22970. Specifically, `torch.distributions.uniform.Uniform.log_prob()` now works even if `value` is passed as a python float.
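
A minimal sketch of the fixed behavior:

```
import torch

d = torch.distributions.Uniform(0.0, 2.0)
print(d.log_prob(1.0))  # tensor(-0.6931); previously `value` had to be a Tensor
```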
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23017

Differential Revision: D16383258

Pulled By: vincentqb

fbshipit-source-id: 26943c33431d6da6f47e0897d6eda1c5f5541d28
2019-08-22 08:19:41 -07:00
b9a5188178 Fixed Error in Transformer Example (#24837)
Summary:
In the examples for creating an instance of the Transformer module, the src and tgt parameters (which belong to forward) had been included, even though they are not present in __init__.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24837

Differential Revision: D16938065

Pulled By: zhangguanheng66

fbshipit-source-id: 7b2d2180d95ddb65930ad83c87c926e35f2bf521
2019-08-22 07:37:24 -07:00
789f4ad87b Fixing size implementation for struct slot_list_impl (#24351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24351

**Context:**
I was doing some exploration of the APIs for JIT script module internals.
I found there can be a bug (cannot cast Module to Slot) when trying to check the size of sub_modules in a module. (Please also provide suggestions if you think my diff is not optimal or wrong.)

See the following:

  for (auto m1 : module.get_modules()) { // module is the module loaded from P79892263.
    std::cout << "test module  " << "   " << m1.get_modules().size() << "\n";
  }

With this change, it returns 0 (expected).
Without this change, the following error is thrown: P79892732

Also, I'm putting an RFC here since I am looking for ideas on which tests I should add and where to add them.

Reviewed By: smessmer

Differential Revision: D16803759

fbshipit-source-id: 1e2ae6b69d9790c700119d2d0b9f9f85f41616d4
2019-08-22 03:39:45 -07:00
310c5be005 Skip setting CUDA_NVCC_EXECUTABLE if CACHE_WRAPPER_DIR not set. (#25006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25006

Builds without sccache or ccache would run into issues since
`CACHE_WRAPPER_DIR` would not be set. As a result `CUDA_NVCC_EXECUTABLE` would
be set to /nvcc and the build would fail.
ghstack-source-id: 88766907

Differential Revision: D16954651

fbshipit-source-id: fea41da52dc9f8f03e6356d348f5900978db3651
2019-08-21 21:27:23 -07:00
74b65c32be Add align_corners option to grid_sample and affine_grid, change default to False (#24929)
Summary:
Resolves: https://github.com/pytorch/pytorch/issues/20785
Addresses https://github.com/pytorch/pytorch/issues/24470 for `affine_grid`
Subsumes and closes: https://github.com/pytorch/pytorch/pull/24878 and likewise closes: https://github.com/pytorch/pytorch/issues/24821

Adds the `align_corners` option to `grid_sample` and `affine_grid`, paralleling the option that was added to `interpolate` in version 0.4.0.

In short, setting `align_corners` to `False` allows these functions to be resolution agnostic.
This ensures, for example, that a grid generated from a neural net trained to warp 1024x1024 images will also work to warp the same image upsampled/downsampled to other resolutions like 512x512 or 2048x2048 without producing scaling/stretching artifacts.

Refer to the documentation and https://github.com/pytorch/pytorch/issues/20785 for more details.
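
A minimal sketch of passing the new flag explicitly (illustrative; `False` matches the new default):

```
import torch
import torch.nn.functional as F

theta = torch.eye(2, 3).unsqueeze(0)  # batch of one identity 2D affine map
grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
x = torch.arange(16.0).view(1, 1, 4, 4)
y = F.grid_sample(x, grid, align_corners=False)  # y matches x up to fp error
```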

#### BC-Breaking Changes

- **Important**: BC-Breaking change because of new default for `align_corners`
The old functionality can still be achieved by setting `align_corners=True`, but the default is now set to `align_corners=False`, since this is the more correct setting, and since this matches the default setting of `interpolate`.

- **Should not cause BC issues**: BC-Breaking change for pathological use case
2D affine transforms on 1D coordinates and 3D affine transforms on 2D coordinates (that is, when one of the spatial dimensions has an empty span) are ill-defined, and not an intended use case of `affine_grid`. Whereas before, all grid point components along such a dimension were set arbitrarily to `-1` (that is, before multiplying by the affine matrix), they are now all set instead to `0`, which is a much more consistent and defensible arbitrary choice. A warning is triggered for such cases.

#### Documentation

- Update `affine_grid` documentation to express that it does indeed support 3D affine transforms. This support was already there but not documented.
- Add documentation warnings for BC-breaking changes in `grid_sample` and `affine_grid` (see above).

#### Refactors

- `affine_grid` no longer dispatches to cuDNN under any circumstances.
The decision point for when the cuDNN `affine_grid_generator` is compatible with the native PyTorch version and when it fails is a headache to maintain (see [these conditions](5377478e94/torch/nn/_functions/vision.py (L7-L8))). The native PyTorch kernel is now used in all cases.

- The kernels for `grid_sample` are slightly refactored to make maintenance easier.

#### Tests
Two new tests are added in `test_nn.py`:
- `test_affine_grid_error_checking` for errors and warnings in `affine_grid`
- `test_affine_grid_3D` for testing `affine_grid`'s 3D functionality. The functionality existed prior to this, but wasn't tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24929

Differential Revision: D16949064

Pulled By: ailzhang

fbshipit-source-id: b133ce0d47a2a5b3e2140b9d05fb05fca9140926
2019-08-21 21:17:49 -07:00
420b37f3c6 Deprecate tensor.data<T>(), and codemod tensor.data<T>() to tensor.data_ptr<T>() (#24886)
Summary:
This PR adds a deprecation message for `tensor.data<T>()` (91d94e7d41), and changes all call sites of `tensor.data<T>()` to `tensor.data_ptr<T>()` in PyTorch core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24886

Differential Revision: D16924576

Pulled By: yf225

fbshipit-source-id: 0943d6be73245c7c549c78597b74c3b07fa24440
2019-08-21 20:11:24 -07:00
aa66146974 Add ASAN instructions to CONTRIBUTING.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24848

Test Plan: Imported from OSS

Differential Revision: D16896673

Pulled By: suo

fbshipit-source-id: 32e58abe9fd79a8217cecbc7832e436684edaf80
2019-08-21 19:16:21 -07:00
173dc5d16f __reduce__ for QScheme (#24969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24969

This allows pickling qscheme objects
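
A minimal sketch of what this enables:

```
import pickle
import torch

qs = torch.per_tensor_affine  # a torch.qscheme object
assert pickle.loads(pickle.dumps(qs)) == torch.per_tensor_affine
```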

Test Plan: Imported from OSS

Differential Revision: D16946567

Pulled By: jamesr66a

fbshipit-source-id: 57dbedb1e1aca2a2e17916eed662f727053ea926
2019-08-21 19:08:54 -07:00
4966268a1d Move CPU-only jobs to xenial
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24506

Test Plan: Imported from OSS

Differential Revision: D16862005

Pulled By: jamesr66a

fbshipit-source-id: cc4b3eee7f442a63ddc68667ac42404fe0b49d6c
2019-08-21 18:12:55 -07:00
6dca147946 Misc doc updates #2 (#24445)
Summary:
Another pass over the docs, this covers most of the remaining stuff

* content updates for new API
* adds links to functions instead of just names
* removes some useless indentations
* some more code examples + `testcode`s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24445

Pulled By: driazati

Differential Revision: D16847964

fbshipit-source-id: cd0b403fe4a89802ce79289f7cf54ee0cea45073
2019-08-21 16:45:19 -07:00
0eb55f9ddd PrepareQuant step (#24425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24425

- _jit_pass_prepare_quant: clone the observer module in argument and insert that to the
module we want to quantize, insert observer calls for the Tensor values we want to observe

Differential Revision: D16933120

fbshipit-source-id: 7248de6132429ba943a09831a76486f7a3cd52a3
2019-08-21 16:39:48 -07:00
14ac7a1d87 Add epsilon argument to Adagrad optimizer (#24980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24980

We'll need this internally, so we're just updating the open-source version. The other optimizers have this argument anyway.
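
A minimal usage sketch (the `eps` default of 1e-10 is taken from the current signature and is illustrative):

```
import torch

model = torch.nn.Linear(4, 2)
# eps is added to the denominator of the update for numerical stability
opt = torch.optim.Adagrad(model.parameters(), lr=0.01, eps=1e-10)
```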

Test Plan: Imported from OSS

Differential Revision: D16945279

Pulled By: li-roy

fbshipit-source-id: 0b8cc86f15387cd65660747899d3d7dd870cff27
2019-08-21 16:36:51 -07:00
65d650c6c6 restore default constructor of OutputArchive (#24955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24955

Some third-party code relies on this default constructor. It's not
invalid to construct an OutputArchive with an independent CU, so we're
restoring it.

Test Plan: Imported from OSS

Differential Revision: D16935254

Pulled By: suo

fbshipit-source-id: 40b6494e36d10c5009b3031648bee96b2e38b49a
2019-08-21 16:13:13 -07:00
38314e5b3f Improve c10 dispatcher lookup perf (#24882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24882

Previously, looking up a kernel accidentally copied the DispatchTableEntry, which has as its member a std::function cache creator function.
Being an std::function, it was expensive to copy and cost us more than 50ns on each op call.
This diff fixes this by not copying DispatchTableEntry anymore.
ghstack-source-id: 88611173

Differential Revision: D16910530

fbshipit-source-id: 44eeaa7f6ffead940b4a124f0c31d8ef71404db3
2019-08-21 14:12:27 -07:00
a99a4485fa Added relu6 kernel (#24799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24799

Differential Revision: D16875493

Test Plan: Imported from OSS

Pulled By: zafartahirov

fbshipit-source-id: 0d256db193c6a8e0d37dbdf6cf35dd031fd4ec6c
2019-08-21 13:57:00 -07:00
81ac6260d8 Use absolute import of the parent folder without alias. (#24792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24792

This will prevent circular dependencies in the future.

Differential Revision: D16868861

Test Plan: Imported from OSS

Pulled By: zafartahirov

fbshipit-source-id: 92cf77094b2c56560d380c1fd1df8e1e24a86359
2019-08-21 13:36:23 -07:00
e6b0ebdfd5 Fix named tensor build (#24940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24940

We're missing an include for named tensors in templates/TypeDefault.h.

Test Plan: - run ci [namedtensor ci]

Differential Revision: D16930709

Pulled By: zou3519

fbshipit-source-id: c15d631761a78d5e50fe265a3129239e72042a83
2019-08-21 11:31:14 -07:00
f6daab5686 bind autograd.grad function into TorchScript (#24871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24871

Bind the torch.autograd.grad function into TorchScript so that well-formed
inputs can directly call it from a TorchScript function.
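
A hedged sketch of the kind of call this enables (the list-typed arguments reflect the TorchScript binding; treat the exact signature as an assumption):

```
import torch

@torch.jit.script
def grad_of_sum_of_squares(x: torch.Tensor):
    y = (x * x).sum()
    return torch.autograd.grad([y], [x])

x = torch.ones(3, requires_grad=True)
print(grad_of_sum_of_squares(x))  # gradient of sum(x*x) is 2*x
```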

This also changes the serialization a bit: it fixes a small bug where a node
output type could never be a tensor type in prim::ListConstruct (only its element type can be), and adds the case where we need to annotate the ListType when the element type is an optional type, to preserve type information on re-import.

Differential Revision: D16923273

fbshipit-source-id: 151cc13411c8c287def35b4e65122d9fc083ccfd
2019-08-21 11:22:23 -07:00
f21265203e Update onnxruntime CI version (#24414)
Summary:
Use explicit versioned nightly whl such that to provide coverage of ONNX updates not in release yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24414

Differential Revision: D16940810

Pulled By: bddppq

fbshipit-source-id: 7bf76554898958e0f48883a1da7a3bdc781be7f8
2019-08-21 11:19:42 -07:00
b99ab492ea Fix missing super call error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24852

Pulled By: driazati

Differential Revision: D16902742

fbshipit-source-id: a72403dc37a406ee228d3b19afc22bd86812f962
2019-08-21 10:53:38 -07:00
3d27e6327e Remove torch.contrib._graph_vis (#24874)
Summary:
This hasn't been edited in a while and doesn't work anymore. Its use
case is also served pretty well by `script_module.code`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24874

Pulled By: driazati

Differential Revision: D16941025

fbshipit-source-id: 11acd05cea5e44eeb1d48188a2de645669b21610
2019-08-21 10:48:07 -07:00
4659269d1b Remove unused ATen headers for mobile (#24850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24850

### Summary

There are 1373 header files in total that are installed for mobile, many of which are not used. Take ATen for example: there are 165 header files in total, and folders like `cuda/`, `cudnn/`, `miopen/`, etc. are not needed. This PR will remove 33 unnecessary header files as well as some CUDA files.

### Test Plan

- `build_ios.sh` finished successfully
- `libtorch.a` can be compiled and run on mobile

Test Plan: Imported from OSS

Differential Revision: D16897314

Pulled By: xta0

fbshipit-source-id: 54e046936439a549fe633ec791a10a2a3d36fa8b
2019-08-21 10:04:49 -07:00
b3008fad2e Revert D16220638: [pytorch][PR] Detect and handle NCCL errors appropriately in ProcessGroupNCCL.
Differential Revision:
D16220638

Original commit changeset: fbc8881ea0c3

fbshipit-source-id: 10d2f3d446064adb3cf44e1f9911dcf259bbfbfb
2019-08-21 09:40:38 -07:00
1b8efd3d92 Avoid race condition in intrusive_ptr.reset_() (#24464)
Summary:
This is a spin-off from https://github.com/pytorch/pytorch/issues/24368 (high priority inherited from https://github.com/pytorch/pytorch/issues/3818).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24464

Differential Revision: D16928898

Pulled By: ezyang

fbshipit-source-id: 2d66c7adbd52de52869b1fe69ce2842d035dbf86
2019-08-21 03:48:35 -07:00
da860bda3d Use correct WARP_SIZE for ROCm for EmbeddingBag
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24868

Differential Revision: D16936540

Pulled By: ezyang

fbshipit-source-id: 08def671416888eb6ce57a690c7ff7743c21ad8c
2019-08-21 03:17:52 -07:00
1d53d07566 Add docs to CI (#24435)
Summary:
Stacked PRs
 * #24445 - [jit] Misc doc updates #2
 * **#24435 - [jit] Add docs to CI**

This integrates the [doctest](http://www.sphinx-doc.org/en/master/usage/extensions/doctest.html) module into `jit.rst` so that we can run our code examples as unit tests. They're added to `test_jit.py` under the `TestDocs` class (which takes about 30s to run). This should help prevent things like #24429 from happening in the future. They can be run manually by doing `cd docs && make doctest`.

* The test setup requires a hack since `doctest` defines everything in the `builtins` module which upsets `inspect`
* There are several places where the code wasn't testable (i.e. it threw an exception on purpose). This may be resolvable, but I'd prefer to leave that for a follow up. For now there are `TODO` comments littered around.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24435

Pulled By: driazati

Differential Revision: D16840882

fbshipit-source-id: c4b26e7c374cd224a5a4a2d523163d7b997280ed
2019-08-20 21:40:44 -07:00
0a23151293 Detect and handle NCCL errors appropriately in ProcessGroupNCCL. (#22907)
Summary:
This change adds the following functionality:
1) WorkNCCL isCompleted, isSuccess methods check for NCCL errors and set the
appropriate exception.
2) Added a watchdog thread to ProcessGroupNCCL which checks for errors in the
cached communicators and removes them from the cache.
3) Use ncclCommAbort in NCCLComm destructor since ncclCommDestroy can block
forever waiting for work.
4) Added a simulate_nccl_errors.py script to simulate NCCL errors.

https://github.com/pytorch/pytorch/issues/17882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22907

Test Plan: 1) Run the simulate_nccl_errors.py to verify NCCL errors are caught.

Differential Revision: D16220638

fbshipit-source-id: fbc8881ea0c38a4d09a77045691e36557b7b0b25
2019-08-20 20:37:37 -07:00
8d46741bae Updating submodules
Reviewed By: zpao

fbshipit-source-id: d2447223283ea7ac6e2f01f5bee4fd84163f0fe0
2019-08-20 17:31:54 -07:00
9c9f14029d Revert D16929363: Revert D16048264: Add static dispatch mode to reduce mobile code size
Differential Revision:
D16929363

Original commit changeset: 69d302929e18

fbshipit-source-id: add36a6047e4574788eb127c40f6166edeca705f
2019-08-20 17:08:31 -07:00
8e3c0210a5 extend torch.jit._overload to module methods (#24259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24259

Follow-up to https://github.com/pytorch/pytorch/pull/23886: add the same overload API specified in PEP 484 to module methods, to reduce the friction of adding method overloads that was brought up in #23266.

The usage is:
```
@torch.jit.overload
def add(self, y: int) -> int: ...
@torch.jit.overload
def add(self, y: float) -> float: ...
def add(self, y):
    ...
```

Test Plan: Imported from OSS

Differential Revision: D16921304

Pulled By: eellison

fbshipit-source-id: 784e2f26f7ca9a330a434a603c86b53725c3dc71
2019-08-20 16:47:35 -07:00
4b3ea92787 Test if descriptions of args are in the template (#24161)
Summary:
As in https://github.com/pytorch/pytorch/issues/23439, some descriptions of arguments in `_torch_docs.py` have been replaced by `common_args`; it would be helpful to check whether any descriptions can be replaced for new docs in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24161

Differential Revision: D16889293

Pulled By: ezyang

fbshipit-source-id: bf6f581494482d6eb32e634f73e84a4586766230
2019-08-20 16:34:50 -07:00
7ebac74d0a Fix deprecation warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24841

Differential Revision: D16904301

Pulled By: cpuhrsch

fbshipit-source-id: 01b90d18619f51cd6b5b6a2a5a3ee0617f7b4f41
2019-08-20 16:30:11 -07:00
bd6cf5099b Revert D16048264: Add static dispatch mode to reduce mobile code size
Differential Revision:
D16048264

Original commit changeset: ad1e50951273

fbshipit-source-id: 69d302929e183e2da26b64dcc24c69c3b7de186b
2019-08-20 16:26:18 -07:00
8ca6220509 Remove unused DynamicDAG class.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24890

Test Plan: Imported from OSS

Differential Revision: D16912935

Pulled By: ZolotukhinM

fbshipit-source-id: 3e4b160119cb73e47811d8636b64a86088a33102
2019-08-20 16:17:59 -07:00
8756ec989e bind autograd functions into C++ (#24342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24342

Right now the two APIs provided in the autograd package only have
Python bindings, and we could not call them from either the C++ API or
TorchScript. This PR makes these two APIs available purely in C++ (while
preserving semantics) so they can be used in the C++ API and TorchScript.

Differential Revision: D16923271

fbshipit-source-id: 049d6fbd94cd71ecc08b2716f74d52ac061f861e
2019-08-20 15:36:34 -07:00
b28a2b3a38 Attempt to fix windows build. (#24916)
Summary:
It looks like https://github.com/pytorch/pytorch/pull/24455 broke the Windows build, probably an instance of:
https://github.com/pytorch/pytorch/issues/12117.

I don't have a Windows machine handy, and I'm not sure:
1) what the rules are with dependent constructors
2) whether the type is used transitively

but this is a minimal attempt at a fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24916

Differential Revision: D16922615

Pulled By: gchanan

fbshipit-source-id: 0dfde84d0c462a14c479eed02ffafb5c4b3c12bb
2019-08-20 14:29:10 -07:00
907f5020c3 Revert D16914345: [pytorch][PR] Move the detection of cuDNN to FindCUDNN.cmake
Differential Revision:
D16914345

Original commit changeset: fd261478c01d

fbshipit-source-id: b933ad7ed49028ab9ac6976c3ae768132dc9bacb
2019-08-20 14:23:12 -07:00
012526dd6b Fix Typing Error for Padding with asymmetric signatures (#24895)
Summary:
This PR resolves https://github.com/pytorch/pytorch/issues/24806
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24895

Differential Revision: D16925208

Pulled By: ezyang

fbshipit-source-id: f4a374ca86e2e99faa30ca4b41c681e9976fe2de
2019-08-20 14:14:12 -07:00
a77cb2ccd1 Revert D16915800: Implement name inference for t(), transpose(...)
Differential Revision:
D16915800

Original commit changeset: d8e5beff3daa

fbshipit-source-id: f8b966fdc485d8250ae74d8bbbda157b45c2d1a0
2019-08-20 14:07:06 -07:00
cf30ec1b83 Revert D16915806: Add thread-local-state NamesMode and NoNamesGuard
Differential Revision:
D16915806

Original commit changeset: 21f7ff1eadeb

fbshipit-source-id: 5d17dd3463d3e23f5adce36a71b63bd5d66a8e9c
2019-08-20 14:07:02 -07:00
d750ab13dc Add thread-local-state NamesMode and NoNamesGuard (#24367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24367

NamesMode determines whether or not to ignore the names field of
TensorImpl. In particular, when it is disabled, all tensors are treated
as unnamed.

Test Plan: - New tests [namedtensor ci]

Differential Revision: D16915806

Pulled By: zou3519

fbshipit-source-id: 21f7ff1eadebd678d6cd9a16ff25dd6134272b76
2019-08-20 13:46:51 -07:00
acf3b76bf0 Implement name inference for t(), transpose(...) (#24203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24203

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16915800

Pulled By: zou3519

fbshipit-source-id: d8e5beff3daa7e5fd5bfed5b02d8089cac300de8
2019-08-20 13:46:47 -07:00
39e8d71dbd Use a ptr to store autograd profiler rng (#24889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24889

Trying to fix #2575. [Here](https://gist.github.com/suo/7b0bc4b49d3c9e095b9f7eef8fa7c6e8) is all TLS in libtorch.so (thanks ezyang for figuring how to find this)

I noticed that `CallbackManager::sample_zero_one()::gen` has size 5000,
which seems bigger than the other ones. So make it heap-allocated
instead.

Caveat: I have no idea if this will actually fix anything, or whether
making this variable heap-allocated is a bad idea.

Test Plan: Imported from OSS

Differential Revision: D16912540

Pulled By: suo

fbshipit-source-id: 71eb0391bf4c6e85b090f8650a2fbfc2107f2707
2019-08-20 13:43:13 -07:00
896e4b6e09 Support QScheme in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24358

Test Plan: Imported from OSS

Differential Revision: D16811412

Pulled By: jamesr66a

fbshipit-source-id: 2b0c981f7e8793bf036e398e02aca3c62ddcb64b
2019-08-20 13:09:44 -07:00
bdc57d3833 Merge ProfiledTensorType and TensorType (#24284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24284

This PR finishes the unification of all Tensor types into a single object.
ProfiledTensorType is renamed to TensorType and the old TensorType is
deleted.

Notes:
* Fixes bug in merge for VaryingShape by changing its representation to an
 optional list of optional ints.
* Removes ProfiledTensorType::create(type) invocations that can now
  simply be expect calls on tensor type.

Test Plan: Imported from OSS

Differential Revision: D16794034

Pulled By: zdevito

fbshipit-source-id: 10362398d0bb166d0d385d74801e95d9b87d9dfc
2019-08-20 13:01:28 -07:00
6824c9018d Add static dispatch mode to reduce mobile code size
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22335

Test Plan: Imported from OSS

Differential Revision: D16048264

Pulled By: li-roy

fbshipit-source-id: ad1e50951273962a51bac7c25c3d2e5a588a730e
2019-08-20 12:21:32 -07:00
0c5c442cb1 Clang formatting the code [1/2] (#24867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24867

The code is formatted by `clang-format -i *.h *.cpp`.
ghstack-source-id: 88589458

Differential Revision: D16904273

fbshipit-source-id: acc8981ca4ce28b333af331b252ea23b10f5b9a0
2019-08-20 11:45:15 -07:00
3463583349 Fix some typos in documentation (#23507)
Summary:
~~In case of tensor indexing with a scalar index, index_select returns a tensor with the same rank as the input. To match this behavior in ONNX, we make index a 1D tensor so that with a gather
it also produces a tensor with the same rank as the input.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23507

Differential Revision: D16586805

Pulled By: bddppq

fbshipit-source-id: 8f5d964d368873ec372773a29803b25f29a81def
2019-08-20 10:50:13 -07:00
1efdf57aa7 throw remote exception on client side (#24138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24138

Catch the exception thrown on the server, send the exception message back to the client, and rethrow it there.

Reviewed By: mrshenli

Differential Revision: D16748748

fbshipit-source-id: ce18b3ea1b1d28645ec292f58aa0c818d93e559e
2019-08-20 09:40:35 -07:00
d33623f7c1 Make SobolEngine use random seed if not specified (#24884)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/24881. Makes behavior consistent with the rest of the random functions.
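
A minimal sketch of the resulting behavior (output values will differ from run to run once the seed is drawn randomly):
```
import torch

# no seed given: after this change a random seed is drawn,
# matching the behavior of the other random functions
eng = torch.quasirandom.SobolEngine(dimension=3, scramble=True)
print(eng.draw(4))  # 4 scrambled quasi-random points in [0, 1)^3
```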
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24884

Test Plan: Unit tests

Reviewed By: sdsingh

Differential Revision: D16912036

Pulled By: Balandat

fbshipit-source-id: eff00cca989926a5d9e20d8846a8674f7cd270cb
2019-08-20 09:22:41 -07:00
6ce6939be9 Move the detection of cuDNN to FindCUDNN.cmake (#24784)
Summary:
Currently they sit together with other code in cuda.cmake. This commit
is the first step toward cleaning up cuDNN detection in our build system.

Another attempt to https://github.com/pytorch/pytorch/issues/24293,  which breaks manywheels build because it does not handle `USE_STATIC_CUDNN`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24784

Differential Revision: D16914345

Pulled By: ezyang

fbshipit-source-id: fd261478c01d879dc770c1f1a56b17cc1a587be2
2019-08-20 01:55:46 -07:00
d9b4149e99 Fix cmake backslash syntax error on Windows. (#24420)
Summary:
```
[1/1424] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/operators/torch_generated_weighted_sample_op.cu.obj
CMake Warning (dev) at torch_generated_weighted_sample_op.cu.obj.Release.cmake:82 (set):
  Syntax error in cmake code at

    C:/Users/Ganzorig/pytorch/build/caffe2/CMakeFiles/torch.dir/operators/torch_generated_weighted_sample_op.cu.obj.Release.cmake:82

  when parsing string

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build;C:/Users/Ganzorig/pytorch;C:/Users/Ganzorig/pytorch/cmake/../third_party/googletest/googlemock/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/googletest/googletest/include;;C:/Users/Ganzorig/pytorch/third_party/protobuf/src;C:/Users/Ganzorig/pytorch/cmake/../third_party/benchmark/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/eigen;C:/Users/Ganzorig/Anaconda3/envs/code/include;C:/Users/Ganzorig/Anaconda3/envs/code/lib/site-packages/numpy/core/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/pybind11/include;C:/Users/Ganzorig/pytorch/cmake/../third_party/cub;C:/Users/Ganzorig/pytorch/build/caffe2/contrib/aten;C:/Users/Ganzorig/pytorch/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/build/third_party/foxi;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api/include;C:/Program Files/NVIDIA Corporation/NvToolsExt/include;C:/Users/Ganzorig/pytorch/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/caffe2/../torch/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/caffe2/../torch/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/../aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/../aten/src/ATen;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc;C:/Users/Ganzorig/pytorch/caffe2/../torch/../third_party/miniz-2.0.8;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api;C:/Users/Ganzorig/pytorch/caffe2/../torch/csrc/api/include;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/aten/../third_party/catch/single_include;C:/Users/Ganzorig/pytorch/aten/src/ATen/..;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/ATen;C:/Users/Ganzorig/pytorch/third_party/miniz-2.0.8;C:/Users/Ganzorig/pytorch/caffe2/core/nomnigraph/include;C:/Users/Ganzorig/pytorch/caffe2/;C:/Program Files/NVIDIA GPU Computing 
Toolkit/CUDA/v10.1/include;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/THC;C:/Users/Ganzorig/pytorch/aten/src/THC;C:/Users/Ganzorig/pytorch/aten/src/THCUNN;C:/Users/Ganzorig/pytorch/aten/src/ATen/cuda;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src/TH;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src;C:/Users/Ganzorig/pytorch/build/aten/src;C:/Users/Ganzorig/pytorch/aten/src;C:/Users/Ganzorig/pytorch/aten/../third_party/catch/single_include;C:/Users/Ganzorig/pytorch/aten/src/ATen/..;C:/Users/Ganzorig/pytorch/build/caffe2/aten/src/ATen;C:/Users/Ganzorig/pytorch/third_party/protobuf/src;C:/Users/Ganzorig/pytorch/c10/../;C:/Users/Ganzorig/pytorch/build;C:/Users/Ganzorig/pytorch/third_party/cpuinfo/include;C:/Users/Ganzorig/pytorch/third_party/FP16/include;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/third_party/foxi;C:/Users/Ganzorig/pytorch/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/build/third_party/onnx;C:/Users/Ganzorig/pytorch/c10/cuda/../..;C:/Users/Ganzorig/pytorch/build;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1\include;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/include

  Invalid escape sequence \i

  Policy CMP0010 is not set: Bad variable reference syntax is an error.  Run
  "cmake --help-policy CMP0010" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.
```

Compared to https://github.com/pytorch/pytorch/issues/24044 , this commit moves the fix up, and uses [bracket arguments](https://cmake.org/cmake/help/v3.12/manual/cmake-language.7.html#bracket-argument).

PR also sent to upstream: https://gitlab.kitware.com/cmake/cmake/merge_requests/3679
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24420

Differential Revision: D16914193

Pulled By: ezyang

fbshipit-source-id: 9f897cf4f607502a16dbd1045f2aedcb49c38da7
2019-08-20 01:25:20 -07:00
b0737ccdc1 Revert D16887357: [pytorch][PR] [BC-BREAKING] Add align_corners option to grid_sample and affine_grid, change default to False
Differential Revision:
D16887357

Original commit changeset: ea09aad7853e

fbshipit-source-id: 0bebb159be4e6ebe479771b42c0b483f5a84a094
2019-08-19 22:05:56 -07:00
f01548e5a4 Removes SymbolicVariable from tests (#24007)
Summary:
This PR removes SymbolicVariable from all tests as well as the specialize_autogradzero and canonicalize_ops passes. These passes used SymbolicVariable in a relatively simple way compared to its few remaining uses.

Removing SymbolicVariable means graphs must be constructed by other methods. IRParser was preferred for tests, but tests requiring pointers to graph internals or differentiation use direct construction instead. See https://github.com/pytorch/pytorch/issues/23989, which was discovered during this process, for why IRParser cannot be used when differentiation is required. Direct construction was also used in the updated passes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24007

Test Plan: Only refactors existing tests and preserves current checks; no additional testing needed.

Differential Revision: D16906045

Pulled By: mruberry

fbshipit-source-id: b67df4611562cd7618f969890e2b6840750c7266
2019-08-19 20:49:37 -07:00
755f91b400 serializing function calls (#23799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23799

Before, we inlined as part of the initial IR generation process, which
has a few disadvantages:

1. It loses information about what nodes came from which function/method
calls. Other parties who want to implement transformations on the
function/module level don't have a reliable way of doing so.
2. It duplicates a ton of code if we are inlining the same
function/method a ton of times.

After this PR: inline is deferred to the optimization stage, so
optimizations that rely on inlining will still work. But things get
serialized with the function/method calls in.

Differential Revision: D16652819

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Pulled By: suo

fbshipit-source-id: a11af82aec796487586f81f5a9102fefb6c246db
2019-08-19 18:42:43 -07:00
eb7b39e02f Templatize Tensor.data_ptr() (#24847)
Summary:
This PR templatizes `Tensor.data_ptr()`, to prepare for the deprecation of `Tensor.data<T>()` and introduction of `Tensor.data()` that has the same semantics as `Variable.data()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24847

Differential Revision: D16906061

Pulled By: yf225

fbshipit-source-id: 8f9db9fd105b146598a9d759aa4b4332011da8ea
2019-08-19 17:02:18 -07:00
bf978e7890 cumsum (#24476)
Summary:
Added support for cumsum in symbolic opset 11 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24476

Differential Revision: D16896780

Pulled By: bddppq

fbshipit-source-id: b52355796ee9f37004c9258f710688ad4b1ae8a2
2019-08-19 16:57:04 -07:00
e0e5813b72 Fix unicode in comments (#24218)
Summary:
Fixes #24164
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24218

Pulled By: driazati

Differential Revision: D16901789

fbshipit-source-id: 8f1c7af437e66119bec616bc906c96d5d92cfb13
2019-08-19 16:33:21 -07:00
7f86fb8995 Moves (most) ops to symbolic script (#23794)
Summary:
This PR removes the following operators to symbolic script:

- add
- sub
- mul
- div
- threshold
- clamp
- addmm
- comparison ops (lt, le, ge, ...)
- fmod
- remainder
- max_pool2d_with_indices

Additionally, the view and reshape operations were removed from autodiff.cpp (they were already written in symbolic script).

The functionality of these operators is mostly preserved, except clamp and threshold have been modified to be gradient preserving at the boundary. Moving clamp also changed the graph tested in test_jit.py, which I think is expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23794

Test Plan: Existing tests provided sufficient coverage.

Differential Revision: D16902986

Pulled By: mruberry

fbshipit-source-id: 478f2a59d9a5b0487fc523fd594cb775cb617525
2019-08-19 15:49:33 -07:00
ef14d88f27 Make torch.jit.Attribute work with PYTORCH_ENABLED=0
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23851

Test Plan: Imported from OSS

Differential Revision: D16840394

Pulled By: suo

fbshipit-source-id: b72e081513de73f565f3aeaa61125b7d3aa9c3e7
2019-08-19 15:23:21 -07:00
6100205eb8 TensorIterator::binary_op input-output overlap check (#24058)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8212

This fix is based on the idea that in-place ops (e.g. add_(...)) and out ops (e.g. tensor.add(..., out=...)) must check that the output tensor does not partially overlap with any of its input tensors. Otherwise the result of such an op is unexpected to the user. Since TensorIterator is a common backend for such ops and is already used to check output self-overlapping, this fix is implemented in the same place.

The MemOverlapStatus enum class is introduced to model the overlap state of two tensors:

- TOO_HARD if at least one of them is not contiguous
- FULL if both are contiguous and share exactly the same memory array [data(), data() + numel() * itemsize()]
- PARTIAL if both are contiguous but the underlying memory is shared only partially, in other words the memory arrays overlap but are not identical
- NO if both are contiguous but have independent, non-overlapping memory arrays
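
As a small illustration of the newly rejected pattern (a sketch; the exact error message is not shown here):
```
import torch

base = torch.zeros(8)
a, out = base[:5], base[3:]   # contiguous views that partially share storage
torch.add(a, 1, out=out)      # now raises instead of silently computing garbage
```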

Performance test of clone/addcmul_/addcdiv_ with check_mem_overlaps:

a = torch.empty(10000000, device='cpu')
b = torch.randn(10000000, device='cpu')
timeit a.copy_(b)
master: 10.3 ms ± 429 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
branch: 10.2 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

a = torch.empty(10000000, device='cuda')
b = torch.randn(10000000, device='cuda')
timeit a.copy_(b)
master: 373 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 373 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcmul_(b, c)
master: 2.02 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 2.11 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcmul_(b, c)
master: 72.6 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	72.4 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcdiv_(b, c)
master: 2.19 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 1000 loop each)
branch:	1.97 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcdiv_(b, c)
master: 71.3 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	71.7 µs ± 3.96 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.empty(100, device='cpu')
b = torch.randn(100, device='cpu')
timeit a.copy_(b)
master: 12.1 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
branch:	11.1 µs ± 61.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

a = torch.empty(100, device='cuda')
b = torch.randn(100, device='cuda')
timeit a.copy_(b)
master: 20.9 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	22.8 µs ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcmul_(b, c)
master: 24.1 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	24 µs ± 91.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcmul_(b, c)
master: 34.5 µs ± 4.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	29.8 µs ± 496 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcdiv_(b, c)
master: 21.3 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	23.8 µs ± 403 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcdiv_(b, c)
master: 30.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch:	31.8 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24058

Differential Revision: D16767892

Pulled By: pbelevich

fbshipit-source-id: 0cdaaa471d003a2886b1736f8985842226b8493a
2019-08-19 15:06:04 -07:00
4358cbe01b Allow torch.tril / triu to handle bool and half inputs (#24163)
Summary:
Changelog:
- Enable torch.tril / triu for bool and float16 dtypes
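
A minimal sketch of what this enables:
```
import torch

mask = torch.tril(torch.ones(3, 3, dtype=torch.bool))       # lower-triangular boolean mask
upper = torch.triu(torch.ones(3, 3, dtype=torch.float16))   # half-precision upper triangle
```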
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24163

Test Plan:
- Tests added in test_torch.py for all devices and dtypes (except bfloat16)

Fixes https://github.com/pytorch/pytorch/issues/24035

Differential Revision: D16793315

Pulled By: ezyang

fbshipit-source-id: 2bbc51ce567405a7cb2d8ab567eee6c2e40aa76a
2019-08-19 15:02:53 -07:00
f849ebf1fe Enable torch.eye for bool and half (#24148)
Summary:
Changelog:
- Enable torch.eye for bool and float16 dtypes
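
For illustration, both dtypes now work directly:
```
import torch

torch.eye(3, dtype=torch.bool)      # identity pattern as a boolean mask
torch.eye(3, dtype=torch.float16)   # half-precision identity matrix
```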
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24148

Test Plan:
- Tests added in test_torch.py for all available devices and dtypes (except torch.bfloat16)

Fixes https://github.com/pytorch/pytorch/issues/24088

Differential Revision: D16891048

Pulled By: ezyang

fbshipit-source-id: 3e86fe271bd434300c396e63f82c1a1f3adac2b4
2019-08-19 14:59:37 -07:00
6cf14361f4 Add the default_weight_observer for the dynamic quantization path (#24231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24231

As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r309528932, we will add a default weight observer for the dynamic quantization path.

We need to move `observer` and `qconfig` to a separate namespace.
ghstack-source-id: 88583658

Differential Revision: D16781092

fbshipit-source-id: 5cd59c881a7f98b82704ca318b1e63650d73062a
2019-08-19 14:54:22 -07:00
d7c6debc14 Remove gradient value as input from SparseNormalize op (#24357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24357

SparseNormalize does not need to know the gradient value for the lookup table, only the indices of the embeddings that need to be updated. By removing this input, we allow SparseNormalize to be used alongside SparseAdagradFusion.

Differential Revision: D16809919

fbshipit-source-id: cc19692ba4dea8854663ae1ed8cf9365e90c99bc
2019-08-19 14:47:09 -07:00
9ebdf01962 For int64_t atomicAdd, use the available compiler builtin on ROCm. (#24854)
Summary:
Do not use the explicit CAS loop. This will perform better if there is
any contention. Since this feature is ROCm-only, the HIP layer provides no
helper function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24854

Differential Revision: D16902292

Pulled By: ezyang

fbshipit-source-id: df192063c749f2b39f8fc304888fb0ae1070f20e
2019-08-19 14:30:03 -07:00
927fb56ee0 Allow SyncBatchNorm without DDP in inference mode (#24815)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22538
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24815

Test Plan:
Can run a detectron2 evaluation without entering DDP.

#sandcastle

Differential Revision: D16883694

Pulled By: ppwwyyxx

fbshipit-source-id: 3195bc4e7f43a994821069f229b26302e2988739
2019-08-19 13:43:42 -07:00
a04f729b51 Fix VaryingShape::merge
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24455

Test Plan: Imported from OSS

Differential Revision: D16853270

Pulled By: zdevito

fbshipit-source-id: 328aab6873fbff64aa9a4c1d5917d302f6b45397
2019-08-19 12:31:16 -07:00
60518e0035 Add resnext 32x4d shapes to benchmark (#24503)
Summary:
Adds ResNeXt-101 32x4d shapes to the qconv benchmarks. (Also ran the code formatter.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24503

Test Plan:
Run tests on devserver:

```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test -- --omp_num_threads 1 --mkl_num_threads 1
```

Reviewed By: dskhudia

Differential Revision: D16845746

Pulled By: rohan-varma

fbshipit-source-id: d9f842e5f455fccecf547129c5faffa253a49e23
2019-08-19 12:04:48 -07:00
a6a13e36f5 Change kernel_size to self.kernel_size to resolve error in quantized conv module (#24499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24499

In the `conv.py` module we are currently attempting to subscript a variable
`kernel_size` that may not be an iterable. This leads to errors unless the
user passes in an iterable for `kernel_size`. D16830855 changed
`self.kernel_size` to be a pair type, but did not actually use the variable.
We want to use `self.kernel_size`, which is a pair even if the user passed in an int for `kernel_size`, so that we stop hitting the subscripting error.
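
For reference, a sketch of the normalization helper involved (`_pair` is a private utility, so its exact location may change):
```
from torch.nn.modules.utils import _pair

_pair(3)        # -> (3, 3): an int is broadcast to a 2-tuple
_pair((3, 5))   # -> (3, 5): iterables pass through unchanged
```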

Differential Revision: D16859809

fbshipit-source-id: cd2a5497e89d88e518ca7b8a97bf9e69803ee2ba
2019-08-19 11:59:44 -07:00
5aa0f89d65 Build libtorch binary with new ABI (#23908)
Summary:
This PR enables building libtorch with new ABI, using gcc 5.4 on Ubuntu 16.04.

Accompanying PR: https://github.com/pytorch/builder/pull/335.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23908

Differential Revision: D16898009

Pulled By: yf225

fbshipit-source-id: 516b444c1fc94c7b05d3be84ef81ef23e9041bfc
2019-08-19 11:23:05 -07:00
b6803d62fd Use snake names for all files in distributed.rpc (#24502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24502

Files in the distributed.rpc package mix snake_case and camelCase names. This
commit cleans that up; all files now use snake_case names.
ghstack-source-id: 88548990

Reviewed By: xush6528

Differential Revision: D16860155

fbshipit-source-id: 3a22a89bf6c4e11aac5849564fc53296a04d6a8b
2019-08-19 10:58:59 -07:00
3b22bbeb5b enable "keeps" from BoxWithNMSLimit and caffe2_fastrcnn_outputs_inference
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24451

Reviewed By: newstzpz

Differential Revision: D16850259

fbshipit-source-id: 22f69d71a558d63c32a27d271a7557fc35a55176
2019-08-19 10:54:22 -07:00
c6617b370b Cache node operators to speed up optimization (#24827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24827

We already cache the node's schema, but alias analysis wants operators.
This ends up being almost 70% of the on-cpu time optimizing a large
graph.

Here's some results on a [sample model](https://gist.github.com/suo/63ab9638516002176f94553a37060f61)
(the units are seconds).

Before:
```
compiled in:  20.256319999694824
first run in:  313.77824568748474
```

After:
```
compiled in:  18.8815860748291
first run in:  42.58739233016968
```

More than a 7x speedup! Still slower than I'd like though so I'll keep
digging.

Test Plan: Imported from OSS

Differential Revision: D16887540

Pulled By: suo

fbshipit-source-id: 2449be2898889d00ac094c3896e37b0e6a8c5f08
2019-08-19 10:30:23 -07:00
c0a796d95d Update docs for softmax in onnx supported operators (#24832)
Summary:
Update the softmax entry in the ONNX supported operators from `softmax (only dim=-1 supported)` to `softmax`, since all dim options are now supported by
[https://github.com/pytorch/pytorch/issues/18482](https://github.com/pytorch/pytorch/pull/18482): ONNX Export All Cases of Softmax.
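
A minimal sketch of an export that previously required dim=-1 (the module name is illustrative):
```
import io
import torch
import torch.nn.functional as F

class SoftmaxDim1(torch.nn.Module):
    def forward(self, x):
        return F.softmax(x, dim=1)   # a non-last dim, previously unsupported for export

torch.onnx.export(SoftmaxDim1(), torch.randn(2, 3, 4), io.BytesIO())
```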
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24832

Differential Revision: D16896538

Pulled By: bddppq

fbshipit-source-id: 284039ffa42f09b0043e95cfe9f17e1afde53814
2019-08-19 10:13:41 -07:00
cd622f7655 C++ ModuleList
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24317

Differential Revision: D16893634

Pulled By: yf225

fbshipit-source-id: 9d810ad5a41bd46e5d0fa766e851178c60226866
2019-08-19 10:02:40 -07:00
87217cfd2a Add align_corners option to grid_sample and affine_grid, change default to False (#23923)
Summary:
Resolves: https://github.com/pytorch/pytorch/issues/20785

Adds the `align_corners` option to `grid_sample` and `affine_grid`, paralleling the option that was added to `interpolate` in version 0.4.0.

In short, setting `align_corners` to `False` allows these functions to be resolution agnostic.
This ensures, for example, that a grid generated from a neural net trained to warp 1024x1024 images will also work to warp the same image upsampled/downsampled to other resolutions like 512x512 or 2048x2048 without producing scaling/stretching artifacts.

Refer to the documentation and https://github.com/pytorch/pytorch/issues/20785 for more details.

**Important**: BC-Breaking Change because of new default
The old functionality can still be achieved by setting `align_corners=True`, but the default is now set to `align_corners=False`, since this is the more correct setting, and since this matches the default setting of `interpolate`.

The vectorized 2D cpu version of `grid_sampler` is refactored a bit. I don’t suspect that this refactor would affect the runtime much, since it is mostly done in inlined functions, but I may be wrong, and this has to be verified by profiling.

~The tests are not yet updated to reflect the new default. New tests should probably also be added to test both settings of `align_corners`.~ _Tests are now updated._
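
A minimal sketch of the new default in action (pass `align_corners=True` to recover the old behavior):
```
import torch
import torch.nn.functional as F

theta = torch.eye(2, 3).unsqueeze(0)                 # identity affine transform
img = torch.arange(16.).reshape(1, 1, 4, 4)
grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)  # resolution-agnostic sampling
```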
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23923

Differential Revision: D16887357

Pulled By: ailzhang

fbshipit-source-id: ea09aad7853ef16536e719a898db8ba31595daa5
2019-08-19 09:45:44 -07:00
9e7083d0a9 Remove unused files from THNN and THCUNN (#24820)
Summary:
Spin-off from https://github.com/pytorch/pytorch/issues/24818.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24820

Differential Revision: D16890917

Pulled By: ezyang

fbshipit-source-id: 88df6d3ba98600acc810eda406daa1d850ed3320
2019-08-19 07:52:08 -07:00
92c63d90e8 Remove support for old architectures in cpp_extension and CMake (#24442)
Summary:
This is a follow-up to gh-23408.  No longer supported are any arches < 3.5 (numbers + 'Fermi' and 'Kepler+Tegra').
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24442

Differential Revision: D16889283

Pulled By: ezyang

fbshipit-source-id: 3c0c35d51b7ac7642d1be7ab4b0f260ac93b60c9
2019-08-19 06:23:33 -07:00
dfdb86a595 big cpp test reorg (#24801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24801

This is to fix the ODR-violations in fbcode static builds, which have been broken for several months.

This PR is unfortunately quite large, but the changes are only mechanical:
1. Tests defined in header files -> tests defined in cpp files
2. Remove the `torch::jit::testing` namespace -> `torch::jit`.
3. Single `test.h` file that aggregates all tests.
4. Separate out files for gtest and python versions of the tests instead of using a build flag
5. Add a readme for how to add a new test, and explaining a bit about why the cpp tests are the way they are.

Test Plan: Imported from OSS

Differential Revision: D16878605

Pulled By: suo

fbshipit-source-id: 27b5c077dadd990a5f74e25d01731f9c1f491603
2019-08-18 16:49:56 -07:00
85564c1456 Record function name as an attribute of CallFunction nodes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24446

Test Plan: Imported from OSS

Differential Revision: D16845758

Pulled By: ZolotukhinM

fbshipit-source-id: fc1536d597eb6ac4076c04de56f93899b52d6cda
2019-08-18 15:36:30 -07:00
9228dd766a Modify symmetric eigendecomposition derivative (#23018)
Summary:
The derivative of the symmetric eigendecomposition was previously a triangular matrix.

Changelog:
- Modify the derivative of symeig from a triangular matrix to a symmetric matrix with reason specified as a comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23018

Test Plan: - Existing gradcheck and gradgradchecks are ported to test_autograd to verify that the change is correct. Input to symeig is symmetrized before passing

Differential Revision: D16859070

Pulled By: ezyang

fbshipit-source-id: 2d075abdf690909f80781764cfaf938b581d0ef6
2019-08-17 12:57:00 -07:00
5a032f02ed Added .pyi file for flatten (#24459)
Summary:
Generated with `stubgen` and moved from `out/flatten.pyi` to `flatten.pyi.in`.

https://github.com/pytorch/pytorch/pull/22245#issuecomment-521875658
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24459

Differential Revision: D16881182

Pulled By: ezyang

fbshipit-source-id: 5e25fad55f169b5a58ab7522b583d7c923314d4d
2019-08-17 12:19:12 -07:00
0ce7264ed6 Don't require slow test reporting in run_tests.py --pytest (#24448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24448

The setting `--durations=10` was hard-coded, which is annoying as I
don't necessarily care. A good alternative to get the same behavior is:

```
python run_tests.py --pytest -- --durations=10
```

Test Plan: Imported from OSS

Differential Revision: D16876380

Pulled By: suo

fbshipit-source-id: 1e14d366db45b6b9bf4a4ab1633b0f6ece29f6bc
2019-08-17 01:26:07 -07:00
a0b13b4fa5 extra_repr for quantized modules (#24443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24443

This gives us useful information about the Module when we print it, like so:

```
FloatModule(
  (quant): Quantize()
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1), scale=0.08209919929504395, zero_point=128)
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1), scale=0.16885940730571747, zero_point=128)
  (fc1): Linear(in_features=800, out_features=500, bias=True, scale=0.12840059399604797, zero_point=128)
  (fc2): Linear(in_features=500, out_features=10, bias=True, scale=0.260015606880188, zero_point=128)
  (dequant): DeQuantize()
)
```

Test Plan: Imported from OSS

Differential Revision: D16847140

Pulled By: jamesr66a

fbshipit-source-id: 8c995108f17ed1b086d1fb30471a41c532c68080
2019-08-16 22:38:45 -07:00
99dea08e60 Use c10::ThreadPool to send and receive messages (#23968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23968

The existing ProcessGroupAgent uses a single thread to send all messages, and
a single thread to listen for and process all received messages. This causes
performance issues and also prevents nested RPCs. For example, when
running a nested RPC A->B->A->B, the second recv on B cannot start until
the first recv on B finishes. If the second recv is triggered by a nested
RPC in the first recv, it will deadlock. Ideally, we should expose something like
a responder or FutureResult to the Python land to support nested asynchronous
UDFs.

This diff adds a shared ThreadPool for send and recv. Send uses it to send
out messages, and recv uses it to process received messages. There is still
a dedicated thread that listens for incoming messages and adds them to the task queue.
There are two goals: 1) speed up ProcessGroupAgent, and 2) use the ThreadPool as a
temporary solution for (a small number of) nested RPCs.
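
A hedged sketch of the nested pattern, written against the later public torch.distributed.rpc API (rpc_sync); the worker names and functions are assumptions:
```
import torch.distributed.rpc as rpc

def pong():
    return "pong"

def ping_then_pong():
    # runs on worker B and issues a nested RPC back to A; with a single recv
    # thread, B could not accept new work while this handler was pending
    return rpc.rpc_sync("A", pong)

# on worker A: rpc.rpc_sync("B", ping_then_pong)
```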

ghstack-source-id: 88476246

Differential Revision: D16695091

fbshipit-source-id: fd18a5c65e7fcd1331b73d1287673e6e10d2dd86
2019-08-16 17:49:05 -07:00
dd97743de7 Enables inplace in the quantized relu (#24374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24374

This is a duplicate to bring back #23704 with diff revision D16634539

Test Plan: Imported from OSS

Differential Revision: D16818664

Pulled By: zafartahirov

fbshipit-source-id: c8f7965356555a6a995eaeea6820ea62cbbea6fd
2019-08-16 16:53:09 -07:00
aed306dcf7 Add @ignore for script classes (#23614)
Summary:
This lets you mark a class so that it won't be recursively compiled.

This also runs up against a weird thing on the UX side, that to ignore a
module you have to `ignore` its `forward()` method but to ignore a
class you use `ignore` on the class declaration. The `ignore` on the
class declaration matches the use of `script` for script classes but is
confusing to those that don't know the difference between script classes
/ modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23614

Pulled By: driazati

Differential Revision: D16770068

fbshipit-source-id: bee9a9e88b6c798ce779f622c4f929adae4eaf45
2019-08-16 16:34:22 -07:00
10c456417c Clear recursive error stack on each compilation (#23458)
Summary:
Previously we weren't clearing the stack, so any failures that didn't
stop the program stayed around in the stack and would show up if
something else accessed the stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23458

Pulled By: driazati

Differential Revision: D16866719

fbshipit-source-id: 29739b11f79de91c6468129da1bdcbf3c53b42d9
2019-08-16 16:10:19 -07:00
eee3e92936 Enabled torch.mm and torch.mv for bfloat16
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24224

Test Plan: Imported from OSS

Differential Revision: D16779996

Pulled By: izdeby

fbshipit-source-id: c859d8945a564edfa3f8a1430f140ae30d484d19
2019-08-16 15:46:15 -07:00
cf57f73c11 Module: add dump function that recursively prints contents of the module. (#24356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24356

Test Plan: Imported from OSS

Differential Revision: D16864133

Pulled By: ZolotukhinM

fbshipit-source-id: 1af757334bc8e156427783bc37500de3c934378b
2019-08-16 15:13:02 -07:00
9b73c77390 jit_log: Extract a function that prefixes all lines of a string with another string. (#24355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24355

Test Plan: Imported from OSS

Differential Revision: D16864134

Pulled By: ZolotukhinM

fbshipit-source-id: 8b456858d8ee07fd4ca3fb1759237756df897cd9
2019-08-16 15:12:58 -07:00
76716f6c06 Respect pre-defined DOCKER_IMAGE value in binary_populate_env.sh (#24787)
Summary:
`binary_populate_env.sh` is used by `binary_linux_test`, and for libtorch with new ABI we need to run the tests on a docker image different from `soumith/manylinux-cudaXX`. In such cases, we should respect the actual DOCKER_IMAGE value defined in the CircleCI job description.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24787

Differential Revision: D16867976

Pulled By: yf225

fbshipit-source-id: dc0a68bffc5789249ae14491ef485c7cc2fc1c34
2019-08-16 14:23:45 -07:00
2e44630d35 fix double copying of constants (#24412)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/24369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24412

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D16843311

Pulled By: eellison

fbshipit-source-id: b25552c49b963c031c98749bcda31f65cd82f19d
2019-08-16 13:29:22 -07:00
af908d57ea Increasing precision for avg pool (#23906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23906

The off-by-one error is expected for the average pool due to double rounding.
Increasing the unit-test precision tolerance to 1.0 to avoid spurious failures.

Test Plan: Imported from OSS

Differential Revision: D16678044

Pulled By: zafartahirov

fbshipit-source-id: 4e73934e4379b1d108af649ec77053998e44c560
2019-08-16 13:07:41 -07:00
mal
6b656565ab Hooks for C++ API (#24393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24393

Ability to register hook on a variable, similar to python autograd API. register_hook will take a function as argument and create a CppFunctionPreHook similar to PyFunctionPreHook.
It will return the index of the hook which can be passed to remove_hook to disable the hook.

Test Plan: Added tests.

Differential Revision: D16861722

fbshipit-source-id: d08047f932e38c7bde04283a18b2d0311c8ad604
2019-08-16 12:44:20 -07:00
a3b8607811 Fix test_jit_cuda_archflags failure on py27 due to changing dict order. (#24501)
Summary:
See gh-23408.

Was failing for `pytorch_linux_xenial_cuda9_cudnn7_py2_test`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24501

Differential Revision: D16860932

Pulled By: soumith

fbshipit-source-id: 715858d905f74a23e42a9a1da97f036a3e30f0c9
2019-08-16 12:44:16 -07:00
562c5cd73b Adds a placeholder for the 'mul' operator.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24421

Test Plan: Imported from OSS

Differential Revision: D16833438

Pulled By: zafartahirov

fbshipit-source-id: 51b20ac060ad657b3f12e4f1cf47369414b342b6
2019-08-16 11:32:51 -07:00
50161f3b3c Add ONNX Export Support to empty and empty_like (#24166)
Summary:
Empty and empty_like return uninitialized tensors with specific sizes.
The values in the tensor cannot be predicted, which is why tests in test_pytorch_onnx_onnxruntime.py and test_pytorch_onnx_caffe2.py are not added.
The tests in test_operators.py verify the ONNX graph and the output shape.
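
A minimal sketch of such an export (the module name is illustrative):
```
import io
import torch

class EmptyLike(torch.nn.Module):
    def forward(self, x):
        return torch.empty_like(x)   # only the shape of the result is meaningful

torch.onnx.export(EmptyLike(), torch.randn(2, 3), io.BytesIO())
```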
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24166

Differential Revision: D16831571

Pulled By: bddppq

fbshipit-source-id: b2500f36ced4735da9a8418d87a39e145b74f63a
2019-08-16 10:40:18 -07:00
1df57c943f pickler read guard (#24433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24433

The bounds checker was only used once per instruction. If a read in the
middle of an instruction went off the end of the stream, it would just
read invalid memory. This replaces the bounds checker with just one
guarded read function.

Test Plan: Imported from OSS

Differential Revision: D16836178

Pulled By: zdevito

fbshipit-source-id: a7f70d0f293bf26c3220a12bafb8a06678931016
2019-08-16 10:19:13 -07:00
ee898bffc3 fix IR parsing bug
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24294

Test Plan: Imported from OSS

Differential Revision: D16797690

Pulled By: zdevito

fbshipit-source-id: f89664dc7da3547c316aa5875bf67bef672430c2
2019-08-16 10:10:42 -07:00
d7b86d0c11 added test_tensorboard.py to TARGETS (#24040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24040

This diff fixes a failing test in test_tensorboard.py:
  - fixed test_image_with_boxes: the test compares a serialized protobuf Summary object containing an image against an expected serialized protobuf in a file. It turns out that comparing images string by string might not work (e.g. if they were serialized with different versions of an image library) - images can be equal, yet due to differences in metadata or compression methods the actual strings might differ. Changed to compare images using == from PIL.Image.
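
A sketch of the comparison change (the helper name is hypothetical):
```
from io import BytesIO
from PIL import Image

def images_equal(encoded_a: bytes, encoded_b: bytes) -> bool:
    # compare the decoded images rather than the serialized byte streams,
    # so a different compression method or library version no longer matters
    return Image.open(BytesIO(encoded_a)) == Image.open(BytesIO(encoded_b))
```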

Reviewed By: orionr

Differential Revision: D16715831

fbshipit-source-id: 7dd4a7cfc8e63767ed727656f1891edd273d95da
2019-08-16 09:44:00 -07:00
c676db230d Revert D16834297: Move the search of cuDNN files to FindCUDNN.cmake.
Differential Revision:
D16834297

Original commit changeset: ec2c0ba0c659

fbshipit-source-id: 028a727f4baaaf4439c7ca17c999bba7ea6d419f
2019-08-16 08:30:21 -07:00
e166811598 Documentation for Tensor.record_stream() (#24078)
Summary:
This patch writes documentation for `Tensor.record_stream()`, which is not a documented API currently. I've discussed publishing it with colesbury in https://github.com/pytorch/pytorch/issues/23729.

The documentation is based on [the introduction at `CUDACachingAllocator.cpp`](25d1496d58/c10/cuda/CUDACachingAllocator.cpp (L47-L50)). ~~I didn't explain full details of the life cycle of memory blocks or stream awareness of the allocator for the consistent level of details with other documentations.~~ I explained about the stream awareness in a note block.
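
For illustration, the documented pattern looks roughly like this (requires a CUDA device):
```
import torch

side = torch.cuda.Stream()
x = torch.empty(1024, device="cuda")   # allocated on the current stream
with torch.cuda.stream(side):
    y = x * 2                          # x is consumed on the side stream
x.record_stream(side)                  # tell the caching allocator about that use
del x                                  # x's block is not reused until `side` catches up
```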
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24078

Differential Revision: D16743526

Pulled By: zou3519

fbshipit-source-id: 05819c3cc96733e2ba93c0a7c0ca06933acb22f3
2019-08-16 08:07:33 -07:00
cef0443464 Ensure proper file executable permissions in CI. (#24214)
Summary:
Some files have improper executable permissions (which git tracks). This
commit adds a test in CI to ensure that executable permissions are off
for files that shouldn't have such a permission. This also ensures fixes
such as https://github.com/pytorch/pytorch/issues/21305 are complied with in the future.

 ---

Disclaimer: I'm the author of flake8-executable, and I've been using it
on my end for over a month and thus I think it should be stable enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24214

Differential Revision: D16783437

Pulled By: ezyang

fbshipit-source-id: 018e55798f1411983c65444e6304a25c5763cd19
2019-08-16 06:11:09 -07:00
482607c16c Move the search of cuDNN files to FindCUDNN.cmake.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24293

Test Plan: Imported from OSS

Differential Revision: D16834297

Pulled By: ezyang

fbshipit-source-id: ec2c0ba0c659d82fffd40d52ae723934377aa49c
2019-08-16 06:07:25 -07:00
e78dad3593 Add BPR loss to TTSN (#24439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24439

Many papers mention that BPR is useful for improving recommendation quality. Add a BPR loss so that we can train TTSN with it. We would like to see if it can improve retrieval models.

reference: https://arxiv.org/pdf/1205.2618.pdf

Reviewed By: dragonxlwang

Differential Revision: D16812513

fbshipit-source-id: 74488c714a37ccd10e0666d225751a845019eb94
2019-08-15 23:20:15 -07:00
5c57cedc16 change the location of wipe cache (#24454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24454

We want to change the location of wipe_cache. From what we observed, the original location does not help.

Reviewed By: mingzhe09088

Differential Revision: D16853205

fbshipit-source-id: 1f6224a52433cbe15c0d27000b4ac140fb9cd4c3
2019-08-15 20:55:47 -07:00
0b3a63b048 skip broken test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24453

Test Plan: Imported from OSS

Differential Revision: D16852563

Pulled By: suo

fbshipit-source-id: 2338574d53cb7ae7e0e922f0efc7ae99477b021c
2019-08-15 19:55:25 -07:00
64974ae71e Fix naming convention inconsistency and formats in test_rpc.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24407

Reviewed By: xush6528

Differential Revision: D16830605

Pulled By: mrshenli

fbshipit-source-id: 795962e56a8433f8015f44b6ed4b6183488b00d6
2019-08-15 19:43:15 -07:00
a1b111709d Assert weight_observer has the correct dtype
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24436

Test Plan: Imported from OSS

Differential Revision: D16847141

Pulled By: jamesr66a

fbshipit-source-id: 1dde5c26449115b53e71d410b41204d743787c44
2019-08-15 19:40:14 -07:00
354ecc42bc Exposing the API for use with pytorch/tvm repo. (#24430)
Summary:
Exposing the API for use with pytorch/tvm repo. PR: https://github.com/pytorch/tvm/pull/86
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24430

Test Plan: Just exposing API.

Differential Revision: D16834888

Pulled By: kimishpatel

fbshipit-source-id: 29955e75908e68988a46b7e9c37e6eb6aea1b20f
2019-08-15 17:59:55 -07:00
1a74bd407d Fixes the adding of the observer to the FloatFunctional (#24418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24418

Fixes #24394

The observer is not added correctly, because one of the conditions is not met.

Test Plan: Imported from OSS

Differential Revision: D16833951

Pulled By: zafartahirov

fbshipit-source-id: bb4699e6a1cf6368c7278272a68e5e7c6d3f59a8
2019-08-15 17:27:00 -07:00
49efbdce88 Convert bias to float in quantized conv module (#24424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24424

att

Differential Revision: D16839540

fbshipit-source-id: 1cc8b128a6403dd19b4cd405fae49e10b5cd44e1
2019-08-15 15:56:08 -07:00
696cabae9b Baseline observer module, ensuring that (min,max) range includes zero. (#24297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24297

ghstack-source-id: 88252409

Differential Revision: D16635637

fbshipit-source-id: fcef20b9c88b2c3bd97e311514e5b2d0339ff28a
2019-08-15 15:25:23 -07:00
f03700b997 Fix QConfig_dynamic typename (#24431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24431

Pickle's fully-qualified name lookup would fail when trying to serialize QConfig_dynamic since the __name__ on the instance would refer to the wrong class name

Test Plan: Imported from OSS

Differential Revision: D16835705

Pulled By: jamesr66a

fbshipit-source-id: e146835cbe10b08923d77298bc93b0f5b0ba37c5
2019-08-15 15:25:19 -07:00
cd20773701 Set CUDA arch correctly when building with torch.utils.cpp_extension (#23408)
Summary:
The old behavior was to always use `sm_30`. The new behavior is:

- For building via a setup.py, check if `'arch'` is in `extra_compile_args`.  If so, don't change anything.
- If `TORCH_CUDA_ARCH_LIST` is set, respect that (can be 1 or more arches)
- Otherwise, query device capability and use that.

To test this, for example on a machine with `torch` installed for py37:
```
$ git clone https://github.com/pytorch/extension-cpp.git
$ cd extension-cpp/cuda
$ python setup.py install
$ cuobjdump --list-elf build/lib.linux-x86_64-3.7/lltm_cuda.cpython-37m-x86_64-linux-gnu.so

ELF file    1: lltm.1.sm_61.cubin
```

Existing tests in `test_cpp_extension.py` for `load_inline` and for compiling via `setup.py` in test/cpp_extensions/ cover this.

Closes gh-18657

EDIT: some more tests:

```
from torch.utils.cpp_extension import load

lltm = load(name='lltm', sources=['lltm_cuda.cpp', 'lltm_cuda_kernel.cu'])
```

```
# with TORCH_CUDA_ARCH_LIST undefined or an empty string
$ cuobjdump --list-elf /tmp/torch_extensions/lltm/lltm.so
ELF file    1: lltm.1.sm_61.cubin

# with TORCH_CUDA_ARCH_LIST = "3.5 5.2 6.0 6.1 7.0+PTX"
$ cuobjdump --list-elf build/lib.linux-x86_64-3.7/lltm_cuda.cpython-37m-x86_64-linux-gnu.so
ELF file    1: lltm_cuda.cpython-37m-x86_64-linux-gnu.1.sm_35.cubin
ELF file    2: lltm_cuda.cpython-37m-x86_64-linux-gnu.2.sm_52.cubin
ELF file    3: lltm_cuda.cpython-37m-x86_64-linux-gnu.3.sm_60.cubin
ELF file    4: lltm_cuda.cpython-37m-x86_64-linux-gnu.4.sm_61.cubin
ELF file    5: lltm_cuda.cpython-37m-x86_64-linux-gnu.5.sm_70.cubin
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23408

Differential Revision: D16784110

Pulled By: soumith

fbshipit-source-id: 69ba09e235e4f906b959fd20322c69303240ee7e
2019-08-15 15:25:15 -07:00
02dd9a4058 Fix CUDNN location related build issue on Antergos Linux (based on Arch) (#24300)
Summary:
The issue is that `python setup.py install` will fail right at the end
of the build, with:

```
  File "setup.py", line 380, in run
    report('-- Detected cuDNN at ' + CUDNN_LIBRARY + ', ' + CUDNN_INCLUDE_DIR)
TypeError: must be str, not NoneType
```

This is due to `USE_CUDNN` being True, but CUDNN library and include dir
not being auto-detected.  On this distro, the CUDA install goes into
`/opt/cuda/` while CUDNN goes into `/usr/lib`.

```
$ locate libcudnn.so
...
/usr/lib/libcudnn.so
/usr/lib/libcudnn.so.7
/usr/lib/libcudnn.so.7.6.1

$ locate libcublas.so  # targets/... symlinked from /opt/cuda/lib64
...
/opt/cuda/targets/x86_64-linux/lib/libcublas.so
```

One could work around this by setting `CUDNN_LIB_DIR`, but that's
annoying and you only find out after running into this.

The path is added after `CUDA_HOME`, so it should not be a problem on
systems that have multiple CUDA installs and select one via `CUDA_HOME`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24300

Differential Revision: D16839323

Pulled By: soumith

fbshipit-source-id: 5285fff604584ccfbe6368c5ee5a066f8fc10802
2019-08-15 15:22:49 -07:00
b10a3e916f Remove redundant assignment (#24408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24408

As Title says.
ghstack-source-id: 88388745

Differential Revision: D16830709

fbshipit-source-id: 87eafcd3236abcec94cf87009fc705ad26d87eca
2019-08-15 13:38:33 -07:00
498276631b Remove type subclassing (#24257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24257

Type subclassing was used to support our old hierarchy of
Tensor types. Now that we have one tensor type it is not needed.
This removes:

* isSubclass, since it is now always false.
* type slicing, which was only needed for subclasses.
* AutogradZeroTensor, which is folded into ProfiledTensorType

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D16794035

Pulled By: zdevito

fbshipit-source-id: 9a3e6101df0d51029a5e667a9c9137d2ae119aa7
2019-08-15 13:31:37 -07:00
0cbd7fa46f remove CompleteTensorType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24169

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D16765329

Pulled By: zdevito

fbshipit-source-id: 88560cefba635c3d586a3e4dee67f9b1d901a642
2019-08-15 13:31:34 -07:00
5ca612b55e Let logical_xor support non-bool tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23978

Test Plan: Imported from OSS

Differential Revision: D16719299

Pulled By: gchanan

fbshipit-source-id: 2fe170be6090733e20410db7cf99266543299c58
2019-08-15 12:21:31 -07:00
00e4870001 Let logical_not support non-bool tensors. (#23916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23916

Test Plan: Imported from OSS

Differential Revision: D16719300

Pulled By: gchanan

fbshipit-source-id: 5be6aeea9a38cc40ad59d0449d25a25f7dfa2787
2019-08-15 12:21:27 -07:00
6f08be46b0 Implement gradient operator for GatherByKeys. (#24348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24348

The Partition + GatherByKeys pair is pretty handy for implementing a strategy where
part of the keys will be on the local machine, while part of the keys will end up
on the remote machine (for cases when there is exactly 1 id).

Reviewed By: aazzolini

Differential Revision: D16802988

fbshipit-source-id: 4c7ac97fc0db3ce88575fccab0c7bf69dcbef965
2019-08-15 12:19:22 -07:00
b0e794e6e9 Configure pytorch-probot (#24423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24423

This enables auto-CC'ing based on labels, see
https://github.com/pytorch/pytorch/issues/24422

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16833974

Pulled By: ezyang

fbshipit-source-id: de07ea5f0ade9d5ed2160ce8308cf146321bb354
2019-08-15 12:09:01 -07:00
74ea28322d Replacing axis with dim in quantized cat
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24151

Test Plan: Imported from OSS

Differential Revision: D16754347

Pulled By: zafartahirov

fbshipit-source-id: c2ebab2f25e0423f16d4f329f98b2e9e138ed549
2019-08-15 12:08:57 -07:00
b53ff49c1e Fix Caffe2 Windows build by switching to ninja. (#24330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24330

In principle, we should be able to use the MSVC generator
to do a Windows build, but with the latest build of our
Windows AMI, this is no longer possible.  An in-depth
investigation about why this is no longer working should
occur in https://github.com/pytorch/pytorch/issues/24386

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16828794

Pulled By: ezyang

fbshipit-source-id: fa826a8a6692d3b8d5252fce52fe823eb58169bf
2019-08-15 12:06:13 -07:00
83bfd76b2f Relax precision constraint on ONNXRuntime._gru_test (#24340)
Summary:
https://github.com/pytorch/pytorch/issues/24268
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24340

Differential Revision: D16833477

Pulled By: bddppq

fbshipit-source-id: d256d6bdd950c38ecc835af848222f03cfc6130c
2019-08-15 11:55:04 -07:00
32ed676b46 Make aten_to_numpy_dtype in tensor_numpy.h public. (#23943)
Summary:
The corresponding numpy_dtype_to_aten is public already so this
should be fine. Tests still pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23943

Differential Revision: D16690742

Pulled By: soumith

fbshipit-source-id: 81431a3316509cff8a9122e10e8f6a362bbcc9c0
2019-08-15 11:52:46 -07:00
3574d9ff70 updated pixel_shuffle in opset 11 to use depthToSpace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23739

Differential Revision: D16800355

Pulled By: bddppq

fbshipit-source-id: 1502c5b7ec1495286bad17b6ffa359cf995f78fb
2019-08-15 11:37:44 -07:00
b59fa077b3 Misc doc updates / fixes (#24371)
Summary:
This is a bunch of doc updates for style, correctness, and the new
script API / recent TorchScript changes (i.e. namedtuple).

For reviewers, ping me to see a link of the rendered output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24371

Pulled By: driazati

Differential Revision: D16832417

fbshipit-source-id: a28e748cf1b590964ca0ae2dfb5d8259c766a203
2019-08-15 11:31:24 -07:00
5df773415b Add _pair for quantized conv module (#24409)
Summary:
Add _pair for kernel_size
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24409

Differential Revision: D16830855

Pulled By: jerryzh168

fbshipit-source-id: 3d6cc49b8088dd522338ab0e13911d8627df63d7
2019-08-15 11:13:57 -07:00
c5e1e5c300 Put ParseBlackListOps() into caffe2::glow namespace (#24384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24384

So that we can use them in other functions.

Reviewed By: yinghai

Differential Revision: D16824289

fbshipit-source-id: 3cb33cfa9a5c479a63db6438aef518209bdfb1f4
2019-08-15 10:53:10 -07:00
754bf383b1 Change return type of observer to two tensors (#24339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24339

Att

Differential Revision: D16820813

fbshipit-source-id: 3e7301f1700176e19f46e8677a644ba167209254
2019-08-15 10:26:44 -07:00
53eba982bd kill TK_NAMED_TUPLE_DEF (#24350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24350

`TK_NAMED_TUPLE_DEF` shouldn't exist, because NamedTuples are not
distinct syntactic things. The fact that NamedTuples and Classes are
treated differently is a property of our implementation, not the
language grammar.

This PR kills it and re-uses `CLASS_DEF` instead.

Test Plan: Imported from OSS

Differential Revision: D16825273

Pulled By: suo

fbshipit-source-id: f6d97d7e4fbdf789fd777f514eac97f32e2bbae2
2019-08-15 10:15:52 -07:00
c6eddbb90f copy methods when creating a derived class type (#24349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24349

Methods that derive a new class type from the old one need to copy the
`method_` field as well as the attributes.

Test Plan: Imported from OSS

Differential Revision: D16825274

Pulled By: suo

fbshipit-source-id: 938334e0733d2a89f00ec46984cbd5beecb4c786
2019-08-15 10:15:48 -07:00
761ae8e9b6 Add intrinsic module mappings (#23753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23753

Add intrinsic (fused) module mappings in quantize.py to enable mapping fused modules
in both QAT and post-training quantization (PTQ)

Differential Revision: D16820749

fbshipit-source-id: 07de76a4f09b44bde8b193c103eac02c22b875b6
2019-08-15 09:37:24 -07:00
52b4221bfa Enabled masked methods for bfloat16 (#24183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24183

-----------
Fix: Enabled masked select/scatter/fill for BFloat16 on CPU
Test: via unit tests
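
A minimal sketch of the enabled ops (illustrative, not taken from the PR's tests):
```py
import torch

x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.bfloat16)
mask = torch.tensor([True, False, True])
print(x.masked_select(mask))   # tensor([1., 3.], dtype=torch.bfloat16)
x.masked_fill_(mask, 0.0)      # in-place masked fill works too
```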

Test Plan: Imported from OSS

Differential Revision: D16763461

Pulled By: izdeby

fbshipit-source-id: fe733635a2064e5a088a108ff77c2a1a1487a27c
2019-08-15 08:45:24 -07:00
bc92ce9e07 Recommend logical_not() instead of bitwise_not() when applying sub and neg on bool tensors. (#23860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23860

Close #23836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23860

Test Plan: Imported from OSS

Differential Revision: D16678299

Pulled By: gchanan

fbshipit-source-id: b08e77f6a41c3994240849985caaff7c559d3f83
2019-08-15 08:40:29 -07:00
338f9c860f Add logical_xor operator (#23847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23847

Related to #23836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23847

Test Plan: Imported from OSS

Differential Revision: D16678300

Pulled By: gchanan

fbshipit-source-id: 67020aca5830b6bec2f561105954e0a8c2ee37e0
2019-08-15 08:40:25 -07:00
1f4c73618c Add logical_not operator. (#23839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23839

Close #23836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23839

Test Plan: Imported from OSS

Differential Revision: D16678301

Pulled By: gchanan

fbshipit-source-id: 54e7b3f3b04c577e239b88493247e1c036266774
2019-08-15 08:40:21 -07:00
10d2ada17d Fix Z7_MSVC_OVERRIDE for C source files (#24389)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24145#issuecomment-521507234
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24389

Differential Revision: D16828222

Pulled By: ezyang

fbshipit-source-id: dcf652fbd8b8945c71993e9b99394e18ac542e6b
2019-08-15 06:52:42 -07:00
0745591855 Vectorize LowerCholeskyTransform (#24131)
Summary:
Removes older `torch.stack`-based logic in favor of `torch.diagonal()` and `torch.diag_embed()`.

I see 100x speedup in my application, where my batched matrix has shape `(800, 32 ,32)`.
```py
import torch
from torch.distributions import constraints, transform_to
x = torch.randn(800, 32, 32, requires_grad=True)

# Before this PR:
%%timeit
transform_to(constraints.lower_cholesky)(x).sum().backward()
# 579 ms ± 34.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# After this PR:
%%timeit
transform_to(constraints.lower_cholesky)(x).sum().backward()
# 4.5 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24131

Differential Revision: D16764035

Pulled By: ezyang

fbshipit-source-id: 170cdb0d924cdc94cd5ad3b75d1427404718d437
2019-08-15 06:46:19 -07:00
59094c409e Refactor and expose metadata of tum_history layer for online prediction
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24290

Reviewed By: xianjiec

Differential Revision: D16570968

fbshipit-source-id: f68d42f3a8e1a6c8d30e00c2dd7f7efc1fb35d7c
2019-08-15 00:27:11 -07:00
1b38a6f602 add wipe cache
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24390

Reviewed By: mingzhe09088

Differential Revision: D16808041

fbshipit-source-id: 1b19f47706e4e2f2e03356469315b55c6ff76d20
2019-08-14 23:48:52 -07:00
ab39a55331 python udf over rpc (#23569)
Summary:
This diff is to support python user-defined functions over rpc for https://github.com/pytorch/pytorch/issues/23110; the workflow is:
1. pickle the python udf
2. pass the pickle to C++
3. C++ passes it over rpc from client to server
4. the server calls runPythonUDF() to unpickle and run the python udf, then pickles the result using the python embedder
5. pass the serialized result back from server to client
6. the client calls loadPythonUDFResult() to unpickle the result
7. return it to python

Right now, rpc_sync_builtin() and rpc_async_builtin() serve as temporary interfaces for builtin operator remote calls; they accept a qualified name string and execute builtin operators in C++ land.

rpc_sync() and rpc_async() accept python callables only right now; these can be user-defined python functions or builtin operator python functions, and the python functions are executed in python land.

Once we can resolve builtin operator python callables to qualified name strings, we can merge rpc_sync_builtin() into rpc_sync().
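
A rough usage sketch of this flow (the module path, worker name, and setup are illustrative assumptions; init_rpc bootstrapping is omitted):
```py
import torch
import torch.distributed.rpc as rpc

def my_udf(a, b):
    # an arbitrary user-defined python function
    return (a + b).sum()

# caller side: my_udf is pickled, shipped to "worker1", unpickled and run
# there, and the pickled result is sent back and unpickled locally
ret = rpc.rpc_sync("worker1", my_udf, args=(torch.ones(2), torch.ones(2)))
```
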
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23569

Test Plan: unit tests

Differential Revision: D16390764

Pulled By: zhaojuanmao

fbshipit-source-id: 2cf2c22a979646830b5581bd75eabf8b3cca564c
2019-08-14 23:13:33 -07:00
de58df4c6f JIT trace testing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23987

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D16744208

Pulled By: jamesr66a

fbshipit-source-id: 8e65898cc8edebcc46b862e3d33f85071d701a04
2019-08-14 22:11:32 -07:00
Jie
064d156511 (#23574)
Summary:
Assert that there are no multiple writes to a single memory location, which
caused corrupted output.
Fixed the batched matrix triu/tril logic, which relied on the previous copy behavior to
support tensors with stride 0 in the leading dimension.

This fixes the issue proposed at: https://github.com/pytorch/pytorch/issues/23063
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23574

Differential Revision: D16600717

Pulled By: ezyang

fbshipit-source-id: e41e14f03eccf97398b64ba43647110beb1529e6
2019-08-14 21:12:07 -07:00
d9d5d9a913 Sanity fixes for bitwise_not (#24296)
Summary:
(intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24296

Differential Revision: D16809598

Pulled By: ezyang

fbshipit-source-id: 00718faf1ece06b6af0160763ac22d9cb10c2575
2019-08-14 21:07:26 -07:00
e2a6212912 Resolve unused variables in tests (#24075)
Summary:
Loop variables such as `device` and `sparse` should actually be used inside the tests that iterate over them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24075

Differential Revision: D16763073

Pulled By: ezyang

fbshipit-source-id: 8735cbc8d9ed695db8489cfc949c895180a7b826
2019-08-14 21:02:52 -07:00
f66c90469b Fix Lint (#24381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24381

As pointed out in https://github.com/pytorch/pytorch/pull/24299#issuecomment-521471089, the previous PR broke the Lint.
ghstack-source-id: 88339887

Reviewed By: jamesr66a

Differential Revision: D16822443

fbshipit-source-id: 3aed5b9404b0f0fcf453c05b59189974243b0df2
2019-08-14 19:22:09 -07:00
806b24f168 Temporarily disable warnings in dynamic quantization ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24376

Test Plan: Imported from OSS

Differential Revision: D16819772

Pulled By: jamesr66a

fbshipit-source-id: da6514bc1b96c3860039538f4d851064bad78d61
2019-08-14 18:14:13 -07:00
7597741159 Run quantization tests first
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24366

Test Plan: Imported from OSS

Differential Revision: D16815295

Pulled By: jamesr66a

fbshipit-source-id: 01478ce2fcbe0620cd5cf9854121602e0663c057
2019-08-14 18:09:32 -07:00
6a48a5b65c Fix more warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24291

Test Plan: Imported from OSS

Differential Revision: D16795898

Pulled By: zdevito

fbshipit-source-id: cbd5f2dd4e3bbd361909ae13c243561899568ad0
2019-08-14 17:47:54 -07:00
a919fc3704 test {__init__,from_float} on nnq{,d}.Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24364

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D16812543

Pulled By: jamesr66a

fbshipit-source-id: be05a658fa4562f3fcf3548e30b1fe9a77d1151c
2019-08-14 17:42:23 -07:00
79710604cc fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24375

Test Plan: Imported from OSS

Differential Revision: D16819647

Pulled By: jamesr66a

fbshipit-source-id: 84eefe1ee27bd05ed9b8745d8011dddf6cb3ddbf
2019-08-14 17:37:39 -07:00
0f64043b49 Remove the activation observer for default_qconfig (#24299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24299

As suggested in https://github.com/pytorch/pytorch/pull/24232, we will remove the activation observer for dynamic quantization path.
ghstack-source-id: 88287094

Differential Revision: D16798590

fbshipit-source-id: 07a245d5584b5b15c6895d9b09deef4a0605073a
2019-08-14 17:21:50 -07:00
5b0de85868 Register FC/Conv DNNLowp separately for supporting both tensor type (#24361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24361

Currently we only support Conv in the kernel but have an entry point for both tensor types through one shared class.
It is time to make the change.

Reviewed By: csummersea

Differential Revision: D16604713

fbshipit-source-id: b98d39a2c7960707cd50ba27e43dce73f741eeeb
2019-08-14 17:15:42 -07:00
0647a3f4c7 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: a85a5b8258ce777e5001b0973e173707c729b8e4
2019-08-14 16:39:07 -07:00
e8d2ddc2c4 Make the default qconfig_dict (#24232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24232

As suggested in https://github.com/pytorch/pytorch/pull/23128#discussion_r306650311, we will make the keys of the default_qconfig_dict module types such as `torch.nn.Linear`. That is, we will do dynamic quantization on `torch.nn.Linear` by default if the user just specifies `torch.quantize_dynamic(model)`.
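
A minimal sketch of the resulting default behavior, assuming the `torch.quantize_dynamic` entry point named above (the model itself is illustrative):
```py
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
# with the default qconfig_dict {torch.nn.Linear: default_qconfig}, every
# nn.Linear submodule is dynamically quantized without per-layer config
qmodel = torch.quantize_dynamic(model)
```
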
ghstack-source-id: 88287089

Differential Revision: D16781191

fbshipit-source-id: 991a5e151a9ea32b879d6897cd9862855d747135
2019-08-14 15:12:55 -07:00
53fbfd8fe8 Fix the dimension mismatch issues when running the BERT model (#23330)
Summary:
We found the following dimension mismatch issues when running the BERT model with the dynamic quantization:
```
Traceback (most recent call last):
  File "bert.py", line 75, in <module>
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 709, in forward
    head_mask=head_mask)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 437, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 415, in forward
    attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 372, in forward
    self_outputs = self.self(input_tensor, attention_mask, head_mask)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 303, in forward
    query_layer = self.transpose_for_scores(mixed_query_layer)
  File "/home/jianyuhuang/anaconda3/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 296, in transpose_for_scores
    return x.permute(0, 2, 1, 3)
RuntimeError: number of dims don't match in permute
```

Before the quantization, the shape of ```x``` in ```transpose_for_scores``` is ```[1, 14, 12, 64]```;
after the quantization, the shape of ```x``` in ```transpose_for_scores``` is ```[14, 12, 64]```.

There is a dimension mismatch in the output of the ```torch.ops.quantized.fbgemm_linear_dynamic``` operator: the first dimension is missing, which causes the issue with the above permute.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23330
ghstack-source-id: 88287092

Differential Revision: D16463334

fbshipit-source-id: 4bdb836d1df31ba7c0bd44e3339aabdc8b943ae1
2019-08-14 14:20:50 -07:00
40be39e4c7 Fix perf bug with indexed assignment (index_put_) (#24083)
Summary:
TensorIterator was incorrectly moving the stride 0 dimension to the
inner-most dim in the assignment:

  a[idx] = b

Note that the corresponding read was still fast:

  c = a[idx]

This was noticed by adamlerer

```
import torch
import time
import sys
N = 300000

torch.set_num_threads(1)
a = torch.zeros(N, 128)
b = torch.zeros(N, 128)
idx = torch.arange(N)

%timeit c = a[idx]  # before and after: ~91.3 ms
%timeit a[idx] = b  # before: 4.38 sec after: 44.1 ms
```

Note that the indexed read is slower than the indexed assignment on
my computer because the read has to allocate a new output (which is
zero'ed by the kernel). The indexed assignment doesn't allocate any new
Tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24083

Differential Revision: D16805440

Pulled By: colesbury

fbshipit-source-id: 70a2e74ae79691afbfa9f75b3d7d1e6806f603f5
2019-08-14 14:14:43 -07:00
9fe4052b6c Add trace_module to docs (#24258)
Summary:
Stacked PRs
 * **#24258 - [jit] Add `trace_module` to docs**
 * #24208 - [jit] Cleanup documentation around `script` and `trace`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24258

Pulled By: driazati

Differential Revision: D16811342

fbshipit-source-id: 893be85a711ab180319b790ed1c72b93022373c1
2019-08-14 14:04:14 -07:00
716abd8705 Cleanup documentation around script and trace (#24208)
Summary:
Stacked PRs
 * #24258 - [jit] Add `trace_module` to docs
 * **#24208 - [jit] Cleanup documentation around `script` and `trace`**

Examples / info was duplicated between `ScriptModule`, `script`, and
`trace`, so this PR consolidates it and moves some things around to make
the docs more clear.

For reviewers, if you want to see the rendered output, ping me for a link.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24208

Pulled By: driazati

Differential Revision: D16746236

fbshipit-source-id: fac3c6e762a31c897b132b8421baa8d4d61f694c
2019-08-14 14:04:10 -07:00
0619b57c4c Add the ability to compile exports on traced modules (#24298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24298

This helps in situations where you have `__{g,s}etstate__` on an `nn.Module` and you'd like to trace the module but still preserve the serialization methods so the module stays semantically correct

Test Plan: Imported from OSS

Differential Revision: D16799800

Pulled By: jamesr66a

fbshipit-source-id: 91c2957c94c9ec97a486ea376b2a3e3a821270af
2019-08-14 13:51:22 -07:00
45962ac5b6 equal() for QuantizedCPU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24211

Test Plan: Imported from OSS

Differential Revision: D16799801

Pulled By: jamesr66a

fbshipit-source-id: d3c17a7b5305f94217aef2740124506f34fe2458
2019-08-14 13:51:18 -07:00
584c6986fd Add the type matching rule for qconfig_dict (#23212)
Summary:
We want to use the Module type as the key for the qconfig_dict for the module replacement during the quantization.

Before this Diff, to dynamically quantize the BERT model, we had to specify each layer:
```
qconfig_dict = {
    'encoder.layer.0.attention.self.query': default_qconfig,
    'encoder.layer.0.attention.self.key': default_qconfig,
    'encoder.layer.0.attention.self.value': default_qconfig,
    'encoder.layer.0.attention.output.dense': default_qconfig,
    'encoder.layer.0.intermediate.dense': default_qconfig,
    'encoder.layer.0.output.dense': default_qconfig,
    'encoder.layer.1.attention.self.query': default_qconfig,
    'encoder.layer.1.attention.self.key': default_qconfig,
    'encoder.layer.1.attention.self.value': default_qconfig,
    'encoder.layer.1.attention.output.dense': default_qconfig,
    'encoder.layer.1.intermediate.dense': default_qconfig,
    'encoder.layer.1.output.dense': default_qconfig,
   ...
}
```
After this Diff, we only need the following
```
qconfig_dict = {
     torch.nn.Linear : default_qconfig
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23212
ghstack-source-id: 88287091

Reviewed By: zafartahirov

Differential Revision: D16436542

fbshipit-source-id: 11fbe68ee460560c1a7cdded63581eb7a00e5a89
2019-08-14 13:07:36 -07:00
bb9996509b Fix expansion of stride argument in avg_pool3d (#23963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23963

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16695160

Pulled By: ezyang

fbshipit-source-id: dc8fd1f0c7096fcd4eb48ce42069307915052a77
2019-08-14 12:47:10 -07:00
897245c16d Fix expansion of stride argument in avg_pool2d (#23961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23961

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16695162

Pulled By: ezyang

fbshipit-source-id: 28eca6920bd1b4e72286b4ab859cf513dcd0db44
2019-08-14 12:47:07 -07:00
d373dac817 Fix expansion of stride argument in max_pool3d (#23960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23960

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16695161

Pulled By: ezyang

fbshipit-source-id: 36d1777467bbe3f8842736c570b029b72954e027
2019-08-14 12:47:03 -07:00
4952224455 Fix expansion of stride argument in max_pool2d (#23954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23954

There is only one substantive change: when stride.size() == 1,
we expand it to size 2.  However, I also took the opportunity
to give a better error message.

Testing here is bare minimum, because I'm in a hurry.  Just make
sure C++ API with all size 1 inputs works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16695163

Pulled By: ezyang

fbshipit-source-id: 31674bf97db67e60e4232514c88a72be712bd9ae
2019-08-14 12:46:59 -07:00
4bfd33ed36 Name inference for softmax, log_softmax and Dimname overloads. (#24087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24087

Added name inference rules for softmax and log_softmax.

Added the overloads for Dimname dim to softmax and log_softmax.
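
A small sketch, assuming the namedtensor build and the string-to-Dimname binding used elsewhere in this series:
```py
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = x.softmax(dim='C')   # the new Dimname overload
print(y.names)           # ('N', 'C'): names propagate to the output
```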

Test Plan: - [namedtensor ci]

Differential Revision: D16763391

Pulled By: zou3519

fbshipit-source-id: 676a14666d42441eb7d3c9babef7461c7b78d290
2019-08-14 12:19:27 -07:00
5cb8a7b396 Fix out= function semantics for named tensors. (#24028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24028

Previously, torch.abs(tensor, out=out) would ignore the names of the
`out` tensor and overwrite them with the names of `tensor`.

This patch changes the behavior to the following:
1) If `out` does not have names, then overwrite them with `tensor.names`.
2) If `out` does have names, then check that `out.names` equals
`tensor.names`.
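
A minimal sketch of rules (1) and (2), assuming the namedtensor build:
```py
import torch

t = torch.randn(3, names=('C',))

out = torch.empty(3)        # `out` has no names
torch.abs(t, out=out)       # rule 1: out.names is overwritten -> ('C',)

named = torch.empty(3, names=('C',))
torch.abs(t, out=named)     # rule 2: names match, so this succeeds
```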

This patch also includes the following clean ups:
- renamed `default_names` to `FIXME_default_names` because it is
inefficient and needs to be fixed.
- Renamed impl::internal_get_names / impl::internal_has_names to
impl::get_names / impl::set_names. Devs should feel free to use them, so
I removed the internal_ prefix.
- Moved internal_set_names to NamedTensor.{h, cpp}. These functions
still have the internal_ prefix because their use requires caution.

Test Plan: - [namedtensor ci]

Differential Revision: D16763387

Pulled By: zou3519

fbshipit-source-id: 57dcc7c759246def0db2746d1dca8eddd5e90049
2019-08-14 12:19:23 -07:00
a5872a16a0 Rename torchtest.test_all_device_types to torchtest.for_all_device_types (#24337)
Summary:
Rename the decorator to `for_all_device_types`, as a `test_`-prefixed name is recognized as a test in some environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24337

Differential Revision: D16806807

Pulled By: VitalyFedyunin

fbshipit-source-id: 3132366046e183329ba5838a4bc29441fdb5bd4e
2019-08-14 12:09:51 -07:00
8a7e57c416 clean up import_source (#24282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24282

This moves a test from Python to cpp, and in doing so lets us clean up a
bunch of otherwise unused code.

Test Plan: Imported from OSS

Differential Revision: D16800562

Pulled By: suo

fbshipit-source-id: ebc29bb81f4fb2538081fa309ead1739980f1093
2019-08-14 11:26:26 -07:00
c158848abe class_table_ to deps_table_ (#24281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24281

These are not just classes anymore, rename

Test Plan: Imported from OSS

Differential Revision: D16800564

Pulled By: suo

fbshipit-source-id: 8b8d508944c26a8916fc7642df43f22583dfcf82
2019-08-14 11:26:22 -07:00
735df86caa make FunctionType a NamedType (#24280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24280

This simplifies the groundwork for serializing functions.

Test Plan: Imported from OSS

Differential Revision: D16800560

Pulled By: suo

fbshipit-source-id: 129b32dddb39494daeade33c87d76248486a86b2
2019-08-14 11:26:18 -07:00
025116cf4a make NamedType an interface (#24279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24279

As title. I want to let children define how to get their own name

Test Plan: Imported from OSS

Differential Revision: D16800563

Pulled By: suo

fbshipit-source-id: 6a12ffef96b0dfa5543c5463386170de7726ad58
2019-08-14 11:26:14 -07:00
5839a59ae3 simplify NamedType interface (#24278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24278

We had a lot of redundant methods. Killing them.

Test Plan: Imported from OSS

Differential Revision: D16800561

Pulled By: suo

fbshipit-source-id: 60acc1d5b0f34130a1f66a1e5bc7df364a5feb57
2019-08-14 11:26:10 -07:00
abadf0079f fix list comprehension type assumed to the same as input type (#24271)
Summary:
Previously we didn't handle list comprehensions where the expression produced a different type than the input list.
`[float(x) for x in [1, 2, 3]]`

Fix for https://github.com/pytorch/pytorch/issues/24239
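
A minimal repro sketch that now scripts correctly:
```py
import torch

@torch.jit.script
def f():
    # the comprehension's expression type (float) differs from the
    # input list's element type (int)
    return [float(x) for x in [1, 2, 3]]

print(f())   # [1.0, 2.0, 3.0]
```
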
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24271

Differential Revision: D16806564

Pulled By: eellison

fbshipit-source-id: 1af6a174b9d17a6ea7154511133c12c691eb9188
2019-08-14 11:20:03 -07:00
a69a62cf83 fix test_jit.py so it can be run in parallel (#24311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24311

Now you can run tests with `pytest -n auto test/test_jit.py` to get
tests to run in parallel. On my devfair in opt mode, this takes < 30
seconds, which is a huge improvement.

The actual changes are places where we hard-coded certain things that
got changed due to how pytest-xdist distributes tests:
1. Warnings are filtered after they are tripped once, so
`test_trace_warn` shouldn't rely on warning counts.

2. Various file/save tests hardcoded paths inappropriately.

Test Plan: Imported from OSS

Differential Revision: D16801256

Pulled By: suo

fbshipit-source-id: 62a3543dd7448a7d23bdef532953d06e222552ee
2019-08-14 11:15:14 -07:00
88b1f6619e Return list of AccessedFeatures from get_accessed_features (#23983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23983

While testing I realized that model layers can extract different types of features from the same column.  For example, MultifeedFeaturesTransform uses float and ID list features from the "features" column.

get_accessed_features returns a map from column to AccessedFeatures, and AccessedFeatures only has the feature IDs for one feature type.  This is incompatible with having multiple types of features per column: one type ends up overwriting another in the map.

To fix this, I've modified get_accessed_features to return a map from column to a list of AccessedFeatures objects.

Reviewed By: itomatik

Differential Revision: D16693845

fbshipit-source-id: 2099aac8dc3920dd61de6b6ad5cf343c864803bc
2019-08-14 10:50:27 -07:00
b53916a373 C2/glow: assign net_pos to a net before applying onnxifi_blacklist_ops (#24262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24262

Previously, for the onnxifi_blacklist_ops option, we figured out the net_pos based on the order of ops in the net. But this logic is wrong if the net already has net_pos assigned, and we may end up blacklisting unintended ops. Fix this issue by always assigning net_pos before computing any blacklist.

Reviewed By: yinghai

Differential Revision: D16789166

fbshipit-source-id: 2d08a7737d417822f2209adb4dcb24dbb258ff90
2019-08-14 10:39:15 -07:00
f996f8d61d Update tensor.view_names / tensor.names_ API (#23973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23973

Without loss of generality, I describe the API for `tensor.view_names`.
`tensor.names_` has an analogous API.

`tensor.view_names(*names)` returns a view on tensor with named dims `names`.
`names` must be of length `tensor.dim()`; otherwise, if '*' is in `names`,
then it (known as the "glob") is expanded greedily to be equal to the
corresponding names from `tensor.names`.

For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names('*', 'height', 'width').names
('N', 'C', 'height', 'width')

>>> x.view_names('batch', '*', 'width').names
('batch', 'C', 'H', 'width')
```

tensor.view_names(**rename_map) returns a view on tensor that has
renamed dims as specified in the mapping `rename_map`.

For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names(W='width', H='height').names
('N', 'C', 'height', 'width')
```

These are different(!!!) from the C++ API, which only allows the
following:
- tensor.view_names(optional<DimnameList>)

C++ API parity for named tensors is not important right now; I am
punting that to the future.

Test Plan: - [namedtensor ci]

Differential Revision: D16710916

Pulled By: zou3519

fbshipit-source-id: 7cb8056c0fb4c97b04c3a2d1dd0f737e0a67ce34
2019-08-14 09:40:35 -07:00
2fcdb3a1f3 Rename set_names -> view_names, set_names_ -> names_ (#23962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23962

This change should make the semantics clearer.

`tensor.names_(names)` sets tensor.names to be `names`.

`tensor.view_names(names)` returns a view of the tensor with names
`names`.

Test Plan
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16710915

Pulled By: zou3519

fbshipit-source-id: c82fa9812624d03c86f7be84b0a460e3c047aaa0
2019-08-14 09:40:31 -07:00
7030f2c623 Implement tensor.align_to(names), torch.align_tensors(*tensors) (#23804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23804

`output = tensor.align_to(names)` returns a view of `tensor` such that
`output.names = names`. Dimensions with the same names in `tensor` and
`output` have the same sizes; dimensions with new names have size 1.

The following must be true for this operation to succeed:
1) tensor.names must be a subsequence (not necessarily contiguous) of `names`
2) Aligning tensor.names to names must not change the absolute position from the
   right of any unnamed dimension.

In practice, these constraints mean that aligning cannot transpose
names.

Some examples:
- Tensor[C].align_to(C) -> Tensor[C]
- Tensor[N].align_to([N, C]) -> Tensor[N, C]
- Tensor[H, W].align_to([N, H, W, C]) -> Tensor[N, H, W, C]
- Tensor[None].align_to([N, None]) -> Tensor[N, None]
- Tensor[N].align_to([N, None, None]) -> Tensor[N, None, None]

Examples of error cases:
- Tensor[W, H].align_to([N, H, W, C]) -> Error (not a subsequence)
- Tensor[None, H].align_to([None, H, W]) -> Error (would change the
absolute position from the right of a None dimension)

`torch.align_tensors(*tensors)` aligns the named dimensions of each
tensor according to the alignment rules so that they can be used in an
operation. More concretely, it aligns each tensor to the
longest names among the names of the tensors in `tensors`.

This allows users to emulate "broadcasting by names", which is one of
the things named tensors tries to enable. Here is an example:

```
imgs: Tensor[N, C, H, W]
scale: Tensor[N]

// Doesn't work because we do broadcasting by alignment by default
imgs * scale

// Does work
imgs, scale = torch.align_tensors(imgs, scale)
imgs * scale
```

Future:
- Consider allowing broadcasting by names by default.

Test Plan:
- The diff looks pretty large but more than half of it is testing.
- new tests [namedtensor ci]

Differential Revision: D16657927

Pulled By: zou3519

fbshipit-source-id: e2f958bf5146c8ee3b694aba57d21b08e928a4e6
2019-08-14 09:40:27 -07:00
eabfca3577 Named inference for contiguous(), bernoulli variants, and dropout. (#24109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24109

See title.

Test Plan: - New tests [namedtensor ci]

Differential Revision: D16763389

Pulled By: zou3519

fbshipit-source-id: ea14af0fe812d04ca7127a080e56c273b21c30bc
2019-08-14 06:19:28 -07:00
ad42c7d0f3 Implement name inference rule for empty_like, clone (#24108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24108

`torch.empty_like(tensor)` and `tensor.clone()` both propagate names to
the output tensor.

As a part of this change, I fixed the empty(..., names=) overload to
include the `memory_format` argument in the normal `empty` declaration
in native_functions.yaml.
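
A quick sketch, assuming the namedtensor build:
```py
import torch

x = torch.randn(2, 3, names=('N', 'C'))
print(torch.empty_like(x).names)   # ('N', 'C')
print(x.clone().names)             # ('N', 'C')
```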

Test Plan: - [namedtensor ci]

Differential Revision: D16763392

Pulled By: zou3519

fbshipit-source-id: c7b2bc058d26a515a5fd8deef22c2acb290c8816
2019-08-14 06:19:24 -07:00
65fa0233c5 Add names argument to ones, rand, randn, zeros, full; fix empty (#24107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24107

In the short term, we implement this by having overloads for each of
these functions. In the long term, the plan is to move DimnameList to
TensorOptions so that we do not have to duplicate work.

Also fixes the implementation of empty. If there are no names, we should
just return an unnamed tensor instead of telling the user we don't
support their backend/layout.
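
For example (namedtensor build assumed; shapes are illustrative):
```py
import torch

torch.zeros(2, 3, names=('N', 'C'))
torch.ones(2, 3, names=('N', 'C'))
torch.full((2, 3), 1.0, names=('N', 'C'))
torch.empty(2, 3)   # no names: returns a plain unnamed tensor (the fix)
```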

Test Plan: - [namedtensor ci]

Differential Revision: D16763393

Pulled By: zou3519

fbshipit-source-id: 7324a6b157187d4f74abc5459052f3323a417412
2019-08-14 06:19:21 -07:00
e4c9aa8124 format changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24270

Reviewed By: yinghai

Differential Revision: D16775953

fbshipit-source-id: 8a77e770aa52c0afdd60cf44330dda35846d434b
2019-08-14 01:08:31 -07:00
7afe0a8c6d no_deadline on ModuleAPITests and skip on dynamic quantization test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24307

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D16800749

Pulled By: jamesr66a

fbshipit-source-id: 7ce466794c13d598b4396bd33fcdcffb57bac1cb
2019-08-13 23:27:15 -07:00
9492a5e0b6 Add logging to autodiff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23664

Differential Revision: D16800787

Pulled By: Krovatkin

fbshipit-source-id: e59a34bff0fb91eb8151c7a5504cdfa6fa23c32b
2019-08-13 23:22:16 -07:00
93d2cd7619 Skip test_quantized_nn_mods tests if theres no FBGEMM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24302

Test Plan: Imported from OSS

Differential Revision: D16800352

Pulled By: jamesr66a

fbshipit-source-id: 56650d8c937afca77005ad39a5bc38ebd6e71414
2019-08-13 21:23:19 -07:00
514285890c Enable QNNPACK for iOS (#24030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24030

The cmake arg `USE_QNNPACK` was disabled for the iOS build due to its lack of support for building multiple archs (armv7;armv7s;arm64) simultaneously. To enable it, we need to specify the value of IOS_ARCH explicitly in the build command:

```
./scripts/build_ios.sh \
-DIOS_ARCH=arm64 \
-DBUILD_CAFFE2_MOBILE=OFF \
```
However, iOS.cmake will overwrite this value according to the value of `IOS_PLATFORM`. This PR fixes that problem.

Test Plan:
- `USE_QNNPACK` should be turned on by cmake.
- `libqnnpack.a` can be generated successfully.
- `libtorch.a` can be compiled and run successfully on iOS devices.

<img src="https://github.com/xta0/AICamera-ObjC/blob/master/aicamera.gif?raw=true" width="400">

Differential Revision: D16771014

Pulled By: xta0

fbshipit-source-id: 4cdfd502cb2bcd29611e4c22e2efdcdfe9c920d3
2019-08-13 21:10:59 -07:00
e94ba742b0 Dynamic Quantized Linear Module (#23128)
Summary:
- ~~Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.~~ Move this to D16404027 for a separate review.
- Add the Dynamic Quantized Linear module in ```torch/nn/quantized/modules/linear.py```. ~~This is in a rudimentary stage. Will add more functions later~~.
- Add the torch.quantize logic (prepare, eval, convert) for dynamic quantization.
- Add a unit test for the Dynamic Quantized Linear module  in ```test_nn_quantized.py```.
- Add a unit test for the Model-level Quantization API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23128
ghstack-source-id: 88257232

Differential Revision: D16258664

fbshipit-source-id: 4be3ac39ee27c088b341c741d3f09f51d5a23ef0
2019-08-13 21:01:23 -07:00
0b1fee0819 Remove escape_path in our build system. (#24044)
Summary:
Which was added in https://github.com/pytorch/pytorch/issues/16412.

Also make some CUDNN_* CMake variables to be build options so as to avoid direct reading using `$ENV` from environment variables from CMake scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24044

Differential Revision: D16783426

Pulled By: ezyang

fbshipit-source-id: cb196b0013418d172d0d36558995a437bd4a3986
2019-08-13 20:38:19 -07:00
c771d50ca2 Remove hard Caffe2 dependency for TensorBoard (#24295)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24175 and https://github.com/pytorch/pytorch/issues/15618

We should not be importing caffe2 (and dependencies like future, etc) unless needed within `torch.utils.tensorboard`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24295

Reviewed By: NarineK

Differential Revision: D16797594

Pulled By: orionr

fbshipit-source-id: 984935e2121b62ea1b87a9de33af18ec45b7837b
2019-08-13 20:33:24 -07:00
ec1e53b462 Add dynamic quantized Linear op in PyTorch (#23464)
Summary:
As suggested in https://github.com/pytorch/pytorch/pull/22891, we will add an overload for torch.fbgemm_linear_int8_weight (the dynamically quantized version of the linear function) that takes PackedLinearWeight as input and has pretty much the same signature as the regular aten::linear.

The previous Diff D16381552 is reverted because `quantize_linear` expects the scale to be `float`, and the zero_point to be `int`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23464
ghstack-source-id: 88257231

Differential Revision: D16527741

fbshipit-source-id: 66585f668c6e623c50514eb11633bb711d8767f2
2019-08-13 19:59:35 -07:00
3e5e18d2e9 Fix tensor construction from array (#24283)
Summary:
fixes https://github.com/pytorch/xla/issues/929
The original issue complains about missing storage because it is trying to construct an xla tensor via the tensor_cpu method.
56fb5e03b5/aten/src/ATen/native/TensorFactories.cpp (L731)
In general, for backends other than CPU, `at::tensor` should construct a CPU tensor and then move it to the right backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24283

Differential Revision: D16793872

Pulled By: ailzhang

fbshipit-source-id: bdb502a4e1ee4e78d24751917c4cda6f9928b1d2
2019-08-13 19:47:14 -07:00
45ca36faaf Add out variant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23971

Test Plan: Imported from OSS

Differential Revision: D16695592

Pulled By: zafartahirov

fbshipit-source-id: 210dfceae90ac75c53f56bbb96170bdd8e6b8ff3
2019-08-13 17:36:24 -07:00
4e0af295c1 Fix and test conv2d constructor and from_float
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24277

Test Plan: Imported from OSS

Differential Revision: D16793043

Pulled By: jamesr66a

fbshipit-source-id: bbf74c87aa11adfe15e31ea8190e7542b8127c65
2019-08-13 17:07:19 -07:00
e7f1977bae test_nn_quantized -> test_quantized_nn_mods (#24201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24201

It turns out that the `run_test` script uses a blacklist of "exclude" tests and checks whether the test name [starts with](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L342) the given blacklist item. `nn` was passed as a blacklist item in CI, which meant that not only test_nn was skipped, but also test_nn_quantized. This renames the test to avoid that situation and, imo, puts it in a better position lexicographically, next to the other quantization tests.

Test Plan: Imported from OSS

Differential Revision: D16772820

Pulled By: jamesr66a

fbshipit-source-id: 4cde0729b48ae3e36fcedab9c98197831af82dde
2019-08-13 17:07:15 -07:00
98a3b3d565 Add name propagation for at::alias, add tensor.set_names (#24202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24202

tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D16773014

Pulled By: zou3519

fbshipit-source-id: 61024303c1a34db631cc4cb2c53757345e40d72c
2019-08-13 17:01:18 -07:00
517b3c4cd2 Fix validation of dynamic axes names (#23974)
Summary:
Existing code adds two enumerators to the set instead of forming their union.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23974

Differential Revision: D16732762

Pulled By: ezyang

fbshipit-source-id: 787737b7cf4b97ca4e2597e2da4a6ade863ce85c
2019-08-13 16:33:27 -07:00
74765c0015 Fix rotated rect intersection error (#24171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24171

There can be up to 24, instead of 16, intersections (including duplications) returned from rotated_rect_intersection_pts: 16 possible edge-edge crossings plus up to 8 vertices of one box lying inside the other. This caused num <= 16 assertion failures in https://fburl.com/scuba/mzmf49xc (thanks to Ananth's report) when the boxes are extremely close (e.g., the newly added unit test case)

Differential Revision: D16760676

fbshipit-source-id: 289c25ef82c094d98bfe570c5d35c055e49703cb
2019-08-13 16:23:13 -07:00
0ea8f22951 Enabled comparison ops for bfloat16 dtype on CPU (#24182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24182

-----
Fix: Enabled comparison operations for BFloat16 on CPU
Test: via unit tests
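
A minimal sketch of what this enables:
```py
import torch

a = torch.tensor([1.0, 2.0], dtype=torch.bfloat16)
b = torch.tensor([2.0, 2.0], dtype=torch.bfloat16)
print(a < b)     # tensor([ True, False])
print(a.eq(b))   # tensor([False,  True])
```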

Test Plan: Imported from OSS

Differential Revision: D16763460

Pulled By: izdeby

fbshipit-source-id: 885ff9006d3bd60bb945147c3b86f97cd0d26f7b
2019-08-13 15:44:24 -07:00
98d3d1659e Document benchmarking practice for CUDA
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23910

Differential Revision: D16732365

Pulled By: ezyang

fbshipit-source-id: 24e055602d479293da3e00a7143bba8f92bb7c4a
2019-08-13 15:07:23 -07:00
f511abb701 increase default warmup iter and iter (#24272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24272

As title, plus some lint

Reviewed By: mingzhe09088

Differential Revision: D16792312

fbshipit-source-id: 1386c369c96da04a584d1f7127b708b29d4b47d2
2019-08-13 14:35:19 -07:00
0f8d1fbe96 Revert D16611883: [jit] simplify NamedType interface
Differential Revision:
D16611883

Original commit changeset: a32c0a8b8b7e

fbshipit-source-id: c0829ec8432a32b0174c26a2cd18f85c0e7f8a3f
2019-08-13 14:07:04 -07:00
1c5e48bbd0 Observer returns original tensor for post training quantization (#24196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24196

Observer returns output with no changes for post training quant. This unifies observer semantics for QAT and PTQ.
ghstack-source-id: 88140887

Differential Revision: D16768277

fbshipit-source-id: fae7c94e3dc0eeda363e9982b3865a15113e11bd
2019-08-13 14:01:37 -07:00
1439152e72 Make hashing default for bucket-weighted pooling (#24266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24266

As title

Reviewed By: huginhuangfb

Differential Revision: D16775870

fbshipit-source-id: f919fdffa014ef3ce9a69fe173dd240e91813c3e
2019-08-13 13:56:32 -07:00
19528d4106 Revert D16611885: [jit] make NamedType an interface
Differential Revision:
D16611885

Original commit changeset: 620b22c314ed

fbshipit-source-id: 5b9cd23ab39dfdb0182a34d4dfc8a3393c862243
2019-08-13 13:48:01 -07:00
c2d352138c Fix missing version < 2 guard in import (#24255)
Summary:
This was accidentally removed in #23241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24255

Pulled By: driazati

Differential Revision: D16788490

fbshipit-source-id: 9465570ade0299a845ec1b51cf88efe9c49b439b
2019-08-13 13:43:00 -07:00
6e442a3fe6 Revert D16611884: [jit] make FunctionType a NamedType
Differential Revision:
D16611884

Original commit changeset: 620d3446cb35

fbshipit-source-id: d2daa30a84dec796a2c8d8309ef41fd27d601825
2019-08-13 13:27:07 -07:00
f36c3e9e4a Revert D16684391: [jit] class_table_ to deps_table_
Differential Revision:
D16684391

Original commit changeset: af0024c0b7fb

fbshipit-source-id: c9b98ac60b460963dc50f4837100909ff8f6c3ea
2019-08-13 13:27:03 -07:00
94aae71ba9 Revert D16684390: [jit] clean up import_source
Differential Revision:
D16684390

Original commit changeset: fca81ca14d1a

fbshipit-source-id: eb229097560ab1ead43756175e552764c8a14703
2019-08-13 13:26:59 -07:00
4e6698573b Ignoring the test logs in case the tests are ran from the parent directory
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24212

Test Plan: Imported from OSS

Differential Revision: D16775806

Pulled By: zafartahirov

fbshipit-source-id: e1a2290129447b847c6bf6fa1aa3514c7e63aaf3
2019-08-13 12:24:17 -07:00
bd054e7cef reduce memory usage for centered rmsprop (#24170)
Summary:
Reduce GPU memory usage by using in-place operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24170

Differential Revision: D16784495

Pulled By: vincentqb

fbshipit-source-id: 03820cdc9a3952b95b9af0f87d3a9bb0f21e9b4d
2019-08-13 12:18:31 -07:00
5ae909b443 Revert D15920763: Move TensorOptions to ATen/core
Differential Revision:
D15920763

Original commit changeset: c3429973180a

fbshipit-source-id: 0efb27722b371e1047f02240f071bc222b52e51d
2019-08-13 12:07:18 -07:00
14ab44f834 Fix flake8 issues in ./torch/jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24240

Differential Revision: D16783395

Pulled By: ezyang

fbshipit-source-id: 8427b7cd7d0552820cbbf20ebfca86898f3f53f7
2019-08-13 11:50:02 -07:00
c2549cb8d3 Remove DimensionedTensorType (#24077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24077

This replaces all uses of DimensionedTensorType with ProfiledTensorType.
For places where we propagate shape information, we still follow the
dimension-only propagation rules, meaning that even if full size information
is known on inputs the outputs will only have dimension information.

This fixes several bugs in existing implementations that this change uncovered:
* requires_grad was not propagated correctly across loops
* requires_grad on ProfiledTensorType returned false when requires_grad information
  is unknown but the conservative result is true
* some equality code on ProfiledTensorType contained bugs.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D16729581

Pulled By: zdevito

fbshipit-source-id: bd9f823c1c6b1d06a236a1b5b2b2fcdf0245edce
2019-08-13 10:05:47 -07:00
4cc16782f3 Removing the make_module script. (#23635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23635

It appears to be the same complexity to add new modules using a base class as using a generation script.

Test Plan: Imported from OSS

Differential Revision: D16593364

Pulled By: zafartahirov

fbshipit-source-id: 852dcf41f3dfa2a89152042b8e61d0b6defa8feb
2019-08-13 09:58:28 -07:00
6d14f7a214 Simplify tests that should cover all possible devices (#23824)
Summary:
This PR introduces the `pytorchtest.test_all_device_types()` decorator, which helps write CPU/CUDA tests faster by iterating a single test over all available devices

Simple `test_var_mean_some_dims`  becomes
```
test_var_mean_some_dims (__main__.TestTorch) ... ok
test_var_mean_some_dims_cpu (__main__.TestTorch) ... ok
test_var_mean_some_dims_cuda (__main__.TestTorch) ... ok
```

```python

class pytorchtest():
    """Allows to generate and run per-device unittests.

    This decorator class allows generating and running per-device unittests.

    Example:

    class _TestTorchMixin(pytorchtest):

        @pytorchtest.test_all_device_types()
        def test_zeros_like(self, device):
            expected = torch.zeros((100, 100,), device=device)

    Will execute:

        test_zeros_like (__main__.TestTorch) ... skipped 'Look at test_zeros_like_cpu, test_zeros_like_cuda results.'
        test_zeros_like_cpu (__main__.TestTorch) ... ok
        test_zeros_like_cuda (__main__.TestTorch) ... ok

    To work properly, the test class should inherit from `pytorchtest`. The
    test_all_device_types decorator does not guarantee proper functionality in
    combination with other decorators.

    Please do not extend this decorator to support other cases (such as dtype,
    layouts, etc.) without consulting a bigger group. Devices is the special
    case as build flags control additions/removals (see
    https://github.com/pytorch/pytorch/pull/23824 for the reference).
    """
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23824

Differential Revision: D16716959

Pulled By: VitalyFedyunin

fbshipit-source-id: ba39af0f9bce2c4a64da421bbc24d6a1c1d9139d
2019-08-13 09:36:31 -07:00
dc870a3761 Hypothesis tests: add ability to enforce shape inference (#23935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23935

Add parameter to enforce that outputs are inferred

Reviewed By: yinghai

Differential Revision: D16667772

fbshipit-source-id: 44f9c47133749b48c0db25a54f9bd9f4698f3e7d
2019-08-13 05:32:41 -07:00
5bf299b140 Add out variant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23956

Test Plan: Imported from OSS

Differential Revision: D16692445

Pulled By: zafartahirov

fbshipit-source-id: 75c1befb4c9fae7bbe5fb7b1e9bc1a89bf0e4f51
2019-08-13 00:28:54 -07:00
199398bbd1 Disambiguate tensor and string ops (#23748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23748

This extends the changes from https://github.com/pytorch/pytorch/pull/23532

ghstack-source-id: 88157704

Differential Revision: D16629907

fbshipit-source-id: ffcf937ec34a798a971e7d28ad85afb3b646d1fe
2019-08-12 20:35:08 -07:00
a0ddb728e6 toString(FunctionSchema) shows overload name (#23694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23694

-
ghstack-source-id: 88157699

Differential Revision: D16611686

fbshipit-source-id: a48ef1dd49e785e059fb027d9809e9b6deeb6e67
2019-08-12 20:35:04 -07:00
ca9456f10f Use JIT function schema parser to parse builtin RPC ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24207

Reviewed By: mrshenli

Differential Revision: D16774405

Pulled By: smessmer

fbshipit-source-id: 5a1771ebfbd2e505c4d83155e0e1da63e4fa3b25
2019-08-12 20:35:01 -07:00
bb4f4e4d03 clean up import_source (#23846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23846

This moves a test from Python to cpp, and in doing so lets us clean up a
bunch of otherwise unused code.

Test Plan: Imported from OSS

Differential Revision: D16684390

Pulled By: suo

fbshipit-source-id: fca81ca14d1ac9e4d6b47ae5eecaa42b38d69147
2019-08-12 20:30:06 -07:00
2dbd36b384 class_table_ to deps_table_ (#23845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23845

These are not just classes anymore, rename

Test Plan: Imported from OSS

Differential Revision: D16684391

Pulled By: suo

fbshipit-source-id: af0024c0b7fbcca68785ec3fc6dc288ec46a1b84
2019-08-12 20:30:01 -07:00
3f90b85ebc make FunctionType a NamedType (#23697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23697

This simplifies the groundwork for serializing functions.

Test Plan: Imported from OSS

Differential Revision: D16611884

Pulled By: suo

fbshipit-source-id: 620d3446cb353befde090a81a250cdd2d5e35aa8
2019-08-12 20:29:57 -07:00
873e86acbe make NamedType an interface (#23696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23696

As title. I want to let children define how to get their own name

Test Plan: Imported from OSS

Differential Revision: D16611885

Pulled By: suo

fbshipit-source-id: 620b22c314eddf95159546810e1a00b1646663b8
2019-08-12 20:29:53 -07:00
a0836cb8da simplify NamedType interface (#23691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23691

We had a lot of redundant methods. Killing them.

Test Plan: Imported from OSS

Differential Revision: D16611883

Pulled By: suo

fbshipit-source-id: a32c0a8b8b7e909b386a70abb0827c26cbd37e20
2019-08-12 20:29:49 -07:00
6cae07a668 search class type for methods (#23689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23689

We store methods, no reason to try to lock the CU to find a method on a
class type

Test Plan: Imported from OSS

Differential Revision: D16610045

Pulled By: suo

fbshipit-source-id: d84ad81faa42c4e2da20b666fa3645e22f11dac3
2019-08-12 20:29:45 -07:00
7923884a03 Fix incorrect type annotation on Linear __setstate__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24209

Test Plan: Imported from OSS

Differential Revision: D16777886

Pulled By: jamesr66a

fbshipit-source-id: 4f75b3c16458f093a5ae658d36dcb7a6d313410a
2019-08-12 19:21:41 -07:00
c00c9b2be2 fix py2 imports in _intrinsic/modules (#24206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24206

`unicode_literals` messes up python2 when the literals are put in `__all__`, because the python interpreter expects str and not unicode for elements in an import statement. This fixes that.
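
A sketch of the Python 2 failure mode being fixed:
```py
# module.py (Python 2)
from __future__ import unicode_literals

__all__ = ['Linear']   # under unicode_literals this is really [u'Linear']

# client code: `from module import *` then fails with
#   TypeError: Item in ``from list'' must be str, not unicode
```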

Test Plan: Imported from OSS

Differential Revision: D16774391

Pulled By: jamesr66a

fbshipit-source-id: fee2562f58b2e2c6480726d8809696961a37c8dd
2019-08-12 19:21:37 -07:00
40db964455 Add support for using caffe2::ThreadPool in pytorch mobile QNNPACK. (#23658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23658

**How things work for caffe2:**
Caffe2 Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (caffe2 shim) -> caffe2::ThreadPool

**Before this PR:**
Pytorch Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (third_party implementation without mobile optimization)

caffe2::ThreadPool is optimized for mobile. This change leverages that logic for pytorch mobile as a temporary solution to improve pytorch mobile perf. It is guarded by the C10_MOBILE macro.
For the server side we return nullptr.

**Plan for next steps:**
Implement a mobile version of "at::parallel_for" which uses caffe2::ThreadPool internally so all ATen/TH multithreading usage is mobile optimized.
Refactor QNNPACK and/or pthreadpool to explicitly using "at::parallel_for" primitive to replace pthreadpool_compute_1d for Pytorch.
After QNNPACK is refactored, we will delete the mobile_threadpool() API.

ghstack-source-id: 88073396

Reviewed By: dreiss

Differential Revision: D16594020

fbshipit-source-id: 9f94600756d5f86d24a12a2fd7df3eebd0994f1d
2019-08-12 18:14:15 -07:00
f510409281 Enable FBGEMM tests under UBSAN as well (#23570)
Summary:
Enabling tests under UBSAN as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23570

Test Plan:
buck test mode/dev caffe2/test:quantized
```
Running 29 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649677415136
      ✓ caffe2/test:quantized - test_qtensor (test_quantized_tensor.TestQuantizedTensor) 0.536 1/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_per_channel_affine (test_quantized_tensor.TestQuantizedTensor) 0.453 2/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_reshape (test_quantized_tensor.TestQuantizedTensor) 0.302 3/29 (passed)
      ✓ caffe2/test:quantized - test_qadd_relu_same_qparams (test_quantized.TestQuantizedOps) 0.332 4/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_view (test_quantized_tensor.TestQuantizedTensor) 0.351 5/29 (passed)
      ✓ caffe2/test:quantized - test_qadd_relu_different_qparams (test_quantized.TestQuantizedOps) 0.348 6/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_dequantize_linear (test_quantized_tensor.TestQuantizedTensor) 0.338 7/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_copy (test_quantized_tensor.TestQuantizedTensor) 0.267 8/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_clone (test_quantized_tensor.TestQuantizedTensor) 0.330 9/29 (passed)
      ✓ caffe2/test:quantized - test_qrelu (test_quantized.TestQuantizedOps) 1.774 10/29 (passed)
      ✓ caffe2/test:quantized - test_pool_api (test_nn_quantized.ModuleAPITest) 0.418 11/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_load_save (test_quantized_tensor.TestQuantizedTensor) 0.724 12/29 (passed)
      ✓ caffe2/test:quantized - test_relu_api (test_nn_quantized.FunctionalAPITest) 1.013 13/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_quant_dequant (test_quantized_tensor.TestQuantizedTensor) 1.055 14/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_permute (test_quantized_tensor.TestQuantizedTensor) 0.696 15/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_dtypes (test_quantized_tensor.TestQuantizedTensor) 0.841 16/29 (passed)
      ✓ caffe2/test:quantized - test_quant_dequant_api (test_nn_quantized.ModuleAPITest) 0.616 17/29 (passed)
      ✓ caffe2/test:quantized - test_qtensor_creation (test_quantized_tensor.TestQuantizedTensor) 0.698 18/29 (passed)
      ✓ caffe2/test:quantized - test_qconv (test_quantized.TestQuantizedConv) 4.743 19/29 (passed)
      ✓ caffe2/test:quantized - test_cat (test_quantized.TestQuantizedOps) 6.992 20/29 (passed)
      ✓ caffe2/test:quantized - test_linear_api (test_nn_quantized.ModuleAPITest) 8.970 21/29 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.QuantizedConvTest) 9.403 22/29 (passed)
      ↷ caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps) 0.000 23/29 (skipped)
Test output:
> Skipped: QNNPACK does not play well with UBSAN at the moment, so we skip the test if we are in a UBSAN environment.
> test_qnnpack_linear (test_quantized.TestQNNPackOps) ... skipped 'QNNPACK does not play well with UBSAN at the moment, so we skip the test if we are in a UBSAN environment.'
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.000s
>
> OK (skipped=1)
      ↷ caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps) 0.000 24/29 (skipped)
Test output:
> Skipped: QNNPACK does not play well with UBSAN at the moment, so we skip the test if we are in a UBSAN environment.
> test_qnnpack_relu (test_quantized.TestQNNPackOps) ... skipped 'QNNPACK does not play well with UBSAN at the moment, so we skip the test if we are in a UBSAN environment.'
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.000s
>
> OK (skipped=1)
      ✓ caffe2/test:quantized - test_max_pool2d (test_quantized.TestQuantizedOps) 8.453 25/29 (passed)
      ✓ caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQuantizedLinear) 0.664 26/29 (passed)
      ✓ caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQuantizedConv) 2.965 27/29 (passed)
      ✓ caffe2/test:quantized - test_qlinear (test_quantized.TestQuantizedLinear) 1.915 28/29 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 60.804 29/29 (passed)
      ✓ caffe2/test:quantized - main 0.000 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649677415136
Summary (total time 68.66s):
  PASS: 28
  FAIL: 0
  SKIP: 2
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: jianyuh

Differential Revision: D16569166

Pulled By: dskhudia

fbshipit-source-id: 53522b4162eb1ebb35b408a1503d9664305c85b0
2019-08-12 17:59:22 -07:00
71fd30e33b JIT serialization for Conv2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24117

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D16765475

Pulled By: jamesr66a

fbshipit-source-id: 4e6f91efac01cd26e2f1e21569242e4a45e4f8de
2019-08-12 16:24:58 -07:00
f66bfa7ec4 state_dict serialization for Conv2d + some bugfixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24116

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D16765476

Pulled By: jamesr66a

fbshipit-source-id: 96275cea87d7f5e7de5d1925cbce220066f1a465
2019-08-12 16:24:54 -07:00
9559c1af3a Re-work Conv2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24115

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D16765474

Pulled By: jamesr66a

fbshipit-source-id: 2bb24ad828f5ff325bd978e384c5ec6a0c9610b0
2019-08-12 16:24:50 -07:00
4a754dc3e3 cleanup warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24133

Test Plan: Imported from OSS

Differential Revision: D16746249

Pulled By: zdevito

fbshipit-source-id: 051f048b03043d6947544cd02ae44288bd439ef9
2019-08-12 16:12:30 -07:00
1daac9c0a2 Update tensorboard.rst (#22026)
Summary:
**Patch Description**:
Update the docs to reflect that one no longer needs to install the tensorboard nightly build, as TensorBoard 1.14.0 was [released last week](https://github.com/tensorflow/tensorboard/releases/tag/1.14.0).

**Testing**:
Haven't actually tested pytorch with tensorboard 1.14 yet. I'll update this PR once I have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22026

Differential Revision: D16772136

Pulled By: orionr

fbshipit-source-id: 2e1e17300f304f50026837abbbc6ffb25704aac0
2019-08-12 15:02:26 -07:00
936632b120 Thread local debug info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22365

Test Plan:
USE_CUDA=0 python setup.py develop
./build/bin/test_jit

Imported from OSS

Reviewed By: ajyu

Differential Revision: D16065129

Pulled By: ilia-cher

fbshipit-source-id: f300985459a83c2c1049ed8c4fefd23a3144047a
2019-08-12 14:53:57 -07:00
90895c8f85 Fix trace docs (#24191)
Summary:
These were incorrect and didn't run before

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24191

Pulled By: driazati

Differential Revision: D16770604

fbshipit-source-id: 0d8547185871f7f4b1e44c660e45699ed8240900
2019-08-12 14:48:42 -07:00
497bc3f283 Remove unused parameter from FORALL macros and rename STUBS to QINTS.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23340

Test Plan: Imported from OSS

Differential Revision: D16467981

Pulled By: gchanan

fbshipit-source-id: f4535c21ea54838d2086b2887a73e02e28b783d9
2019-08-12 14:43:39 -07:00
f5fefd62e2 Align AT_FORALL macros with AT_DISPATCH macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23339

Test Plan: Imported from OSS

Differential Revision: D16467983

Pulled By: gchanan

fbshipit-source-id: 84a29a03d3ec9c6416cad254a9ff1005fdc6324f
2019-08-12 14:43:35 -07:00
75c1419b46 Add Pickler C++ API (#23241)
Summary:
This PR adds functions to wrap the Pickler and exposes them to the C++ API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23241

Pulled By: driazati

Differential Revision: D16746451

fbshipit-source-id: 25ea5db4174006ce41e2e8989c8a345b82f637a7
2019-08-12 14:43:31 -07:00
d125b5ffa2 Fix C412 lint from flake8-comprehensions update. (#24184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24184

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16764168

Pulled By: ezyang

fbshipit-source-id: cc252a860fd7e4b7fb2b95c5d9fcdbf6935ffeb6
2019-08-12 14:34:45 -07:00
06c09a266b Ignore bugprone-lambda-function-name in clang-tidy. (#24190)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23947.

In https://github.com/pytorch/pytorch/pull/23970, I ignored these in dispatch macros, but I think it's more maintainable to just block this globally.  And it's a pretty minor issue if it happens anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24190

Differential Revision: D16766329

Pulled By: gchanan

fbshipit-source-id: 7ae7b7781562a8974d974f7eefa8ec7551eb09fc
2019-08-12 14:21:29 -07:00
4c6c9ffaf8 Move iOS.cmake to the cmake folder (#24029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24029

The cmake toolchain file for building iOS is currently in `/third-party/ios-cmake`. Since the upstream is not active anymore, it's better to maintain this file ourselves moving forward. This PR is also the prerequisite for enabling QNNPACK for iOS.

Test Plan:
- The `libtorch.a` can be generated successfully
- The `libtorch.a` can be compiled and run on iOS devices

<img src="https://github.com/xta0/AICamera-ObjC/blob/master/aicamera.gif?raw=true" width="400">

Differential Revision: D16770980

Pulled By: xta0

fbshipit-source-id: 1ed7b12b3699bac52b74183fa7583180bb17567e
2019-08-12 14:17:28 -07:00
7583519b87 Provide argument in ONNX export to exclude intializers from graph inputs. (#23284)
Summary:
Starting with ONNX IR version 4, the initializers in the ONNX graph do not have to be inputs of the graph. This constraint, which existed in IR version 3 and earlier, was relaxed in IR version 4. This PR provides an API-level argument to allow ONNX export with the relaxed constraint of IR version 4, i.e., it provides the option to not include initializers as inputs. This allows backends/runtimes to do certain optimizations, such as constant folding, better.

*Edit*: After discussion with houseroad we have the following behavior. For any OperatorExportType except OperatorExportTypes.ONNX, the current export behavior is maintained in this PR by default, though the user can override it by setting the `keep_initializers_as_inputs` argument to the export API. But when exporting to ONNX, i.e. when OperatorExportType is OperatorExportTypes.ONNX, the behavior changes: by default the initializers are NOT part of the graph inputs. Again, the default can be overridden by setting the `keep_initializers_as_inputs` argument, as sketched below.
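
A minimal sketch of the new argument (the model and file name here are placeholders):

```
import torch

model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)

# Export without listing the weights/biases as graph inputs
# (the new default for plain ONNX export, per the description above).
torch.onnx.export(model, dummy_input, "linear.onnx",
                  keep_initializers_as_inputs=False)
```
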
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23284

Differential Revision: D16459961

Pulled By: bddppq

fbshipit-source-id: b8f0270dfaba47cdb8e04bd4cc2d6294f1cb39cf
2019-08-12 14:17:25 -07:00
465b4de9d4 add function name to error messages generated by checked_tensor_unwrap (#24187)
Summary:
Improve error messages by showing the relevant function call that failed.

Before:
```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #2 'other'
```

After:

```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #2 'other' in call to _th_lt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24187

Differential Revision: D16769167

Pulled By: nairbv

fbshipit-source-id: 4992eb4e86bdac2ab8805cc5356f7f92c63e1255
2019-08-12 14:02:22 -07:00
75db368031 Revert D16763388: Add name propagation for at::alias, add tensor.set_names
Differential Revision:
D16763388

Original commit changeset: 4b2fb3acc051

fbshipit-source-id: 5be35bdcc2e7c71378af9e34be19305bdd4ba0d1
2019-08-12 13:42:43 -07:00
6772f537f0 Revert D16763390: Improve test_namedtensor.py with named tensor equality check
Differential Revision:
D16763390

Original commit changeset: 170e27ebc4d7

fbshipit-source-id: dbabe837793d8db6493a221b91e43a065baece75
2019-08-12 13:42:39 -07:00
cb4a6327a3 Delete WeakScriptModuleProxy (#23398)
Summary:
This PR deletes `WeakScriptModuleProxy`, uses `ScriptModule` directly, and moves the recursive script logic into `torch/jit/_recursive.py`. The first commit just moves code; the latter two contain the actual changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23398

Pulled By: driazati

Reviewed By: eellison

Differential Revision: D16712340

fbshipit-source-id: f907efcec59bb2694c079ab655304324c125e9bb
2019-08-12 13:36:47 -07:00
ceb9a573d9 Implement virtual memory computation in caffe2_benchmark binary (#24144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24144

Implement virtual memory computation in the caffe2_benchmark binary on Windows.

Reviewed By: hl475

Differential Revision: D16752250

fbshipit-source-id: aceb13ddd507aa2e6bad07de28d79776e6ee517c
2019-08-12 13:08:47 -07:00
90f3f9d9aa Improve test_namedtensor.py with named tensor equality check (#24106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24106

Test Plan
- Code reading. assertTensorDataAndNamesEqual isn't used in this commit
but it'll be used in future commits.
- [namedtensor ci]

Test Plan: Imported from OSS

Differential Revision: D16763390

Pulled By: zou3519

fbshipit-source-id: 170e27ebc4d79aca939c5d101489b20faedc6133
2019-08-12 12:45:00 -07:00
1108fa1acb Add name propagation for at::alias, add tensor.set_names (#24105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24105

tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
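
A minimal sketch of the two variants described above (the named tensor API was experimental at this point, so exact signatures may differ):

```
import torch

x = torch.randn(2, 3)
y = x.set_names(['N', 'C'])   # out-of-place: returns a renamed tensor, x unchanged
x.set_names_(['N', 'C'])      # in-place variant
```
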

Test Plan: - run tests [namedtensor ci]

Differential Revision: D16763388

Pulled By: zou3519

fbshipit-source-id: 4b2fb3acc0514515e7ca805dbc5c3d4a9bd96317
2019-08-12 12:44:56 -07:00
bb4f380f35 Optimizing out the division in the fusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23275

Test Plan: Imported from OSS

Differential Revision: D16450294

Pulled By: zafartahirov

fbshipit-source-id: 2f1ebf3673ed0467a9f6a912e08e5d95f9b27d0b
2019-08-12 11:35:37 -07:00
b028cc752b Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: bae513d6026fd7526994742db1a77c05ae587657
2019-08-12 11:35:33 -07:00
a671609a41 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 57fcd91545c429f994a6f6183156b848355abc1f
2019-08-12 09:55:16 -07:00
b9a006f947 Make all at::Tensor in-place methods const (#23945)
Summary:
https://github.com/pytorch/pytorch/issues/23901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23945

Differential Revision: D16738606

Pulled By: pbelevich

fbshipit-source-id: df170374e23901f7486b980584641ae6ffaf6cc4
2019-08-12 08:12:38 -07:00
bde73860c6 Move TensorOptions to ATen/core (#22020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22020
ghimport-source-id: 62766d49658ee84b8076c555432b50e13d104bc6

Test Plan: Imported from OSS

Differential Revision: D15920763

Pulled By: zou3519

fbshipit-source-id: c3429973180a65606da82face5c0ee377035e716
2019-08-12 07:41:12 -07:00
a5f697619c Add interfaces in lr_scheduler.pyi (#23934)
Summary:
Some interfaces of schedulers defined in lr_scheduler.py are missing in lr_scheduler.pyi.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23934

Differential Revision: D16726622

Pulled By: ezyang

fbshipit-source-id: 45fd2d28fbb658c71f6fcd33b8997d6ee8e2b17d
2019-08-12 07:03:41 -07:00
77c08aa46c serialize modules as classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23098

Test Plan: Imported from OSS

Differential Revision: D16383328

Pulled By: suo

fbshipit-source-id: 36389b8e45c3febb7f224cd9c630fe643fa90bef
2019-08-11 15:50:29 -07:00
5ec1c293eb Revert D16552212: [jit] fix py-compat fbcode lint warnings
Differential Revision:
D16552212

Original commit changeset: 7c7de5a096ad

fbshipit-source-id: b5ea5f626883e2b213b9d02875e83e64ed206e58
2019-08-10 21:58:25 -07:00
6be24be9ff OpenCV 4 compatibility fix for caffe2/video (#24143)
Summary:
Trying to fix https://github.com/pytorch/pytorch/issues/24073 as in https://github.com/pytorch/pytorch/issues/9966.  Make caffe2 compile with OpenCV 4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24143

Differential Revision: D16753624

Pulled By: ezyang

fbshipit-source-id: 524eac10a9285e0c0a803a8566917aa95aa0662c
2019-08-10 14:50:20 -07:00
365b3ff56e send flake8 to stderr (#24100)
Summary:
Doing these one at a time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24100

Differential Revision: D16753599

Pulled By: suo

fbshipit-source-id: cfd317a2463cf6792758abe04c0f01a146a7ec47
2019-08-10 13:35:27 -07:00
d3f6d5885d Replace Module::copy_into with Module::clone. (#24068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24068

The new method has a simpler interface (no arguments).

Resolves #23915.

Differential Revision: D16736379

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 1c1f397ce9cdaa5467fd7da3025cf44d1436ae6b
2019-08-09 18:25:38 -07:00
9843993888 is_quantized support in JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24099

Test Plan: Imported from OSS

Differential Revision: D16742983

Pulled By: jamesr66a

fbshipit-source-id: f760df4e7e91f9f76c7e153db59984b3ae938280
2019-08-09 17:32:42 -07:00
a45dafc66a JIT Serialization of nnq.Linear (#24048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24048

Add `__{g,s}etstate__` methods on `nnq.Linear` for JIT (and for `torch.{save,load}` serialization).

Unfortunately, this unearthed a bug in serialization documented in https://github.com/pytorch/pytorch/issues/24045. The check that triggered the bug has been disabled pending a fix

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D16728347

Pulled By: jamesr66a

fbshipit-source-id: c3b850be3b831f4c77cec3c2df626151b2af8b34
2019-08-09 17:14:58 -07:00
ca2010cfea State dict serialization of nnq.Linear (#24047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24047

Add `_{save_to,load_from}_state_dict` methods to `nnq.Linear` that explicitly deal with conversions from the Python attributes to the serialized state dict form

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D16728346

Pulled By: jamesr66a

fbshipit-source-id: 182c9f5069d509147dc9020b341b6cb87505fe7f
2019-08-09 17:14:52 -07:00
442b3512d4 Simplified nnq.Linear class (#24046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24046

`nnq.Linear` was a confusing mess of buffers/attributes and Tensor/not-tensor members. This PR reworks it to consistently have only Python attributes, with the conversions handled explicitly by state_dict or `__{get,set}state__` methods (added in PRs further up the stack).

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D16728345

Pulled By: jamesr66a

fbshipit-source-id: 47468b776b428fca2409bb55c8b161afb68a3379
2019-08-09 17:14:48 -07:00
b453fd9916 separate input shapes to reduce default execution time (#24136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24136

This diff aims to reduce the execution time of benchmark_all_test, which runs all the supported operator benchmarks. In the default run, only one shape of each operator will be benchmarked. The rest of the benchmarks can be triggered with the tag_filter flag.

Reviewed By: hl475

Differential Revision: D16736448

fbshipit-source-id: 33bd86f6fc2610f87f24240ad559fb11d3e35e89
2019-08-09 17:09:21 -07:00
ca7e2a78e0 support grad and data attribute for tensor in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23842

Differential Revision: D16683556

fbshipit-source-id: 3e262dc7e497f07d0edb3ab18efc89f74f1d5736
2019-08-09 16:46:16 -07:00
2790439b9d add initial support for sparse tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23841

Differential Revision: D16683560

fbshipit-source-id: 3abc098399f73a74c44b22c175b0734d145334aa
2019-08-09 16:46:13 -07:00
83a594cf56 Adding dequantize_val and requantize_val
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23909

Test Plan: Imported from OSS

Differential Revision: D16678276

Pulled By: zafartahirov

fbshipit-source-id: db0f3033774b44d6aed6e60e84b20b6f4c220ec0
2019-08-09 16:27:00 -07:00
be5eb6782b Fix builtin function reference (#24056)
Summary:
This was previously buggy and not being displayed on master. This fixes
the issues with the script to generate the builtin function schemas and
moves it to its own page (it's 6000+ lines of schemas)

It looks like Sphinx will just keep going if it hits errors when importing modules; we should find out how to turn that off and add a check for it in the CI.

This also includes some other small fixes:
* removing internal only args from `script()` and `trace()` docs, this also requires manually keeping these argument lists up to date but I think the cleanliness is worth it
* removes outdated note about early returns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24056

Pulled By: driazati

Differential Revision: D16742406

fbshipit-source-id: 9102ba14215995ffef5aaafcb66a6441113fad59
2019-08-09 15:58:15 -07:00
211bafc2ea c10 dispatcher stores autograd kernels (#23666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23666

ghstack-source-id: 88050430

Differential Revision: D16603130

fbshipit-source-id: bc77c218a4664ad3b57d6918043c93c9df3b42ca
2019-08-09 15:10:41 -07:00
296f218ac7 Allow kernels that don't have a boxed version (#23665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23665

For many ATen ops, c10 can't generate boxed kernel versions yet.
We need to allow kernels that have only unboxed versions for them to be registerable with c10.
ghstack-source-id: 88050429

Differential Revision: D16603132

fbshipit-source-id: 84cae4a514da104f5035d23a4059ca6197469f9c
2019-08-09 15:10:37 -07:00
9dbee5f8e5 Unboxed kernels in c10 (#23447)
Summary:
The c10 dispatcher now also stores a `void*` pointer to the unboxed kernel function and this kernel function can be called if the call site knows the exact kernel signature.

It is not clear if this API will survive in the long term, but in the short term this allows an easier migration from ATen to c10 and is supposed to replace ATenDispatch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23447
ghstack-source-id: 88050435

Differential Revision: D16521939

fbshipit-source-id: 7e570df5a721defc677c3cc91758651dbe06ce1c
2019-08-09 15:10:33 -07:00
352032c93c Open up AliasAnalysisKind for any ops (#23834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23834

Re-export of the reverted PR https://github.com/pytorch/pytorch/pull/23810 with the bug fixed

A previous diff removed the special casing for aten:: and prim:: ops in alias analysis and implemented alias analysis purely based on the AliasAnalysisKind. To be sure it doesn't break our existing code base, it added asserts that make sure that our existing aten:: and prim:: ops set the correct AliasAnalysisKind.

However, we don't need that restriction for future ops. Since we are now certain all existing cases are set up correctly,
we can remove these assertions.
ghstack-source-id: 88050427

Differential Revision: D16657239

fbshipit-source-id: 8a7606da8e9bd961bf47e3e1587b622a9c247ec6
2019-08-09 15:10:29 -07:00
390bfd5220 support dict augmented assignment in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23639
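
A minimal sketch of what this enables in TorchScript (the function name is illustrative):

```
from typing import Dict

import torch

@torch.jit.script
def bump(d: Dict[str, int]) -> Dict[str, int]:
    d['a'] += 1  # augmented assignment on a dict entry now compiles
    return d

print(bump({'a': 1}))  # {'a': 2}
```
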

Differential Revision: D16683559

fbshipit-source-id: bf82df2a93d6dbf2d60f3618c03a650b15453275
2019-08-09 14:57:00 -07:00
c79a07e3a4 Added type annotations to unpooling layers (#24101)
Summary:
Currently, `output_size` gets inferred as a `Tensor` type, which isn't correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24101

Differential Revision: D16742162

Pulled By: Chillee

fbshipit-source-id: 61ba030773dac2830c75974060bd316c35607126
2019-08-09 14:02:11 -07:00
c21a774076 Moves clamp from autodiff cpp to symbolic script (#23927)
Summary:
This PR:

- Moves clamp from autodiff cpp to symbolic script
- Adds an additional tuple lowering pass to the graph executor
- Updates clamp backwards to be maximally gradient preserving

Moving clamp to symbolic script presented two challenges:

- When the backward graph is defined, the branch taken in the conditional is known, but communicating this information to the Jit is a little tricky. It turns out the Jit has a quirk where variables that can be None at the time of graph instantiation are treated as constants, so testing min and max against None lets the Jit instantiate only one branch path. It might be more natural to select different backward functions for these cases, but that is not yet supported.
- Moving clamp to symbolic script introduced an extra tuple construction and immediate unpacking which prevented fusion. This was dealt with by adding an additional tuple removal pass. This issue could appear whenever a symbolic script's return value was defined in an if statement, which made the Jit see the unpacked tuple as being constructed from an if, not a TupleConstruct. The graph is later optimized but tuple lowering was not performed again after these optimizations.

Moving clamp to symbolic script also adds some explicit conversions to float in the graphs in which it appears, but these seem harmless.

If clamp were simply moved to symbolic script then its backward graphs would look like this:

```
graph(%0 : Float(*, *),
      %1 : AutogradZeroTensor,
      %2 : Float(*, *),
      %3 : int[]?,
      %4 : Scalar?,
      %5 : int):

  %6 : None = prim::Constant() # <string>:5:31
  %7 : float = aten::Float(%5) # <string>:12:37
  %8 : Float(*, *) = prim::FusionGroup_0(%0, %2, %7)
  %9 : (Float(*, *), None, None) = prim::TupleConstruct(%8, %6, %6)
  %10 : Float(*, *), %11 : None, %12 : None = prim::TupleUnpack(%9)
  return (%10)
with prim::FusionGroup_0 = graph(%0 : Float(*, *),
      %1 : Float(*, *),
      %2 : float):
  %3 : Bool(*, *) = aten::le(%1, %2) # <string>:12:29
  %mask.5 : Float(*, *) = aten::type_as(%3, %1) # <string>:12:29
  %5 : Float(*, *) = aten::mul(%0, %mask.5) # <string>:13:28
  return (%5)
```

And adding the additional pass to remove tuples eliminates the prim::TupleConstruct and prim::TupleUnpack. Keeping these would previously cause test_fuser_iou to fail because multiple fusion groups would be created. Since https://github.com/pytorch/pytorch/issues/23372 this test is disabled, however. When enabled, the relevant portion of its graph is now:

```
%59 : float = aten::Float(%26) # <string>:314:38

  %60 : float = aten::Float(%27) # <string>:314:61
  %61 : int[] = aten::size(%14) # <string>:41:99
  %62 : int[] = aten::size(%11) # <string>:42:100
  %63 : int[] = aten::size(%15) # <string>:41:99
  %64 : int[] = aten::size(%12) # <string>:42:100
  %65 : Tensor, %66 : Tensor, %67 : Tensor, %68 : Tensor, %69 : Tensor, %70 : Tensor, %71 : Tensor, %72 : Tensor, %73 : Double(*, *) = prim::FusionGroup_0(%w.1, %13, %16, %23, %h.1, %54, %inter.1, %0, %12, %15, %18, %17, %29, %11, %14, %60, %59)
  %74 : Tensor = aten::_grad_sum_to_size(%73, %53)
  %75 : Tensor = aten::_grad_sum_to_size(%73, %52)
  %grad_self.10 : Tensor = aten::_grad_sum_to_size(%65, %61) # <string>:41:30
  %grad_other.10 : Tensor = aten::_grad_sum_to_size(%66, %62) # <string>:42:31
  %78 : Tensor = prim::FusionGroup_1(%grad_self.10, %74, %36)
  %79 : Tensor = prim::FusionGroup_2(%grad_other.10, %75, %44)
  %grad_self.14 : Tensor = aten::_grad_sum_to_size(%67, %21) # <string>:33:30
  %grad_other.14 : Tensor = aten::_grad_sum_to_size(%68, %22) # <string>:34:31
  %grad_self.12 : Tensor = aten::_grad_sum_to_size(%69, %63) # <string>:41:30
  %grad_other.12 : Tensor = aten::_grad_sum_to_size(%70, %64) # <string>:42:31
  %grad_self.16 : Tensor = aten::_grad_sum_to_size(%71, %19) # <string>:33:30
  %grad_other.16 : Tensor = aten::_grad_sum_to_size(%72, %20) # <string>:34:31
  %86 : Tensor, %87 : Tensor = prim::FusionGroup_3(%grad_self.12, %grad_self.16, %74, %39)
  %88 : Tensor, %89 : Tensor = prim::FusionGroup_4(%grad_other.12, %grad_other.16, %75, %47)
  return (%79, %88, %89, %78, %86, %87, %grad_self.14, %grad_other.14)
```

Which I think is expected/desired.

Finally, this implementation of clamp backwards is "maximally gradient preserving," which simply means that elements on the boundary now receive gradients. For example, if an element of a tensor is 5 and the clamp is to [2, 5], then that element will now receive a gradient. The prior implementation would zero these gradients. See https://github.com/pytorch/pytorch/issues/7002 for a discussion on preserving gradients.
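
A minimal sketch of the boundary behavior described above:

```
import torch

x = torch.tensor([2.0, 3.5, 5.0], requires_grad=True)
x.clamp(min=2, max=5).sum().backward()
# With the maximally gradient preserving implementation, the boundary
# elements 2.0 and 5.0 receive a gradient of 1 instead of 0.
print(x.grad)  # tensor([1., 1., 1.])
```
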
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23927

Test Plan: Existing tests provided sufficient coverage.

Differential Revision: D16739740

Pulled By: mruberry

fbshipit-source-id: c94291d20e1f3f25197afc7b74dc61aeb204b074
2019-08-09 13:57:03 -07:00
1d3d92e770 Port addcdiv operator from the TH code to Aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24086

Differential Revision: D16733306

Pulled By: ifedan

fbshipit-source-id: c103bc44e0bb42dff0229252e1a12ce9b4e5aeae
2019-08-09 13:48:01 -07:00
3c1270a730 Revert D16675418: [jit] Add Pickler C++ API
Differential Revision:
D16675418

Original commit changeset: 76543c81ac67

fbshipit-source-id: f0249d16d363c4ecbceecd1bf610dc280e659cc0
2019-08-09 13:13:15 -07:00
a6c3a95b7b Updating submodules
Reviewed By: yns88

fbshipit-source-id: c525db5ec7c34f3cfa66530dad6d8b24077c94c8
2019-08-09 13:02:01 -07:00
c48fbbf215 Revert D16603913: [pytorch][PR] Enhance Tensor indexSelect performance
Differential Revision:
D16603913

Original commit changeset: baaa02f184a8

fbshipit-source-id: bdbafc65ff0f2eb1962fc1c532fa107ed124a46f
2019-08-09 12:41:31 -07:00
f5cb95113d Don't redefine unnecessary type stub.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23338

Test Plan: Imported from OSS

Differential Revision: D16467977

Pulled By: gchanan

fbshipit-source-id: d38d6bad9cc3ba6e7389186a497564f73832b858
2019-08-09 12:36:39 -07:00
2f03205c65 Support torch::tensor and at::tensor with bool and BFloat16 dtypes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23337

Test Plan: Imported from OSS

Differential Revision: D16467979

Pulled By: gchanan

fbshipit-source-id: 2e6ad431c47a61c917d501390d14c55b788958ab
2019-08-09 12:36:35 -07:00
01d98c7cfb Add Pickler C++ API (#23241)
Summary:
This PR adds functions to wrap the Pickler and exposes them to the C++ API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23241

Pulled By: driazati

Differential Revision: D16675418

fbshipit-source-id: 76543c81ac67c3e20a75ebc2073191bcbd6573bf
2019-08-09 12:25:30 -07:00
e81f296807 Fixed Bool in IsIntegralType bug (plus review comments) (#23942)
Summary:
Same as https://github.com/pytorch/pytorch/pull/23887, but also includes review comments, so we can kick off a build.

Original PR:
This [PR](https://github.com/pytorch/pytorch/pull/23346) caused [this](https://github.com/pytorch/pytorch/issues/23882) bug.

Fix:
- Deprecate the old isIntegralType and add an overload that takes a boolean flag indicating whether torch.bool should be included in integral types.

Testing:
- Added extra test cases
- Tested via running unit tests locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23942

Differential Revision: D16688056

Pulled By: gchanan

fbshipit-source-id: eff457e27b13e116c05ffd022b2fb0495abe0e97
2019-08-09 12:25:27 -07:00
f45ec71c4e fix py-compat fbcode lint warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23530

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D16552212

Pulled By: suo

fbshipit-source-id: 7c7de5a096ad9a125976e4710d3660294d3991c5
2019-08-09 12:06:21 -07:00
0002448b43 Enhance Tensor indexSelect performance (#23055)
Summary:
This tries to reduce the overhead of index_select on the CPU path in DLRM (https://github.com/facebookresearch/dlrm). Making src contiguous allows it to take the parallelized path in the Tensor indexSelect function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23055

Differential Revision: D16603913

Pulled By: ezyang

fbshipit-source-id: baaa02f184a8e70f1193e5d96ada195a46d140b9
2019-08-09 11:52:04 -07:00
d27fb41167 tensor_numpy: add missing include header (#24042)
Summary:
This patch fixes the following error:
```
In file included from /path/to/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:0,
                 from ../torch/csrc/utils/numpy_stub.h:19,
                 from ../torch/csrc/utils/tensor_numpy.cpp:2:
../torch/csrc/utils/tensor_numpy.cpp: In function 'bool torch::utils::is_numpy_scalar(PyObject*)':
../torch/csrc/utils/tensor_numpy.cpp:223:11: error: 'PyInt_Check' was not declared in this scope
   return (PyArray_IsIntegerScalar(obj) ||
           ^
../torch/csrc/utils/tensor_numpy.cpp:225:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24042

Differential Revision: D16732545

Pulled By: ezyang

fbshipit-source-id: 8d73d228b88b4a95daedcd7a4ef81c268830792e
2019-08-09 11:43:08 -07:00
4f254c3c33 Fix typo "properlyh"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24067

Differential Revision: D16732526

Pulled By: ezyang

fbshipit-source-id: 0f3a5b53c0e46bd40a6e5c838504301766c00a82
2019-08-09 11:43:04 -07:00
928754b67d make more iterator attributes private (#23744)
Summary:
1. Prefixed any `DataLoaderIter` attribute that is not part of the data loader ctor argument list with an underscore.
2. Prefixed `DataLoader.dataset_kind` with an underscore because it only makes sense with the private enum `_DatasetKind`, and is an implementation detail.
3. Disallowed setting `DataLoader.dataset` and `DataLoader.batch_sampler` after initializing a `DataLoader` because they affect other attributes in `__init__` (see the sketch below).

These changes should not have a major BC-breaking effect since the big changes are on the iterator class and most users don't even store it. I searched GitHub for `pin_memory_thread` and (while I didn't look through all result pages) the results I see are forks of pytorch and blog posts on how the data loader works.
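
A minimal sketch of the new restriction in item 3 (the exact exception type and message are an assumption):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10.0))
loader = DataLoader(dataset, batch_size=2)

try:
    loader.dataset = dataset  # reassignment after __init__ is now disallowed
except ValueError as e:
    print(e)
```
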
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23744

Differential Revision: D16732507

Pulled By: ezyang

fbshipit-source-id: 9f04d000b4200b8047f31eaa3473780b66cebd26
2019-08-09 11:43:00 -07:00
9b551b1ff7 Fix regression in triangular_solve when number of batches = 1 for CUDA (#23953)
Summary:
Changelog:
- When number of batches = 1, dispatch to trsm instead of trsm_batched in MAGMA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23953

Test Plan: - All triangular_solve tests should pass to ensure that the change is valid

Differential Revision: D16732590

Pulled By: ezyang

fbshipit-source-id: 7bbdcf6daff8a1af905df890a458ddfedc01ceaf
2019-08-09 11:42:57 -07:00
81ba2df554 Allow forward functions with single output to return Variable (#23803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23803

Custom `forward()` can return a `Variable` in case of single outputs instead of returning a `variable_list` of size 1.

Test Plan: Modified tests involving single output forward functions.

Reviewed By: ezyang

Differential Revision: D16673857

Pulled By: ezyang

fbshipit-source-id: c96d9473b48ad99e6736a68d334b333a917498b7
2019-08-09 11:10:14 -07:00
0bba302da5 Revert D16621830: Add name propagation for at::alias, add tensor.set_names
Differential Revision:
D16621830

Original commit changeset: f8a3837d3a37

fbshipit-source-id: 801ab858a0741d98b0b9d56763fa70a9010fe75e
2019-08-09 10:55:18 -07:00
71352fbd9a Revert D16667816: Improve test_namedtensor.py with named tensor equality check
Differential Revision:
D16667816

Original commit changeset: 66519cd5d17b

fbshipit-source-id: 51a26cdfb5624695a492d3ac93fb7a402c44e11a
2019-08-09 10:55:14 -07:00
de97b12dbd Revert D16647820: Add names argument to ones, rand, randn, zeros, full
Differential Revision:
D16647820

Original commit changeset: c6c53c5f26a8

fbshipit-source-id: a341c6eda49f5dd2e1712b65e61fef99791f0668
2019-08-09 10:55:10 -07:00
177a5c3f41 Revert D16647821: Implement name inference rule for empty_like, clone
Differential Revision:
D16647821

Original commit changeset: 43b261f3456b

fbshipit-source-id: 03caecd6898efd292b4f5c5b7254f7d31d502d6a
2019-08-09 10:55:06 -07:00
c23dd83480 Revert D16731478: [pytorch][PR] [C++ Tensor API] Make all at::Tensor in-place methods const
Differential Revision:
D16731478

Original commit changeset: 076d7aea1299

fbshipit-source-id: eaca61af849772d7a842a84bd203eba4d820874d
2019-08-09 10:32:54 -07:00
521484eaec Revert D16657926: Named inference for contiguous(), bernoulli variants, and dropout.
Differential Revision:
D16657926

Original commit changeset: 8cd46765b1c7

fbshipit-source-id: fce2202dd101cfc3153f279a0a4651c9b735e044
2019-08-09 10:32:48 -07:00
bb41e62e3b Updated SGD docs with subscripts (#23985)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23982

Obvious improvement imo.

Also changed `rho` to `mu`, since `rho` and `p` look very similar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23985

Differential Revision: D16733037

Pulled By: Chillee

fbshipit-source-id: 5431615d1983f24d6582da6fc8103ac0093b5832
2019-08-09 10:32:40 -07:00
5d47d85392 added mesh plugin (#24039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24039

This diff adds the mesh plugin:
- added tests to test_tensorboard.py
- fixed an error that occurred after updating tensorboard to the latest version (added a "components" argument to create_summary_metadata): 5e5badc666 (diff-068400aa3e34121b7256539582374597)

Reviewed By: orionr

Differential Revision: D16714759

fbshipit-source-id: df349541a058fa90310d1815160e29d20c6ef065
2019-08-09 10:22:43 -07:00
aa02b1adcd Fix qconv benchmark (#24019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24019

Permutes are done inside the module. We don't need them outside.

Setting of scale/zero_point has changed.

Reviewed By: jianyuh

Differential Revision: D16712437

fbshipit-source-id: e3cedf9d63347fbf8070d1a65a196e6d4b2833fc
2019-08-09 09:17:55 -07:00
a0556782a0 fix scale and zero_point names (#23991)
Summary:
scale and zero_point name should match with what's used in other methods of the class.

Closes https://github.com/pytorch/pytorch/issues/23881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23991

Test Plan: buck build mode/opt caffe2/benchmarks/operator_benchmark/pt:qconv_test --show-output

Reviewed By: jianyuh

Differential Revision: D16703956

Pulled By: dskhudia

fbshipit-source-id: 5e894bd84caaa20dc7639d4885d59a72f27d8ec2
2019-08-09 09:17:51 -07:00
4dd2908dd6 Named inference for contiguous(), bernoulli variants, and dropout. (#23808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23808

See title.

Test Plan: - New tests [namedtensor ci]

Differential Revision: D16657926

Pulled By: zou3519

fbshipit-source-id: 8cd46765b1c791b73448ddf4585dae56d635364d
2019-08-09 09:17:47 -07:00
16b6466e5e Implement name inference rule for empty_like, clone (#23746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23746

`torch.empty_like(tensor)` and `tensor.clone()` both propagate names to
the output tensor.

As a part of this change, I fixed the empty(..., names=) overload to
include the `memory_format` argument in the normal `empty` declaration
in native_functions.yaml.
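
A minimal sketch of the propagation rule (assuming a build with the experimental named tensor support):

```
import torch

t = torch.zeros(2, 3, names=('N', 'C'))
print(torch.empty_like(t).names)  # ('N', 'C')
print(t.clone().names)            # ('N', 'C')
```
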

Test Plan: - [namedtensor ci]

Differential Revision: D16647821

Pulled By: zou3519

fbshipit-source-id: 43b261f3456b6bf5fca7b6313e659b259a2ba66d
2019-08-09 09:17:43 -07:00
11cff2981b Add names argument to ones, rand, randn, zeros, full (#23743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23743

In the short term, we implement this by having overloads for each of
these functions. In the long term, the plan is to move DimnameList to
TensorOptions so that we do not have to duplicate work.
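
A minimal sketch of the new overloads (assuming the experimental named tensor build):

```
import torch

x = torch.ones(2, 3, names=('N', 'C'))
y = torch.rand(2, 3, names=('N', 'C'))
print(x.names, y.names)  # ('N', 'C') ('N', 'C')
```
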

Test Plan: - [namedtensor ci]

Differential Revision: D16647820

Pulled By: zou3519

fbshipit-source-id: c6c53c5f26a86b730cbc4d4eb69907ac0e08fc65
2019-08-09 09:17:39 -07:00
5fbe824398 Improve test_namedtensor.py with named tensor equality check (#23801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23801

Test Plan
- Code reading. assertTensorDataAndNamesEqual isn't used in this commit
but it'll be used in future commits.
- [namedtensor ci]

gh-metadata: pytorch pytorch 23801 gh/zou3519/90/head

Test Plan: Imported from OSS

Differential Revision: D16667816

Pulled By: zou3519

fbshipit-source-id: 66519cd5d17bda4c4304a1bc6e2a03ae59d49e39
2019-08-09 09:17:35 -07:00
78f3b883f0 Add name propagation for at::alias, add tensor.set_names (#23624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23624

tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.

Test Plan:
- run tests [namedtensor ci]

gh-metadata: pytorch pytorch 23624 gh/zou3519/86/head

Differential Revision: D16621830

Pulled By: zou3519

fbshipit-source-id: f8a3837d3a370b41210e938369348dcbb4aee53a
2019-08-09 09:17:31 -07:00
be4e6aff12 Make all at::Tensor in-place methods const (#23945)
Summary:
https://github.com/pytorch/pytorch/issues/23901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23945

Differential Revision: D16731478

Pulled By: pbelevich

fbshipit-source-id: 076d7aea12995e3d5fb26bc917291e71c2b7ecd4
2019-08-09 09:17:27 -07:00
513c4291c5 Suppress implicit-fallthrough warning on g++ >= 7 in caffe2/utils/math_cpu.cc (#24053)
Summary:
These implicit fallthroughs lead to the following warning on g++ 7, because g++ could not recognize the implicit `abort` call in `LOG(FATAL)`. We suppress the warning by adding explicit `return`s.

    /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc: In function void
    caffe2::math::GemmEx(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int
    , int, int, T, const T*, int, const T*, int, T, T*, int, Context*) [with
    T = float; Context = caffe2::CPUContext; Engine = caf
    fe2::DefaultEngine]:
    /home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10:
    warning: this statement may fall through [-Wimplicit-fall
    through=]
       ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:179:11: note: in
    expansion of macro LOG
               LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
               ^
    /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:182:5: note: here
         case CblasTrans: {
         ^~~~
    In file included from /home/hong/wsrc/pytorch/c10/util/Logging.h:28:0,
                     from /home/hong/wsrc/pytorch/caffe2/core/logging.h:2,
                     from /home/hong/wsrc/pytorch/caffe2/core/types.h:9,
                     from /home/hong/wsrc/pytorch/caffe2/utils/math.h:17,
                     from
    /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:14:
    /home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10:
    warning: this statement may fall through [-Wimplicit-fall
    through=]
       ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:202:11: note: in
    expansion of macro LOG
               LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
               ^
    /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:205:5: note: here
         default:
         ^~~~~~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24053

Differential Revision: D16732530

Pulled By: ezyang

fbshipit-source-id: 90373879f25b52efca5bf151c7ed58d6ad19d925
2019-08-09 09:17:23 -07:00
87508f401c Delete unnecessary file split_types.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23754

Differential Revision: D16732834

Pulled By: ezyang

fbshipit-source-id: 087c573ecde8cd05dd7a28f47939a257e1cc25f3
2019-08-09 09:04:19 -07:00
994f643d9a Do not force USE_SYSTEM_EIGEN_INSTALL to be OFF in Python build scripts (#23990)
Summary:
Not sure whether 34c0043aaee971a0539c8c3c49c4839f67ae001d still makes sense.

`USE_SYSTEM_EIGEN_INSTALL` is OFF by default (as set in CMakeLists.txt). If a user wants to change this build option, I don't see any reason to force them to do it in `CMakeCache.txt`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23990

Differential Revision: D16732569

Pulled By: ezyang

fbshipit-source-id: 4604b4a1d5857552ad02e76aee91641aea48801a
2019-08-09 08:33:48 -07:00
21ea0a115c Revert D16627924: [pytorch][PR] Port addcdiv operator from the TH code to Aten
Differential Revision:
D16627924

Original commit changeset: 960856d30fd3

fbshipit-source-id: a375a3ede5ef956a07fb55c7b4a5d4fc34c96ddb
2019-08-09 08:33:44 -07:00
ce79d5135a Revert D16634539: Enabling inline in quantized relu
Differential Revision:
D16634539

Original commit changeset: 84266f92049c

fbshipit-source-id: 5e1d8e3560483600a61c2ac62b13e9c3fede8301
2019-08-09 08:33:39 -07:00
2e8557778b Refactor randperm test (#23526)
Summary:
CPU and CUDA testing code are largely the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23526

Reviewed By: ezyang

Differential Revision: D16586271

Pulled By: VitalyFedyunin

fbshipit-source-id: 91c70c05789120fde4718ce955de243087a8c993
2019-08-09 08:33:35 -07:00
8659131aa6 Add instruction on how to nest nn::Sequential (#23939)
Summary:
yf225 I mentioned the MNASNet implementaiton in the message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23939

Differential Revision: D16716619

Pulled By: yf225

fbshipit-source-id: a92e4e7a588decce4c5a515370238eb284ae6118
2019-08-09 08:27:17 -07:00
02023d7dba canonicalize_ops pass bugfix: copy metadata for new output (#23809)
Summary:
Without metadata (datatype) for the new output, the exporter won't be able to perform implicit scalar datatype casting. This PR covers a large portion of this common issue seen in many exported models, e.g. https://github.com/pytorch/pytorch/issues/23724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23809

Reviewed By: ezyang

Differential Revision: D16707640

Pulled By: bddppq

fbshipit-source-id: 3de985c6b580b9c9ebaec08085c7443bd8d9c7f8
2019-08-09 08:27:13 -07:00
61db8b64ec Build option USE_NUMA should only show up on Linux. (#23673)
Summary:
(intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23673

Differential Revision: D16627453

Pulled By: vincentqb

fbshipit-source-id: df62f1b26901bec6369b5589b98124165f40e6f1
2019-08-09 08:17:52 -07:00
478c793065 Remove numpy assert that fails on Windows (older numpy versions). (#24012)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24001.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24012

Differential Revision: D16732191

Pulled By: ezyang

fbshipit-source-id: 36660a6635ab64d2f63278b1616deb1282dea037
2019-08-09 07:55:02 -07:00
fb77f14054 Port addcdiv operator from the TH code to Aten (#23683)
Summary:
https://github.com/pytorch/pytorch/issues/22796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23683

Differential Revision: D16627924

Pulled By: ifedan

fbshipit-source-id: 960856d30fd3f79394925eddd0152cc5e27b39b3
2019-08-09 07:44:57 -07:00
9114089d70 port atan2 from TH to ATen (#23558)
Summary:
https://github.com/pytorch/pytorch/issues/22799
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23558

Differential Revision: D16591638

Pulled By: ifedan

fbshipit-source-id: d12d4c8229337a22a3278f0c7a8bbc9a86d4c9b7
2019-08-09 07:44:53 -07:00
9558ccdd76 Enabling inline in quantized relu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23704

Test Plan: Imported from OSS

Differential Revision: D16634539

Pulled By: zafartahirov

fbshipit-source-id: 84266f92049ce4410ec25821b8d4699a9e3f123e
2019-08-09 02:37:12 -07:00
3d23c04a1c serialize all c++ frontend modules to a single CU. (#23645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23645

Previously, every module would get its own CompilationUnit when saving
from the C++ frontend. That's bad because nothing checks that they have
uniquely qualified names or mangles them to make them unique.

This was okay when we were doing model.json, but once we serialize
modules like classes this will cause an error on import (when we try to
re-define the same class a bunch of times).

Test Plan: Imported from OSS

Differential Revision: D16597709

Pulled By: suo

fbshipit-source-id: 0412efd5acfcac26d03f6ed5b5a7dfc023163bc3
2019-08-09 00:52:07 -07:00
61d0624803 [jit] make sure NamedTuples have unique qualified names
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23798

Test Plan: Imported from OSS

Differential Revision: D16652818

Pulled By: suo

fbshipit-source-id: c824f26427105ed5f0c553a67ab61c69a1f89655
2019-08-09 00:52:02 -07:00
3613a30345 Move dict_test.cpp to test folder and fix dict_test.cpp for Aten includes (#24071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24071

Test Plan: Imported from OSS

Differential Revision: D16728574

Pulled By: wanchaol

fbshipit-source-id: 6952b9703a40dc35f567bf17fbdcef6e0c6c2d6e
2019-08-08 22:41:16 -07:00
e327df3965 SumOp for int32 (#23995)
Summary:
As titled: the op can be used to update Length blob values in CUDA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23995

Reviewed By: xianjiec

Differential Revision: D16684065

fbshipit-source-id: da562334c8b61a5e54c3aa78156ce5caff619e60
2019-08-08 21:37:43 -07:00
431d6e2189 minor comment fix (#22140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22140

As title

Reviewed By: protonu

Differential Revision: D15966759

fbshipit-source-id: 15dbf9de60cced29055aeaac3b71c1ff41cfe1d4
2019-08-08 21:08:47 -07:00
29e2b58b00 Back out "[op-bench][experiment] increase predefined_minimum_secs to reduce variation" (#24065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24065

Original commit changeset: d4c034f64b1d

Reviewed By: hl475

Differential Revision: D16726647

fbshipit-source-id: 6cd6cfdad804efb073062809bcbc4c0921a3d007
2019-08-08 18:36:22 -07:00
e80b48390d When matching a line in CMakeCache.txt, ensure A=B and "A"=B are matched (#23745)
Summary:
Currently when reading CMakeCache.txt, only `VAR:TYPE=VAL` can be matched.
This works well for CMake-generated lines, but a user may add a line
without specifying type (`VAR=VAL`), which is totally legitimate in the
eyes of CMake. These improvements to the regex ensure that `VAR=VAL` is
also matched. The situation of `"VAR":TYPE=VAL` is also corrected.
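
A minimal sketch of a regex with this behavior (illustrative only; not the actual pattern used in the build scripts):

```
import re

# Accepts VAR:TYPE=VAL, VAR=VAL, and "VAR":TYPE=VAL (TYPE is optional).
line_re = re.compile(r'^\s*"?(?P<var>[\w./+-]+)"?\s*(?::\s*(?P<type>\w+))?=(?P<val>.*)$')

for line in ['USE_CUDA:BOOL=ON', 'USE_NINJA=1', '"MAX_JOBS":STRING=8']:
    m = line_re.match(line)
    print(m.group('var'), '=', m.group('val'))
```
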
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23745

Differential Revision: D16726514

Pulled By: ezyang

fbshipit-source-id: 6c50150d58926563837cf77d156c24d644666ef0
2019-08-08 18:07:28 -07:00
03a40b2bc0 print clang tidy output to stderr (#24052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24052

This will make things show up in the azure pipelines output

Test Plan: Imported from OSS

Differential Revision: D16723846

Pulled By: suo

fbshipit-source-id: d78cbf476be74ccfb28d6e1b21d66b6641d36e26
2019-08-08 17:42:24 -07:00
48c6e5c05a Updating submodules
Reviewed By: yns88

fbshipit-source-id: b69a630eae0260c33a1cd3581015a084a83aa649
2019-08-08 17:02:49 -07:00
7d8dfd6f76 make _overloads importable in nn/functional (#24049)
Summary:
Move `_overload` to `_jit_internal.py` so that it can be imported in nn/functional.py for `conv`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24049

Differential Revision: D16723339

Pulled By: eellison

fbshipit-source-id: 527e6069dbfa81f8133c405be5350a8c76873a12
2019-08-08 16:57:50 -07:00
6dc555cbe6 support tensor as key type in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23638

Differential Revision: D16683557

fbshipit-source-id: 6443acc6772d58cd9082a10e1c2b095d85c9a23e
2019-08-08 16:48:12 -07:00
bdf15311a3 Migration doc fixes (#24033)
Summary:
This time I built the docs to make sure everything looks right
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24033

Pulled By: driazati

Differential Revision: D16719435

fbshipit-source-id: 290c6431e7577ef9fbd595d9ac206df867366937
2019-08-08 16:32:45 -07:00
32efb43129 Don't add local version to Conda packages. (#24014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24014

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16714081

Pulled By: ezyang

fbshipit-source-id: d346fbe8a54d5c182f81d2b908b1cdf191e3d822
2019-08-08 13:26:46 -07:00
4ccb707161 Removing deprecated warning message from torch.h (#24002)
Summary:
discussed [here](https://github.com/pytorch/vision/issues/1173)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24002

Differential Revision: D16710635

Pulled By: yf225

fbshipit-source-id: 95117dd601061691d4cfd0d644777825aeaeaf8c
2019-08-08 12:49:48 -07:00
5b9f55f33f Enable Add, sub, mul, and div on CPU for bfloat16 type. (#22851)
Summary:
Enable Add, sub, mul, and div on CPU for bfloat16 type.
Tested via unit tests.
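
A minimal sketch of the newly enabled CPU arithmetic:

```
import torch

a = torch.tensor([1.5, 3.0], dtype=torch.bfloat16)
b = torch.tensor([0.5, 2.0], dtype=torch.bfloat16)
print(a + b, a - b, a * b, a / b)
```
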
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22851

Differential Revision: D16256757

Pulled By: izdeby

fbshipit-source-id: 8b62f7581fc0ca0d2cff48ab40d877a9fcf70a5b
2019-08-08 12:34:25 -07:00
341d5934b7 Move addcmul to Aten(CUDA) (#23814)
Summary:
https://github.com/pytorch/pytorch/issues/22797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23814

Differential Revision: D16712381

Pulled By: ifedan

fbshipit-source-id: aeca4fdb9b10143932f195900b1f424ef6d26c89
2019-08-08 12:34:21 -07:00
3ad940742e save()/load() tests and fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23911

Test Plan: Imported from OSS

Differential Revision: D16698044

Pulled By: jamesr66a

fbshipit-source-id: 88881ea183331aa6e4c8fa042d11cf2b14e0fc4c
2019-08-08 12:06:22 -07:00
a35d2902ef jit.script() testing and fixes (#23891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23891

This adds an initial set of testing coverage for quantization that checks if the modules can be scripted. Testing for tracing and serialization is forthcoming

Test Plan: Imported from OSS

Differential Revision: D16698045

Pulled By: jamesr66a

fbshipit-source-id: 96d80d938b816220af72359165a7b96d998a30c9
2019-08-08 12:06:18 -07:00
7d207363bf Fix master - (#24003)
Summary:
I accidentally removed this in a merge, breaking a test. Fix for master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24003

Differential Revision: D16707108

Pulled By: eellison

fbshipit-source-id: 8b59f46e7932b88a7ae246a261c4daf17f23995f
2019-08-08 00:00:53 -07:00
02d3c302d8 Fix build failure on OSX (#23998)
Summary:
https://github.com/pytorch/pytorch/pull/23228 caused a build failure on OSX, because rpc.h is included as long as USE_DISTRIBUTED=1, but rpc/init.cpp (and others) is only included when NOT APPLE. So, it cannot find python_functions defined in init.cpp on macOS. This PR attempts to fix it by wrapping rpc.h with USE_C10D, which is only set when NOT APPLE.

I tried this fix locally and it works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23998

Differential Revision: D16706087

Pulled By: mrshenli

fbshipit-source-id: d04fe6717a181a3198289cdef51439708c2e291d
2019-08-07 22:05:41 -07:00
ad64789a1e add aligned option to RoIAlign
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23706

Reviewed By: ppwwyyxx

Differential Revision: D16615823

fbshipit-source-id: fd9152af8bc979cb04044413e66af349b032a99d
2019-08-07 21:22:33 -07:00
15d3f0242b support Gather different indices for different examples in one batch (#23813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23813

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23285

for example:

Inputs:
  data:
   [[[2 4 2 0],
     [0 1 2 0],
     [1 1 0 0]],
    [[3 4 1 3],
     [0 3 2 2],
     [4 1 0 4]]]

  idx:
    [[0 2],
     [0 1]]

outputs:
  [[[2 4 2 0],
    [1 1 0 0]],
   [[3 4 1 3],
    [0 3 2 2]]]

data and idx must have the same outer dimension.

Call Gather or BatchGather with the argument match_outer=True.

Reviewed By: huayuli00

Differential Revision: D16652485

fbshipit-source-id: 9e144e97a8d6fceaf3b5714df1534338068f4a10
2019-08-07 21:14:30 -07:00
451fc51d8d add support for overloading functions (#23886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23886

This is a series of PRs that will allow us to support adding [padding to conv](https://github.com/pytorch/pytorch/pull/22484) and also reduce the friction of adding method overloads that was brought up in  https://github.com/pytorch/pytorch/pull/23266.

Support for overloaded functions following the specification in [PEP 484](https://www.python.org/dev/peps/pep-0484/#function-method-overloading).

The usage is:
```
@torch.jit.overload
def add(x: int, y: int) -> int: ...
@torch.jit.overload
def add(x: float, y: float) -> float: ...

def add(x, y):
    return x + y
```

Follow up PRs:

- Add same API for methods
- A couple of cleanups for functions:
     - don't require default params specified on the overload as well
     - potentially error if an invocation could be matched to multiple overloads; for now it just chooses the first one, which is what mypy currently does as well

Test Plan: Imported from OSS

Differential Revision: D16694863

Pulled By: eellison

fbshipit-source-id: f94f2933bc1c97fa58f31846acfe962b0630068c
2019-08-07 19:18:19 -07:00
9ecc33d6f2 metacompile isinstance checks (#23885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23885

This is a series of PRs that will allow us to support adding [padding to conv](https://github.com/pytorch/pytorch/pull/22484) and also reduce the friction of adding method overloads that was brought up in  https://github.com/pytorch/pytorch/pull/23266.

When the condition of an `if` statement is an `isinstance` check, this PR compiles only the branch that the check makes reachable. This is consistent with what mypy does; it does not report errors in a branch that can be statically determined to be unreachable.

```
def foo(x):
    # type: (int) -> int
    if isinstance(x, str):
        return x["1"]
    return x + 1

reveal_type(foo) # no error, shows int -> int
```

Test Plan: Imported from OSS

Differential Revision: D16697092

Pulled By: eellison

fbshipit-source-id: d3eb4925cd16d551515ac6ff620a69897dbec130
2019-08-07 19:18:15 -07:00
33a1c30cb1 cleanup torch/nn/functional.py (#23977)
Summary:
Cleanup of torch/nn/functional.py now that the JIT:
- handles multiple returns
- typechecks exits (exceptions)
- refines types via assertions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23977

Differential Revision: D16697750

Pulled By: eellison

fbshipit-source-id: 1f777d6b9ead1105de50120fffd46d523e1e6797
2019-08-07 16:31:36 -07:00
b8b86de89b Adds torch.random to docs/toc (#23553)
Summary:
Fix for https://github.com/pytorch/pytorch.github.io/issues/162
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23553

Differential Revision: D16700003

Pulled By: soumith

fbshipit-source-id: 0d988985fee9aeadd01f9caba24987f960ce2470
2019-08-07 16:31:32 -07:00
1a9334ea59 Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. (#23568)
Summary:
This fixes build regression caused by https://github.com/pytorch/pytorch/issues/23528 because we used to let CXXFLAGS equal CFLAGS.

cc suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23568

Differential Revision: D16568820

Pulled By: suo

fbshipit-source-id: 64a0dc923c08ac1751224f42bc4ccdc707341762
2019-08-07 16:25:57 -07:00
c74216d396 add NotIn support in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23637
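A minimal sketch of the construct this enables (the function and names are illustrative, not from the PR):

```python
import torch
from typing import List

@torch.jit.script
def missing(xs: List[int], x: int) -> bool:
    # `not in` now compiles in TorchScript instead of raising a frontend error
    return x not in xs
```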

Test Plan: Imported from OSS

Differential Revision: D16683558

Pulled By: wanchaol

fbshipit-source-id: 27d79850d76506255ba954601fae751e07ad7cd1
2019-08-07 16:07:21 -07:00
e23e4cc356 Back out "Revert D16469619: Add Virtual Memory and CPU percentage computation to AIBench"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23821

Reviewed By: hl475

Differential Revision: D16654854

fbshipit-source-id: f057023e890cbcbd9145ef2ecb449df2fbba592b
2019-08-07 15:44:22 -07:00
e90adf59a0 Make assertions refine types (#23949)
Summary:
Make assertions like `x is not None` refine the type of x. This is easy to do now that typing understands [exits](https://github.com/pytorch/pytorch/pull/23565).
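A minimal sketch of the refinement this enables, assuming an Optional argument (the function is illustrative):

```python
import torch
from typing import Optional

@torch.jit.script
def unwrap(x: Optional[int]) -> int:
    assert x is not None  # refines x from Optional[int] to int below this line
    return x + 1
```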
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23949

Differential Revision: D16692772

Pulled By: eellison

fbshipit-source-id: 540f28e65a784c72c7c555e0aed0765d5035bc37
2019-08-07 13:06:52 -07:00
0f5d071d52 Add python_requires to help pip (#23863)
Summary:
`python_requires` helps the installer choose the correct version of this package for the user's running Python.

This is especially necessary when dropping Python 2 (https://github.com/pytorch/pytorch/issues/23795) but is useful now too.
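A hypothetical minimal setup.py showing where `python_requires` goes; the package name and version bound here are illustrative, not PyTorch's actual values:

```python
from setuptools import setup

setup(
    name="example-package",
    version="0.1",
    python_requires=">=3.5",  # pip refuses to install on older interpreters
)
```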
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23863

Differential Revision: D16692908

Pulled By: soumith

fbshipit-source-id: 3c9ba2eb1d1cf12763d6284daa4f18f605abb373
2019-08-07 12:47:53 -07:00
9d1acd6dc2 Disable optimizer for __setstate__ (#23698)
Summary:
Before calling `__setstate__` when loading a module, we need to disable
the optimizer since the module's type does not match the values on the
stack (all the tensors will be `UndefinedTensor`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23698

Pulled By: driazati

Differential Revision: D16690935

fbshipit-source-id: 71e2238fd25cd16271af478ef21a3cf4e514a462
2019-08-07 12:37:24 -07:00
323aad6b20 No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (#23806)
Summary:
Simplifying https://github.com/pytorch/pytorch/issues/23793: The dependency relationship between
{INSTALL,BUILD}_TEST is already properly handled in CMakeLists.txt. All
we need to do is to pass down INSTALL_TEST.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23806

Differential Revision: D16691833

Pulled By: soumith

fbshipit-source-id: 7607492b2d82db3f79b174373a92e2810a854a61
2019-08-07 11:34:31 -07:00
5df0cf3fb4 clang-format aten/src/ATen/native/quantized (#23898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23898

These files were not following the clang-format style and as a result, some files (such as TensorFactories.cpp) were extremely hard to read and modify.

Test Plan: Imported from OSS

Differential Revision: D16684724

Pulled By: jamesr66a

fbshipit-source-id: 0600c6dddc778481af5bef798e77072fb7e988aa
2019-08-07 11:04:25 -07:00
5411d1a27b Fix docstring for argmax (#23775)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23775

Differential Revision: D16644692

Pulled By: soumith

fbshipit-source-id: d759bb85f73383021e4657325dbac79913042ad2
2019-08-07 09:42:19 -07:00
10b1254edd fix crash on torch.Tensor.repeat() for 0 repeats (#23766)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/23603
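A quick sketch of the previously crashing case (the exact repro in the linked issue may differ):

```python
import torch

x = torch.tensor([1.0, 2.0])
y = x.repeat(0)    # used to crash; now returns an empty tensor
print(y.shape)     # torch.Size([0])
```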
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23766

Differential Revision: D16644866

Pulled By: soumith

fbshipit-source-id: ee7d368afdfe874133d0bd90f4d03a191ee22b13
2019-08-07 09:16:00 -07:00
ed19580dc4 Fix dataloader._shutdown_workers if not all workers are started (#23761)
Summary:
Otherwise you may see errors like
```
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x000001F99F5CB9D8>
Traceback (most recent call last):
  File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 883, in __del__
    self._shutdown_workers()
  File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 860, in _shutdown_workers
    if self.workers_status[worker_id]:
IndexError: list index out of range
```

e.g. https://discuss.pytorch.org/t/how-to-construct-dataset-with-iterator-for-multi-process-dataloader/49612/5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23761

Differential Revision: D16644687

Pulled By: soumith

fbshipit-source-id: a60e847431264525079456ff422317af1ac2be4b
2019-08-07 09:06:11 -07:00
ed4ee093cb Make typing understand exceptions (#23565)
Summary:
When we're emitting an if node, if one branch exits, we allow variables in the other branch to escape scope. This uses the same machinery that already exists for early returns, so there are minimal changes to the compiler. Most of the changes are in the exit_transform pass, so we don't create terrible graphs when exceptions exist. In a follow-up PR I will add a write-up of the transform pass to the docs, since this should be the last change made to it for a while.

This will allow assertions to refine Optional types, as well as allow JIT to understand things like:
```
def foo(x):
    if x == 1:
        raise Exception()
    else:
        a = 1
    return a
```

If you look in nn/functional.py, roughly 3/4 of the TODOs are this issue. One note: if a function always throws, I accept the return-type annotation if one exists and otherwise set the return type to None. This is consistent with what mypy does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23565

Differential Revision: D16679572

Pulled By: eellison

fbshipit-source-id: e58c9e9ddaeb13144c803d90e2beae253c851f7f
2019-08-07 09:06:07 -07:00
2635b6262e Remove K and N function arguments for fbgemm_pack_quantized_matrix (#22956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22956

As Title says: remove the extra function arguments for better engineering.

Differential Revision: D16297724

fbshipit-source-id: a31be17708d13508c4ce9a3ce7eb5238e8d17984
2019-08-07 08:50:13 -07:00
8e9f9b424f Replace descriptions of args in doc with template (#23439)
Summary:
Many descriptions of arguments could be replaced by items in the template such as `factory_common_args`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23439

Differential Revision: D16688527

Pulled By: ezyang

fbshipit-source-id: 406ce45d72e297f46b5fa9ea5472b3284c8d4324
2019-08-07 08:50:09 -07:00
a1d945b295 Roll master to 1.3.0 (#23895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23895

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16688489

Pulled By: ezyang

fbshipit-source-id: a56d0180a0bc57775badd9e31ea3d441d5fd4f88
2019-08-07 08:44:32 -07:00
fc36842554 Improve hip-clang support in build_amd.py (#23835)
Summary:
Use the supported way to differentiate and automatically switch between hip-clang and hcc hipification in build_amd.py.

Cleaned up from PR https://github.com/pytorch/pytorch/issues/23699
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23835

Differential Revision: D16659661

Pulled By: vincentqb

fbshipit-source-id: 05a4250ceb28beda7a7bf73a46c5dc46f6e852bc
2019-08-07 07:49:07 -07:00
13a684d50b Fix test TestCuda.test_streams_multi_gpu_query (#23912)
Summary:
This is a similar issue as TestCuda.test_events_wait.

The PyTorch test suite sets a policy() method to assertLeaksNoCudaTensors. Whenever a test is run, assertLeaksNoCudaTensors is called, which in turn calls CudaMemoryLeakCheck, which in turn calls initialize_cuda_context_rng, which executes torch.randn on each device, launching a kernel on each device.

Since the kernel may not have finished on device 0, the first assertion, self.assertTrue(s0.query()), fails.

The fix is to insert

    torch.cuda.synchronize(d0)
    torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish before the real test begins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23912

Differential Revision: D16688599

Pulled By: ezyang

fbshipit-source-id: 3de2b555e99f5bbd05727835b9d7c93a026a0519
2019-08-07 07:44:30 -07:00
fc82ec298b Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (#23833)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.

I only verified that the schedule reaches the restart at the expected step as specified in the issue; it would be good to have someone else verify correctness here.

Script:
```
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833

Differential Revision: D16657251

Pulled By: gchanan

fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
2019-08-07 07:15:50 -07:00
78cc9b92a5 Change fbgemm_linear_{int8,fp16}_weight to fbgemm_linear_{int8,fp16}_weight_fp32_activation (#22955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22955

Following the comment in https://github.com/pytorch/pytorch/pull/22891, change the fbgemm wrapper function name to indicate whether it is dynamic quantization or static quantization.

Differential Revision: D16297512

fbshipit-source-id: 498678e2af27070628be11a6d724ce17c2a3cde5
2019-08-06 23:19:26 -07:00
002d4f9f7d Erase shape information from class types (#23362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23362

Pulled By: driazati

Differential Revision: D16681944

fbshipit-source-id: dba46b6fc3223a2f94dc502531df438f3212d8fb
2019-08-06 22:30:25 -07:00
b0a27278bd Recursive script migration guide
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23892

Pulled By: driazati

Differential Revision: D16677532

fbshipit-source-id: 40f506b1c770e60309c0628d4745047996a05295
2019-08-06 21:43:28 -07:00
8b349073ce sync and async torch.distributed.rpc for builtin operators (#23228)
Summary:
Features:

* sync and async RPC for builtin operators
* RpcAgent API
* ProcessGroupAgent implementation

Goal:

* have a minimum working and testable RPC implementation
* make sure the RpcAgent API is sufficient for future ThriftAgent and TensorPipeAgent implementations
  * For the TensorPipe implementation, it might allocate multiple underlying communication channels with different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message, which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
  * For ThriftAgent, as Thrift has its own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is pass the response Message object to the Future returned by send(...).
* support blocking and non-blocking RequestCallback
  * blocking means the callback won't return before sending out the response
  * non-blocking can be achieved by enqueuing the `(from, request, RpcAgent&)` tuple and using a different thread to process them. That is why there is an `RpcAgent&` arg in the param list.

We are not exporting this diff until we finalize distributed autograd design and publish the API review publicly.

https://fb.quip.com/FabTAZKVgQpf
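As a rough sketch of the sync and async call surface described above — written against the rpc_sync/rpc_async names the API later stabilized on, which may differ from this initial implementation:

```python
import torch
import torch.distributed.rpc as rpc

# assumes MASTER_ADDR/MASTER_PORT are set and a peer named "callee" exists
rpc.init_rpc("caller", rank=0, world_size=2)
t1, t2 = torch.ones(2), torch.ones(2)
ret = rpc.rpc_sync("callee", torch.add, args=(t1, t2))   # blocks for the result
fut = rpc.rpc_async("callee", torch.add, args=(t1, t2))  # returns a Future
print(ret, fut.wait())
rpc.shutdown()
```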

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23228
ghstack-source-id: 87816717

Reviewed By: zhaojuanmao

Differential Revision: D15194693

fbshipit-source-id: 7adb600796613cde6073db6c227451b89940ecaf
2019-08-06 16:03:01 -07:00
c07fc96b94 Set caffe2_tvm_min_ops to 8 (#23893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23893

Set `caffe2_tvm_min_ops` to 8 for production and tests.

Reviewed By: yinghai

Differential Revision: D16659420

fbshipit-source-id: ef33b37e2a5128e502a6b8df306914a409f13c2d
2019-08-06 14:48:45 -07:00
ddc25efc80 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 627d1403a7caf833f63c93dc976e83f10d384925
2019-08-06 12:37:23 -07:00
68318404f4 Rename cpu-only to cpuonly, as dash features are not supported. (#23879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23879

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16670379

Pulled By: ezyang

fbshipit-source-id: c498f8362760bdf8526c59043db3276f99e3ccc1
2019-08-06 12:32:16 -07:00
40f0b1c844 Enable OSS quantization tests (#23858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23858

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23718

Changes:

- Enable tests for quantization test files in `run_tests.py`
- Remove `__future__` imports from `torch/nn/qat/modules/__init__.py`, since `unicode_literals` messes up imports on python2 because the elements in `__all__` will be Unicode and not string
- Skip PostTrainingQuantTests if the build doesn't have FBGEMM (only a small subset of targets in tests) or if testing under UBSAN (the suppression file doesn't seem to work)

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D16639467

Pulled By: jamesr66a

fbshipit-source-id: 532766797c216976dd7e07d751f768ff8e0fc207
2019-08-06 11:20:30 -07:00
6ba60ec9b0 Add flag to temporarily disable MKL-DNN conv (#23837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23837

This is a temporary workaround to an issue in MKL-DNN's Convolution backwards implementation: https://github.com/pytorch/pytorch/issues/23825

It is only used to enable testing quantization

Test Plan: Imported from OSS

Differential Revision: D16659081

Pulled By: jamesr66a

fbshipit-source-id: de18ebe98dec2a042f28b23373e20da2b44a42a2
2019-08-06 11:20:26 -07:00
9588cd921e weight_names bug fix (#23848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23848

Problem:
In an experiment running feed model 127607201 (/mnt/public/tracelog/feed_repro2/127607201_0.predictor), we encountered a blob dimensionality mismatch error when running the onnxified net. This is because the model initializes input blobs in the current workspace with blob size 0, and onnxifi() falsely identified those input blobs as weight blobs and assigned the wrong dimensions.

Solution:
Add an option to pass the correct weight blob names to onnxifi() instead of using all blobs in the current workspace.

Reviewed By: yinghai

Differential Revision: D16661396

fbshipit-source-id: cabe44db6b64e6538bef4b65e380312214b3ba9f
2019-08-06 10:58:43 -07:00
d413f2d335 format init.cpp (#23840)
Summary:
Formatting in advance of a PR that touches this file, because there is a lot of formatting noise :'(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23840

Differential Revision: D16659311

Pulled By: eellison

fbshipit-source-id: 7dedaccf9b9c455f97efdcce1c58515eb155d261
2019-08-06 10:38:30 -07:00
43c4bcba1d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 71bb684dc1f35dfc82c52a049092e63f449468b1
2019-08-06 10:09:55 -07:00
52be1448e8 Docs: Delete placeholder to use top-level file (#23869)
Summary:
Replaces and closes https://github.com/pytorch/pytorch/issues/23864.

When opening a pull request, GitHub shows you this:

![image](https://user-images.githubusercontent.com/1324225/62534181-30142880-b851-11e9-9b39-32d0ed6ff26c.png)

Or this:

![image](https://user-images.githubusercontent.com/1324225/62534569-24753180-b852-11e9-8242-8905ddda1f6f.png)

However, that links to https://github.com/pytorch/pytorch/blob/master/.github/CONTRIBUTING.md which looks like:

![image](https://user-images.githubusercontent.com/1324225/62534607-3656d480-b852-11e9-8c8c-37f54e8ca774.png)

As the commit message shows, that was a placeholder. There's already a real `CONTRIBUTING.md` document, so *delete the placeholder*.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23869

Differential Revision: D16667966

Pulled By: ezyang

fbshipit-source-id: c4135ebbb75de803ef227e4608e16da1a2e83a0c
2019-08-06 08:14:10 -07:00
d58059bc6f Fix SliceGradientOp to handle properly empty batches (#23784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23784

The backward path does nothing during the gradient pass when the input is empty; as a result, the workspace can preserve gradient values from a previous iteration and feed inconsistent inputs to some of the backward-pass operators. This diff fixes this discrepancy by always reinitializing the output during the backward path.

Reviewed By: dzhulgakov

Differential Revision: D16646096

fbshipit-source-id: 8ca68dfad17a63fc87c033cce7b36b40bd77245c
2019-08-06 02:43:32 -07:00
c002ede107 Delete Travis CI config (#23788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23788

We be using Azure Pipelines now, matey!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16648527

Pulled By: ezyang

fbshipit-source-id: d05326c4971fd392868f2a70aa0a9be9c7280f86
2019-08-05 20:04:22 -07:00
489cc46686 Define toIValue conversion for dtype (#23708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23708

Resolves https://github.com/pytorch/pytorch/issues/23631

We always treat dtypes as number types, and we have the dtype->int64_t conversion logic present in toSugaredValue. So if a dtype appears in a statement being compiled, it's properly converted to its long ScalarType equivalent. However, this logic was missing in `toIValue`, which broke using dtypes as attributes.
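A sketch of the kind of module this unblocks — assuming attribute inference stores the dtype as its ScalarType integer equivalent; the module is illustrative:

```python
import torch

class Caster(torch.jit.ScriptModule):
    def __init__(self, dtype):
        super(Caster, self).__init__()
        self.dtype = dtype  # a dtype attribute previously failed in toIValue

    @torch.jit.script_method
    def forward(self, x):
        return x.to(self.dtype)

print(Caster(torch.float16)(torch.zeros(2)).dtype)  # torch.float16
```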

Test Plan: Imported from OSS

Differential Revision: D16617222

Pulled By: jamesr66a

fbshipit-source-id: 4b10e5795f9c142c2fd9fa1b5d60f6374f5218e0
2019-08-05 19:32:42 -07:00
e8cf9b686b Rename previously THNN conv kernels to have naive_ prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23790

Test Plan: Imported from OSS

Differential Revision: D16650364

Pulled By: ezyang

fbshipit-source-id: 31d72107915cf03e19a746f31ee45fdb2b056101
2019-08-05 18:59:46 -07:00
60a4ef3074 Remove nightly suffix from nightlies; upload to pytorch-nightly. (#23752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23752

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16657471

Pulled By: ezyang

fbshipit-source-id: 4d8fcde1d10d4b078c76c643adb6d4a4fc1259c6
2019-08-05 18:49:42 -07:00
e2f5bc5c08 Properly mangle nn.Module.__construct (#23779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23779

Mangling is two underscores, not one :(. We want this method to be
private so that inheritors who define a `__construct` do not interfere
with Module initialization.

Test Plan: Imported from OSS

Differential Revision: D16645156

Pulled By: suo

fbshipit-source-id: b9060cb35bfaa0391ff200b63fb78b1ac15fee39
2019-08-05 17:58:34 -07:00
8fc349f7be fix some compiler warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23816

Test Plan: Imported from OSS

Differential Revision: D16654126

Pulled By: suo

fbshipit-source-id: addf3d24df514a17a521f8584cd5e142c8a3aec4
2019-08-05 17:52:56 -07:00
51d59a43ba fix torch.frac documentation (#23830)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13968 .

Following the math formula in wiki: https://en.wikipedia.org/wiki/Fractional_part
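For reference, a sketch of the sign-preserving definition the fixed docstring follows (my transcription, matching the wiki's convention for negative inputs):

```latex
\operatorname{frac}(x) = x - \lfloor \lvert x \rvert \rfloor \cdot \operatorname{sgn}(x)
```

For example, frac(-1.5) = -1.5 - 1 * (-1) = -0.5.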
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23830

Differential Revision: D16656871

Pulled By: ailzhang

fbshipit-source-id: a71467870cf9566e0c7b1a045f72607dada81e1f
2019-08-05 17:43:17 -07:00
8fb0d198e9 make nn.LSTM accept PackedSequence instead of Tuples
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23643
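A sketch of the eager-mode pattern this lets scripted code mirror (shapes and names are illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(3, 5, 4)             # batch of 3 padded sequences
lengths = [5, 3, 2]                  # sorted in decreasing order
packed = pack_padded_sequence(x, lengths, batch_first=True)
out, (h, c) = lstm(packed)           # the PackedSequence is passed directly
```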

Differential Revision: D16615531

fbshipit-source-id: af508838cac21d271d3470f0f16fd75473a6e68d
2019-08-05 17:16:18 -07:00
a15845555c Negate halves on GPU using __hneg() when possible, instead of using float conversion.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23626

Test Plan: Imported from OSS

Differential Revision: D16656730

Pulled By: ezyang

fbshipit-source-id: 7e1f4e334f484a3ed4392949ff7679cefd67a74e
2019-08-05 16:21:38 -07:00
f90afff3bd Recommend ~ and bitwise_not() when user tries to apply neg (-) on a bool tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23621
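A quick sketch of the recommended alternatives (illustrative):

```python
import torch

b = torch.tensor([True, False])
print(~b)               # tensor([False,  True])
print(b.bitwise_not())  # same result
# -b on a bool tensor now raises an error suggesting ~ or bitwise_not()
```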

Test Plan: Imported from OSS

Differential Revision: D16656729

Pulled By: ezyang

fbshipit-source-id: d107e8caa2ccfa6ff8a1bd8a31b4d79f142d68fb
2019-08-05 16:21:34 -07:00
f278aee731 Std opset export (#22310)
Summary:
Added export for std (standard deviation) op, plus onnxruntime, caffe2 and expect tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22310

Differential Revision: D16109889

Pulled By: bddppq

fbshipit-source-id: 067b2d385d463877bb99f673a18da4e5ea823426
2019-08-05 15:55:42 -07:00
a710f81639 Add CUDA 10.1 to CI. (#23791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23791

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16657447

Pulled By: ezyang

fbshipit-source-id: a4a5f5abef72146a52a76cfab629f8c105949bb3
2019-08-05 15:55:39 -07:00
0015b188be Fix typos
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23770

Differential Revision: D16646852

Pulled By: ezyang

fbshipit-source-id: 826b041c0b528ae6e0b320d49d8141057c1f9bf3
2019-08-05 15:38:32 -07:00
44ba092e5b Remove unnecessary fetch and reset on builder checkout. (#23792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23792

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16648539

Pulled By: ezyang

fbshipit-source-id: f713fca6d428c03ed31aad18464c92265fb81420
2019-08-05 15:32:59 -07:00
4050de5b58 Revert D16627326: [pytorch][PR] [ROCm] Improve hip-clang support in build_amd.py
Differential Revision:
D16627326

Original commit changeset: 977003174395

fbshipit-source-id: d26959c85d74ce8b81341a31c9ddb2260bf18c9b
2019-08-05 15:04:47 -07:00
ab15d38497 Adam implementation minor fix (#23737)
Summary:
This PR is in accordance with https://github.com/pytorch/pytorch/issues/22628
I had submitted the PR for `adam.py` and `adamw.py` but had forgotten about the `adam.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23737

Differential Revision: D16623828

Pulled By: vincentqb

fbshipit-source-id: 4390fd751d1c0cd12f32214b4234d42a06dcbb20
2019-08-05 14:59:07 -07:00
8e9fef61f4 Revert D15996322: Open up AliasAnalysisKind for any ops
Differential Revision:
D15996322

Original commit changeset: df27ed95397b

fbshipit-source-id: 3327a3b56d8d1ea2cf0ea998f39ef254c47d5f3f
2019-08-05 14:54:27 -07:00
f0a581801a Improve hip-clang support in build_amd.py (#23699)
Summary:
Use the supported way to differentiate and automatically switch between hip-clang and hcc hipification in build_amd.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23699

Differential Revision: D16627326

Pulled By: vincentqb

fbshipit-source-id: 977003174395fb69cf0c96c89232bd6214780cd8
2019-08-05 13:39:28 -07:00
3ad9dbf9d5 Open up AliasAnalysisKind for any ops (#23810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23810

A previous diff removed the special-casing for aten:: and prim:: ops in alias analysis and implemented alias analysis purely based on the AliasAnalysisKind. To be sure it didn't break our existing code base, it added asserts that make sure our existing aten:: and prim:: ops set the correct AliasAnalysisKind.

However, we don't need that restriction for future ops. Since we are now certain all existing cases are set up correctly,
we can remove these assertions.
ghstack-source-id: 87733626

Differential Revision: D15996322

fbshipit-source-id: df27ed95397bbe58a76b6b2c2e9808fcfde35294
2019-08-05 13:18:12 -07:00
1aa4afde80 Document bool tensors for bitwise_not. (#23800)
Summary:
Requested by vadimkantorov at https://github.com/pytorch/pytorch/pull/23621#issuecomment-517945167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23800

Differential Revision: D16651008

Pulled By: gchanan

fbshipit-source-id: 4ce21158bd5dd142edcd951e7ac941521b3d54af
2019-08-05 12:11:45 -07:00
6e4a83ab57 Channels last stored in tensor (#23391)
Summary:
Define a 4D tensor as stored in channels-last memory format when the dimension order is NCHW and the strides satisfy C-stride < W-stride < H-stride < N-stride (if any dimension has size 1, its stride is not taken into account).

A channels-last contiguous tensor is a channels-last tensor that occupies a contiguous memory block, so x.is_contiguous(memory_format=torch.channels_last) checks whether a tensor is channels-last contiguous.
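A sketch of what this looks like in practice — assuming the memory_format-aware contiguous()/is_contiguous() variants from this stack:

```python
import torch

x = torch.empty(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
print(x.stride())  # (60, 1, 15, 3): C-stride < W-stride < H-stride < N-stride
print(x.is_contiguous(memory_format=torch.channels_last))  # True
```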
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23391

Differential Revision: D16601414

Pulled By: VitalyFedyunin

fbshipit-source-id: 8d098e7eec2f00fb1d12261bc240b3645d4f5b73
2019-08-05 11:50:29 -07:00
a3c165f9d2 Revert D16452539: support Gather different indices for different examples in one batch
Differential Revision:
D16452539

Original commit changeset: 7229489f4a9c

fbshipit-source-id: 010c177e551cb81521d2af84ce951bf964cdab44
2019-08-05 10:22:01 -07:00
dfd8a08f51 frobenius_norm onnx export added
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23536

Differential Revision: D16566154

Pulled By: bddppq

fbshipit-source-id: 6d076274d1d780e7d39d17ddb35ceabe55b394a3
2019-08-05 10:13:00 -07:00
7d9e69e62e allow INSTALL_TEST to pass through from env to cmake (#23793)
Summary:
This allows `INSTALL_*` to pass through to cmake.
An additional fix: if `INSTALL_TEST` is specified, it won't use `BUILD_TEST` as the default value for `INSTALL_TEST`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23793

Differential Revision: D16648668

Pulled By: soumith

fbshipit-source-id: 52c2a0d8033bc556355b87a6731a577940de9859
2019-08-05 09:55:14 -07:00
fb06c9e61f qconv operator level benchmark (#22895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22895

Adding op level benchmarking for qconv operator

Reviewed By: mingzhe09088

Differential Revision: D16274273

fbshipit-source-id: 6674753e38f6692f5e6d0db0cac90c5fbf358147
2019-08-05 09:39:16 -07:00
be7fe1ccb9 Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (#23701)
Summary:
As pointed out by colesbury in https://github.com/pytorch/pytorch/pull/23579#discussion_r309798987
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23701

Differential Revision: D16623781

Pulled By: mrshenli

fbshipit-source-id: f48a29499128b08d2ac8bc9e466f2326112ead94
2019-08-05 07:50:06 -07:00
19c675178f Updated docs and added deprecation warnings to acknowledge a bool tensor (#22261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22261
ghimport-source-id: 1611d62d056a04c0ad15ef662e594a3d206a78e2

Test Plan: Imported from OSS

Differential Revision: D16005990

Pulled By: izdeby

fbshipit-source-id: 2413824aa75a0755719e4df11acd21e6607e5a85
2019-08-05 07:42:34 -07:00
520982d1df Zero sized tensor support for repeat_interleave (#23717)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22753
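A quick sketch of the now-supported zero-sized cases (illustrative):

```python
import torch

print(torch.empty(0).repeat_interleave(2).shape)  # torch.Size([0])
print(torch.tensor([1, 2]).repeat_interleave(torch.tensor([0, 2])))  # tensor([2, 2])
```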
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23717

Differential Revision: D16623598

Pulled By: mrshenli

fbshipit-source-id: 297a3274fb5a5b2fcc0c3ad601337d7eb29fdca2
2019-08-05 07:36:47 -07:00
f87a4cc23f support Gather different indices for different examples in one batch (#23285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23285

for example:

Inputs:
  data:
   [[[2 4 2 0],
     [0 1 2 0],
     [1 1 0 0]],
    [[3 4 1 3],
     [0 3 2 2],
     [4 1 0 4]]]

  idx:
    [[0 2],
     [0 1]]

outputs:
  [[[2 4 2 0],
    [1 1 0 0]],
   [[3 4 1 3],
    [0 3 2 2]]]

`data` and `idx` must have the same outer dimension.

Call `Gather` or `BatchGather` with the argument `match_outer=True`.

Reviewed By: huayuli00

Differential Revision: D16452539

fbshipit-source-id: 7229489f4a9c02ee9f3c6a8a24bcd02925d96e07
2019-08-04 21:17:49 -07:00
18d0873b7a cpu binary builds are built with cu100 docker image now instead of cu80 (#23772)
Summary:
cpu binary builds are built with cu100 docker image now instead of cu80
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23772

Differential Revision: D16644224

Pulled By: soumith

fbshipit-source-id: 5af09aba149c13fadbd4146172e7da038f2f4261
2019-08-04 18:42:52 -07:00
6313d5e28b add appropriate install_requires (#23722)
Summary:
This adds:
- dependency on numpy if compiled with numpy support
- dependency on future if python <= 2.7

Fixes https://github.com/pytorch/pytorch/issues/23670
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23722

Differential Revision: D16643824

Pulled By: soumith

fbshipit-source-id: 5cf4d79cd188678cb2328c4286eabd52a2a86fcd
2019-08-04 17:24:19 -07:00
1b1bddaab3 Revert D16469619: Add Virtual Memory and CPU percentage computation to AIBench
Differential Revision:
D16469619

Original commit changeset: 670f3549c830

fbshipit-source-id: f55d4cda36f5e29df2df306d33a70158e5a7908b
2019-08-04 16:06:51 -07:00
cbf05305c0 don't try to set training after ScriptModule has been initialized. (#23680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23680

Now, when initializing a ScriptModule during the torch.jit.load() process, there is already a C++ module backing it. That means that setting training would overwrite whatever the initialized ScriptModule had.

This PR splits apart the common "set up internal state" part of the
Module __init__ and calls that from ScriptModule.__init__ and
Module.__init__, leaving the "nn.Module-specific" part (setting
`self.training`) for the nn.Module __init__

Test Plan: Imported from OSS

Differential Revision: D16606959

Pulled By: suo

fbshipit-source-id: f7ea6b36551ff4e4472b7685f65731d5cfab87fd
2019-08-04 15:04:55 -07:00
31137738de Support for non-zero zero_points for weight and activation (#23541)
Summary:
We can now have any valid zero points for weight and activation for conv2d kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23541

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qconv\ \(test_quantized.TestQuantizedConv\)'  --print-passing-details
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699723897843
      ✓ caffe2/test:quantized - test_qconv (test_quantized.TestQuantizedConv) 68.528 1/1 (passed)
Test output:
> test_qconv (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 68.529s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3377699723897843
Summary (total time 74.97s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16556515

Pulled By: dskhudia

fbshipit-source-id: 6e2ee9ddc58f9dc8a3f8b25918bb7955f0655073
2019-08-04 11:05:25 -07:00
445440a6a9 Add Virtual Memory and CPU percentage computation to AIBench (#23590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23590

This diff adds CPU% and Virtual Memory computation by default to AIBench when doing mobile remote run

Reviewed By: llyfacebook

Differential Revision: D16469619

fbshipit-source-id: 670f3549c830a36bc456a57f2ea668f9f82dd15a
2019-08-04 09:29:44 -07:00
7f130c8494 Expose the quantized inputs and output of dynamic quantized int8 FC operator for debugging (#23566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23566

Currently, if we use dynamic quantization, we don't have access to the internally quantized inputs and output for debugging.

To make the debugging easier, this diff adds a debug feature to expose the quantized X, W and Y for debugging if debug outputs are attached to the operator and caffe2_dnnlowp_force_slow_path flag is set.

The quantized inputs and output are exposed as the extra outputs.

The example Int8FC op with debug outputs appended looks like:
```
op {
  input: "X"
  input: "W"
  input: "b"
  output: "Y"
  output: "X_q"
  output: "W_q"
  output: "Y_q"
  name: ""
  type: "Int8FC"
  arg {
    name: "axis"
    i: 1
  }
  ...
}
```

Next, we need to expose the quantization parameters.

Reviewed By: jspark1105

Differential Revision: D16566753

fbshipit-source-id: acd855a172ee7993ddba8808f2af81b628ff9c02
2019-08-02 21:23:43 -07:00
5faecc8b1f Perform string uniquing by value in pickle serialization. (#23741)
Summary:
On my testcase, this reduces the uncompressed size of TorchScript
debug info from 281KB to 76KB.  With zip compression enabled, this
saves about 2.5KB of final size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23741

Differential Revision: D16624128

fbshipit-source-id: ce45659d6b20d40608ace05639b69b93696b00d9
2019-08-02 21:12:38 -07:00
8e2b9de860 Document empty_strided (#23735)
Summary:
Changelog:
- Add doc string for torch.empty_strided
- Remove empty file named `python` in test/

Fixes https://github.com/pytorch/pytorch/issues/23688
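A brief usage sketch of the newly documented function (the size and stride values are illustrative):

```python
import torch

# 2x3 tensor laid out column-major: element (i, j) lives at offset i*1 + j*2
x = torch.empty_strided((2, 3), (1, 2))
print(x.size(), x.stride())  # torch.Size([2, 3]) (1, 2)
```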
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23735

Differential Revision: D16623438

Pulled By: ailzhang

fbshipit-source-id: acd5a47da9220243467ccc6bff92edd209cca709
2019-08-02 20:02:44 -07:00
f81db8afb8 Initial torchbind prototype (#21098)
Summary:
I have some test code in there as well, along with a script "test_libtorch" to run it. You'll need to modify `test_libtorch` to point to where you have `pytorch` built. I currently require that `pybind11` is included as a subdirectory of the test, but added it to the `.gitignore` to make this reviewable.

Currently, something like this works:
```cpp
#include <cstdint>
#include <iostream>
using std::cout; using std::endl;

struct Foo {
  int x, y;
  Foo(): x(2), y(5) {}
  Foo(int x_, int y_) : x(x_), y(y_) {}
  void display() {
    cout << "x: " << x << ' ' << "y: " << y << endl;
  }
  int64_t add(int64_t z) {
    return (x + y) * z;
  }
  void combine(Foo x);  // discussed in the open issues below
};
static auto test = torch::jit::class_<Foo>("Foo")
                    .def(torch::jit::init<int64_t, int64_t>())
                    .def("display", &Foo::display)
                    .def("add", &Foo::add)
                    .def("combine", &Foo::combine);

```
with
```py
torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val.display()
    print(val.add(3))
```
results in
```
x: 5 y: 3
24
```

Current issues:
- [x] The Python class created by TorchScript doesn't interact properly with the surrounding code.
```
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    return val
```
- [x] Doesn't properly take in non-pointer classes. Can't define this function signature in cpp (We don't want to support this I believe).
```cpp
  void combine(Foo x) {
```

- [x] Has some issues with memory for blobs when constructing multiple objects (fix constant propagation pass to not treat capsules as the same object).
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val2 = torch._C.Foo(100, 0)
    val.display()
    print(val.add(3))
```
- [ ] Can't define multiple constructors (need to define overload string. Currently not possible since we don't support overloaded methods).
- [x] `init` is a little bit different syntax than `pybind`. `.init<...>()` instead of `.def(py::init<>())`
- [x] I couldn't figure out how to add some files into the build so they'd be copied to the `include/` directories, so I symlinked them manually.
- [ ] Currently, the conversion from Python into Torchscript doesn't work.
- [ ] Torchbind also currently requires Python/Pybind dependency. Fixing this would probably involve some kind of macro to bind into Python when possible.
- [ ] We pass back into Python by value, currently. There's no way of passing by reference.
- [x] Currently can only register one method with the same type signature. This is because we create a `static auto opRegistry`, and the function is templated on the type signature.

Somewhat blocked on https://github.com/pytorch/pytorch/pull/21177. We currently use some structures that will be refactored by his PR (namely `return_type_to_ivalue` and `ivalue_to_arg_type`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21098

Differential Revision: D16634872

Pulled By: Chillee

fbshipit-source-id: 1408bb89ea649c27d560df59e2cf9920467fe1de
2019-08-02 18:45:15 -07:00
4e6e11c139 added opset10 ORT tests (#22993)
Summary:
Added a number of opset10 tests from Caffe2 to ORT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22993

Differential Revision: D16467954

Pulled By: bddppq

fbshipit-source-id: 0b92694c7c0213bdf8e77e6f8e07e6bc8a85170a
2019-08-02 17:34:48 -07:00
97917fd26d Partially revert "Remove all conda 3.5 nightly configs, remove libtorch smoketests (#21380)" (#23747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23747

This reverts commit 6a3ebdbbc529da79125423839bf18f527a706ab8
"Remove all conda 3.5 nightly configs" but not the smoketest
removal.

Test Plan: Imported from OSS

Differential Revision: D16632992

Pulled By: ezyang

fbshipit-source-id: 5c6dcf1510b84359a1760cfa216edea610563ad5
2019-08-02 16:24:29 -07:00
a1b10270c2 Fix the bug in regularizer matching (#23485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23485

In the previous diff D16326492, the "regularizer" in the dot processor is defined according to the input regularizer options through the function "get_emb_weighting_reg" in processor_utils.py. The option matching is only valid in local tests but doesn't work in workflows. This bug causes the regularizer not to be added in actual models and has made the previous trimmed-lasso implementation useless.

An evidence is that before D16326492, a flow f126010621 has elastic regularizer added:
https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=5375243255&smc=chronos_gp_admin_client

{F171862755}

while after D16326492, the regularizer is gone in flow f127262007
https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=5428982684&smc=chronos_gp_admin_client

{F171862770}

Differential Revision: D16535466

fbshipit-source-id: 6b0b5e95b2b14a0d6c6d65f96bab89529f4e79c5
2019-08-02 15:54:48 -07:00
29881c7f02 Fix LSTM int8 quantization model size issue (#23577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23577

This diff fixes a model size issue introduced in #23291. After that PR, the model size after int8 quantization is the same as that of the original unquantized model. The reason is that we save the original weight for int8 quantization even when it is no longer needed. This diff fixes that by only saving the original weight for the fp16 quantization path.

Reviewed By: llyfacebook

Differential Revision: D16557619

fbshipit-source-id: f924ae8d155a0d525b86a7440b3c7147d5bead0a
2019-08-02 13:38:30 -07:00
3107f1dcd5 fix align_corners doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23707

Differential Revision: D16617565

Pulled By: ezyang

fbshipit-source-id: 9ae581e9233d8c2b92f35b9486af1dab30ce8e3a
2019-08-02 12:43:35 -07:00
d9ec37adc4 Compress all non-Tensor components of a serialized TorchScript model. (#23723)
Summary:
This saves about 69KB off the FaceBlaze model, bringing the total size down from 388KB to 319KB.
 See https://github.com/pytorch/pytorch/issues/23582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23723

Differential Revision: D16623693

fbshipit-source-id: 66267f87635c502c804293054fd5716d291389c0
2019-08-02 12:39:20 -07:00
302adf1d20 add LambdaRank DCG Loss Option (#23679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23679
Full Canary: https://fburl.com/fblearner/sa1pkpya
Add LambdaRank DCG Loss Option
* when use_idcg_normalization == true, regular LambdaRank with NDCG loss
* when use_idcg_normalization == false, gradient and loss functions are not normalized by idcg.

Differential Revision: D16605459

fbshipit-source-id: a16f071e69516974e48d27bef4ca179019ca4ae7
2019-08-02 11:47:46 -07:00
fc6aec9491 format only change (#23685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23685

format only changes.

Differential Revision: D16607482

fbshipit-source-id: 572afb59c6ff9f8a8842ba044fed6c87f8506843
2019-08-02 11:47:42 -07:00
57fc793650 Add names to repr for named tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23316

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23316 gh/zou3519/80/head

Imported from OSS

Differential Revision: D16494415

Pulled By: zou3519

fbshipit-source-id: e483f57bdb0610d0eadbe70d673e20dc3d3f9502
2019-08-02 11:37:29 -07:00
8e466b7e21 Add torch._C._BUILD_NAMEDTENSOR() (#23623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23623

This is a quick, not-user-facing check for if pytorch was built with BUILD_NAMEDTENSOR=1.

Test Plan:
- run tests [namedtensor ci]

gh-metadata: pytorch pytorch 23623 gh/zou3519/85/head

Differential Revision: D16621829

Pulled By: zou3519

fbshipit-source-id: d7e1161dc176bab2c1f953265722daeba1e63102
2019-08-02 11:37:25 -07:00
995920ae2c Fix frontend error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23576

Pulled By: driazati

Differential Revision: D16611640

fbshipit-source-id: 4a6937e779dc43b3f043aca33e66d2b84376501c
2019-08-02 11:37:21 -07:00
692825db86 Tests for C++ custom autograd function API (#23628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23628

More tests for autograd::Function, based on the Python tests from test_autograd.py.

Test Plan: Imported from OSS

Differential Revision: D16600992

fbshipit-source-id: 0cb8bfbcff315111dc4936e837ff859d0a1e251d
2019-08-02 11:37:17 -07:00
8df83ce559 Bump Gloo (#23400)
Summary:
Feature includes

- Log message if bind(2) fail
- Make collective work with single process context
- Use hipStreamCreateWithFlags instead of hipStreamCreateWithPriority
- Add RCCl support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23400

Differential Revision: D16623110

Pulled By: bddppq

fbshipit-source-id: e75cd8d2e2cad551ad0b0a08667320d7036b78bd
2019-08-02 11:26:28 -07:00
638d0b3705 Support ONNX export Multinomial (#23581)
Summary:
cc bddppq spandantiwari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23581

Differential Revision: D16584853

Pulled By: bddppq

fbshipit-source-id: 01c066e86a0ad071361cd67b8c3925bfb6b84a4a
2019-08-02 11:06:21 -07:00
87131a9bae Fix unused imports in torch/onnx/symbolic_opset8.py (#23678)
Summary:
Which causes lint errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23678

Differential Revision: D16622458

Pulled By: mrshenli

fbshipit-source-id: 145ad30dfb452dd556573c1b3d4cdd9cd7852752
2019-08-02 10:55:16 -07:00
5cb41d35da increase predefined_minimum_secs to reduce variation (#23734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23734

In the latest run on AI-PEP, 6 tests out of 342 have more than 7% variation. Around 20 tests have variations between 4% and 7%. The rest are within 4%. This diff tries to further reduce the variation to 4% for all tests.

Each test has to run predefined_minimum_secs seconds before exiting. Increasing that value makes all tests run longer. Based on the experimental results, we will see what's the right value to use.

Reviewed By: hl475

Differential Revision: D16622361

fbshipit-source-id: d4c034f64b1d64e1cffd67ffbced7d8cd4449d69
2019-08-02 10:33:48 -07:00
89956374c3 Remove qconfig_dict from API (#23465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23465

We decided not to allow user to use qconfig_dict to do quantization
since that API is not robust.

Differential Revision: D16611504

fbshipit-source-id: b0d1d311b32c990a165c480f50e9ce3d68b785b5
2019-08-02 10:28:48 -07:00
645b981d95 QAT modules take qconfig as argument and keep qconfig as member (#23609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23609

Add qconfig to QAT modules to accommodate the convert logic

Differential Revision: D16584654

fbshipit-source-id: 2d7da652eb6eea43056030952c533314da41550d
2019-08-02 10:20:06 -07:00
725d6cd8ce Extract common classes and functions from test_c10d to common_distributed (#23660)
Summary:
MultiProcessTestCase will be useful for both c10d and rpc tests. So, this diff extracts that class and some common decorators to a separate file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23660

Reviewed By: pietern

Differential Revision: D16602865

Pulled By: mrshenli

fbshipit-source-id: 85ad47dfb8ba187b7debeb3edeea5df08ef690c7
2019-08-02 09:19:32 -07:00
b2f6e2bdc1 Migrate neg's CUDA implementation to ATen. (#23617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23617

Doesn't seem to cause any performance regression. Performance difference
in the benchmarks is negligible.

Benchmark script:

```python
import timeit

for n, t in [(10, 100000),
             (1000, 10000)]:
    print('a.neg() (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64', 'torch.float', 'torch.double') + (('torch.half',) if device == 'cuda' else ()):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a.neg()\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.ones({n}, device="{device}", dtype={dtype})', number=t))
```

Before:

```
a.neg() (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            2.5537249100016197
device: cpu, dtype: torch.uint8, 100000 times           2.512518662999355
device: cpu, dtype: torch.int16, 100000 times           2.548207502000878
device: cpu, dtype: torch.int32, 100000 times           2.5974994509997487
device: cpu, dtype: torch.int64, 100000 times           2.6533011499996064
device: cpu, dtype: torch.float, 100000 times           2.6474813019995054
device: cpu, dtype: torch.double, 100000 times          2.6949866009999823
device: cuda, dtype: torch.int8, 100000 times           5.820120684998983
device: cuda, dtype: torch.uint8, 100000 times          5.732108927997615
device: cuda, dtype: torch.int16, 100000 times          5.791249125999457
device: cuda, dtype: torch.int32, 100000 times          5.816761754998879
device: cuda, dtype: torch.int64, 100000 times          5.935873205999087
device: cuda, dtype: torch.float, 100000 times          6.276509613999224
device: cuda, dtype: torch.double, 100000 times         6.122782447000645
device: cuda, dtype: torch.half, 100000 times           6.161522764999972
a.neg() (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.3766637519984215
device: cpu, dtype: torch.uint8, 10000 times            0.37288786600038293
device: cpu, dtype: torch.int16, 10000 times            0.3485262310023245
device: cpu, dtype: torch.int32, 10000 times            0.41810554200128536
device: cpu, dtype: torch.int64, 10000 times            0.5609612200023548
device: cpu, dtype: torch.float, 10000 times            0.39054008099992643
device: cpu, dtype: torch.double, 10000 times           0.4946578170020075
device: cuda, dtype: torch.int8, 10000 times            0.5843639539998549
device: cuda, dtype: torch.uint8, 10000 times           0.5780841570012853
device: cuda, dtype: torch.int16, 10000 times           0.5819949180004187
device: cuda, dtype: torch.int32, 10000 times           0.5827294059999986
device: cuda, dtype: torch.int64, 10000 times           0.5861426519986708
device: cuda, dtype: torch.float, 10000 times           0.5929420489992481
device: cuda, dtype: torch.double, 10000 times          0.594638443999429
device: cuda, dtype: torch.half, 10000 times            0.5903799709994928
```

After:

```
a.neg() (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            2.4983287129980454
device: cpu, dtype: torch.uint8, 100000 times           2.479393904999597
device: cpu, dtype: torch.int16, 100000 times           2.5382055320005747
device: cpu, dtype: torch.int32, 100000 times           2.5587980189993687
device: cpu, dtype: torch.int64, 100000 times           2.637738788002025
device: cpu, dtype: torch.float, 100000 times           2.602799075997609
device: cpu, dtype: torch.double, 100000 times          2.6648931070012623
device: cuda, dtype: torch.int8, 100000 times           5.793338211999071
device: cuda, dtype: torch.uint8, 100000 times          5.782462584000314
device: cuda, dtype: torch.int16, 100000 times          5.824340334998851
device: cuda, dtype: torch.int32, 100000 times          5.851659068001027
device: cuda, dtype: torch.int64, 100000 times          5.8898071570001775
device: cuda, dtype: torch.float, 100000 times          5.913144636000652
device: cuda, dtype: torch.double, 100000 times         5.963339805999567
device: cuda, dtype: torch.half, 100000 times           5.87889370099947
a.neg() (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.37244726499920944
device: cpu, dtype: torch.uint8, 10000 times            0.36641623199830065
device: cpu, dtype: torch.int16, 10000 times            0.3449854829996184
device: cpu, dtype: torch.int32, 10000 times            0.4127863069988962
device: cpu, dtype: torch.int64, 10000 times            0.5551902160004829
device: cpu, dtype: torch.float, 10000 times            0.38593814199703047
device: cpu, dtype: torch.double, 10000 times           0.48877579500185675
device: cuda, dtype: torch.int8, 10000 times            0.5862828740027908
device: cuda, dtype: torch.uint8, 10000 times           0.5836667540024791
device: cuda, dtype: torch.int16, 10000 times           0.5918155769977602
device: cuda, dtype: torch.int32, 10000 times           0.5961457039993547
device: cuda, dtype: torch.int64, 10000 times           0.5963898690024507
device: cuda, dtype: torch.float, 10000 times           0.5985483309996198
device: cuda, dtype: torch.double, 10000 times          0.6027148480025062
device: cuda, dtype: torch.half, 10000 times            0.5961164370019105
```

Test Plan: Imported from OSS

Differential Revision: D16617574

Pulled By: ezyang

fbshipit-source-id: c90aa410f6385ce94fe6b84ebeceffa5effd0267
2019-08-02 02:52:51 -07:00
acc5cedf6a Adjust maintainers list (#23693)
Summary:
Adds new people and reorders sections to make more sense
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23693

Differential Revision: D16618230

Pulled By: dzhulgakov

fbshipit-source-id: 74191b50c6603309a9e6d14960b7c666eec6abdd
2019-08-01 22:59:02 -07:00
d1e0a3dd15 Compress debug symbols when serializing TorchScript models.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23659

Differential Revision: D16603775

fbshipit-source-id: f2912048bdee36b3bcaa779e801c61bfbb5f30e5
2019-08-01 22:30:27 -07:00
3d15ee1b34 Remove more uses of DimensionedTensorType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23060

Differential Revision: D16460391

Pulled By: Krovatkin

fbshipit-source-id: b50ee87d22ad18b8cbfff719b199ea876ef172f1
2019-08-01 21:19:28 -07:00
3314d60a75 fix conv2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23690

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D16610734

Pulled By: jamesr66a

fbshipit-source-id: e190174f11d1810e6f87e2df256543028e9154ef
2019-08-01 19:39:08 -07:00
df8638b0ed Support Copy Op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23705

Reviewed By: yinghai

Differential Revision: D16354204

fbshipit-source-id: 158b0ee556606c117e52bee875d3dc89cc944b5a
2019-08-01 19:27:26 -07:00
9d2cc2c987 Support nn.GRU in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23266

Test Plan: Imported from OSS

Differential Revision: D16466586

Pulled By: wanchaol

fbshipit-source-id: 0f5b8013167bb7b246bd7e28d87a4a9e9c3b34d5
2019-08-01 17:19:26 -07:00
b22c88b8eb Reduce input sets for tests to speed them up. (#23692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23692

Before, tests took ~40s to finish; with this change it's ~2s.

Test Plan: Imported from OSS

Differential Revision: D16611479

Pulled By: ZolotukhinM

fbshipit-source-id: 391235483029d2ab860fcc4597ce84f4964025f1
2019-08-01 17:06:31 -07:00
c91f209130 Updating submodules
Reviewed By: zpao

fbshipit-source-id: ff6387055e7fa2cde88bd870081a05c3adbf56ef
2019-08-01 17:01:23 -07:00
0539462ca2 Fix pin_memory_thread not exiting quickly (#23646)
Summary:
fixes https://github.com/pytorch/pytorch/issues/23642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23646

Differential Revision: D16600874

Pulled By: soumith

fbshipit-source-id: 50f0828d774a558d6f21e9dd21135906bd5be128
2019-08-01 15:24:14 -07:00
3b5daef6de Move addcmul to Aten (#22874)
Summary:
Move CPU implementation of the `addcmul` operator to Aten ( https://github.com/pytorch/pytorch/issues/22797 )

### before

```python
In [11]: timeit x.addcmul(a, b)
1.31 ms ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

### after

```python
In [9]: timeit x.addcmul(a, b)
588 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Adding custom code for the case when `value == 1` doesn't provide a significant performance gain.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22874

Differential Revision: D16359348

Pulled By: VitalyFedyunin

fbshipit-source-id: 941ead835672fca78a1fcc762da052e64308b111
2019-08-01 12:40:48 -07:00
dded794eeb add setup metadata to help PyPI flesh out content on pypi package page (#22085)
Summary:
Add setup metadata to help PyPI flesh out content on the PyPI package page.

Apparently this might also help populate the "Used By" feature, according to driazati.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22085

Differential Revision: D16604703

Pulled By: soumith

fbshipit-source-id: ddb4f7ba7c24fdf718260aed28cc7bc9afb46de9
2019-08-01 12:15:56 -07:00
ff3dd72469 Add in-place check to AliasDb
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23210

Test Plan: Imported from OSS

Differential Revision: D16444529

Pulled By: bwasti

fbshipit-source-id: 83af54d423989a2a726158b521093660584ee9c2
2019-08-01 12:15:52 -07:00
336c9be7f4 Slightly improve dataloader docs on when auto-batching is disabled (#23671)
Summary:
cc gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23671

Differential Revision: D16604387

Pulled By: soumith

fbshipit-source-id: 0ebc120bcaa0f6fa09158b1d0459a72ab11a53d6
2019-08-01 12:10:17 -07:00
7ac41b1cfd Remove useless code from shape info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23663

Reviewed By: yinghai

Differential Revision: D16592163

fbshipit-source-id: de1482305abef45f7ef0e3e57b0c93cd2acac450
2019-08-01 11:47:01 -07:00
fed5ca192c Adam/AdamW implementation minor fix (#22628)
Summary:
I have noticed a small discrepancy between theory and the implementation of AdamW and, more generally, Adam. The epsilon in the denominator of the Adam update below should not be scaled by the bias correction [(Algorithm 2, L9-12)](https://arxiv.org/pdf/1711.05101.pdf). Only the running averages of the gradient (_m_) and of the squared gradients (_v_) should be scaled by their corresponding bias corrections.

![adam_update](https://user-images.githubusercontent.com/13050245/60894105-11117f00-a230-11e9-9ba0-adad2ae2e0ae.png)

In the current implementation, the epsilon is scaled by the square root of `bias_correction2`. I have plotted this ratio as a function of step given `beta2 = 0.999` and `eps = 1e-8`. In the early steps of optimization, this ratio slightly deviates from theory (denoted by the horizontal red line).

![plot](https://user-images.githubusercontent.com/13050245/60893952-cabc2000-a22f-11e9-8dc2-6353ad5d674d.png)
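
A minimal sketch of the corrected update, using the usual `torch.optim.Adam` state names (`exp_avg`, `exp_avg_sq`) and toy values so the snippet runs standalone:

```python
import math
import torch

# Toy single-parameter optimizer state (illustrative, not the exact diff).
p = torch.zeros(3)
exp_avg, exp_avg_sq = torch.randn(3), torch.rand(3)
lr, eps, beta1, beta2, step = 1e-3, 1e-8, 0.9, 0.999, 10

bias_correction1 = 1 - beta1 ** step
bias_correction2 = 1 - beta2 ** step

# Before: eps was effectively scaled by sqrt(bias_correction2):
#   denom = exp_avg_sq.sqrt() + eps
#   step_size = lr * math.sqrt(bias_correction2) / bias_correction1
# After: only the second-moment estimate is bias-corrected; eps stays unscaled:
denom = exp_avg_sq.sqrt() / math.sqrt(bias_correction2) + eps
step_size = lr / bias_correction1
p.add_(exp_avg / denom, alpha=-step_size)
```
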
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22628

Differential Revision: D16589914

Pulled By: vincentqb

fbshipit-source-id: 8791eb338236faea9457c0845ccfdba700e5f1e7
2019-08-01 11:42:04 -07:00
6cf9ed4a54 ConvBn2d/ConvBnReLU2d (#23357)
Summary:
Added _intrinsic.qat.ConvBn2d/_intrinsic.qat.ConvBnReLU2d.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23357
ghstack-source-id: 87519573

Differential Revision: D16295500

fbshipit-source-id: 81e6d1d10d05bf6e343721fc5701d3d6bd7e07e6
2019-08-01 10:07:00 -07:00
029c8e7754 allow forward hooks in tracing (#23613)
Summary:
As far as I could tell, forward hooks work out of the box, so allow them in tracing. We don't have any way of supporting backward hooks, though.
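
A minimal sketch of a forward hook surviving tracing (the hook and module here are illustrative):

```python
import torch

def double_output(module, inputs, output):
    return output * 2   # a forward hook may replace the module's output

m = torch.nn.Linear(2, 2)
m.register_forward_hook(double_output)

traced = torch.jit.trace(m, torch.randn(1, 2))  # the hook runs during tracing
print(traced(torch.ones(1, 2)))
```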

Fixes https://github.com/pytorch/pytorch/issues/20862 and fixes https://github.com/pytorch/pytorch/issues/17571
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23613

Differential Revision: D16601437

Pulled By: eellison

fbshipit-source-id: ecf5dc6201ca08b3b9afdb9fcdb0fda8741133a9
2019-08-01 09:51:19 -07:00
2342b7485e Omit local version identifier for default configuration. (#23654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23654

Default configuration at time of writing is CUDA 10 (but
with 10.1 coming soon)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16601097

Pulled By: ezyang

fbshipit-source-id: c8368355ce1521c01b0ab2a14b1cd0287f554e66
2019-08-01 08:54:56 -07:00
8ab99a28d9 Fix CPU-only binary testing by properly installing cpu-only first. (#23611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23611

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16601098

Pulled By: ezyang

fbshipit-source-id: febb5a822854b91d5b3d942e6bf71b4ae9f1f15c
2019-08-01 08:54:52 -07:00
865c7eea48 Changed tensor comparison return type from uint8 to bool (#21113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21113
ghimport-source-id: 9c4ba63457a72bfc41894387e0b01be3fd9a9baf

Test Plan: Imported from OSS

Differential Revision: D15552204

Pulled By: izdeby

fbshipit-source-id: a608213668649d058e22b510d7755cb99e7d0037
2019-08-01 07:54:53 -07:00
388dc4f2a6 Let user be able to change MKLDNN "-m" flags back and forth in subsequent builds (#23608)
Summary:
Currently, once a user has set `USE_NATIVE_ARCH` to OFF, they can never turn it back on for MKLDNN simply by changing `USE_NATIVE_ARCH`. This commit fixes that.

Following up 09ba4df031ed51e05724bb490d4d6fc52b3b1ac6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23608

Differential Revision: D16599600

Pulled By: ezyang

fbshipit-source-id: 88bbec1b1504b5deba63e56f78632937d003a1f6
2019-08-01 06:05:36 -07:00
02f794b102 Add overload names to native_functions.yaml (#23532)
Summary:
We need this to be able to register them with the c10 dispatcher.

The overload names are based on one-letter-per-argument-type.

Script used to change native_functions.yaml and derivatives.yaml: P75630718

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23532
ghstack-source-id: 87539687

Differential Revision: D16553437

fbshipit-source-id: a1d0f10c42d284eba07e2a40641f71baa4f82ecf
2019-08-01 02:08:37 -07:00
ec13f18390 Allow empty Variables to be saved for backwards (#23618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23618

For example: `save_for_backward({Variable(), x, Variable()})` should be allowed, so that this is consistent with the Python API behaviour.
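
The Python behaviour this matches looks roughly like the following sketch (`Scale` is a made-up function):

```python
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(None, x, None)  # None entries are accepted
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        _, x, _ = ctx.saved_tensors           # the Nones come back as None
        return grad_out * 2

y = Scale.apply(torch.ones(3, requires_grad=True))
y.sum().backward()
```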

Test Plan: Added a test similar to the python test `test_save_none_for_backward` from test_autograd.py.

Differential Revision: D16589402

fbshipit-source-id: 847544ad8fc10772954d8629ad5a62bfdc1a66c1
2019-07-31 19:51:35 -07:00
0a12ff7c5b Use dst dir for temp file (#23629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23629

Differential Revision: D16594223

Pulled By: soumith

fbshipit-source-id: db0275415111f08fc13ab6be00b76737a20f92df
2019-07-31 19:04:03 -07:00
0ce950de05 prefix module qualified names with __module__ (#23630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23630

This is temporary, won't be needed with the new serialization format.
But for now, since the main module gets its name from the archive name,
we need this for safety; otherwise something like
`torch.jit.save("torch.pt")` will break things.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D16592404

Pulled By: suo

fbshipit-source-id: b538dc3438a80ea7bca14d84591ecd63f4b1289f
2019-07-31 18:30:13 -07:00
230f7f9bbc Include protobuf-defined outputs in the graph cutting algorithm (#23557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23557

As the title states, this enables any tensors defined by the user to be outputs, including activations.

Reviewed By: yinghai

Differential Revision: D16362993

fbshipit-source-id: b7dc8412c88c46fcf67a3b3953dc4e4c2db8c4aa
2019-07-31 16:15:59 -07:00
9467e80097 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23615

Differential Revision: D16590899

Pulled By: zou3519

fbshipit-source-id: 4f07eda93fd618605c3bb6dfe4c11b2d1d2dec0d
2019-07-31 16:06:14 -07:00
88b96ba951 Update relative links in OVERVIEW.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23627

Test Plan: Imported from OSS

Differential Revision: D16590415

Pulled By: suo

fbshipit-source-id: 9f4fabd77b80f08f96f4bc969b43aa8ff3d4ac96
2019-07-31 15:45:04 -07:00
3b6aa9ade6 Add logging to Alias Analysis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23383

Differential Revision: D16573661

Pulled By: Krovatkin

fbshipit-source-id: c199656805b474b3c1b3ba09b4e236aec84617f4
2019-07-31 13:31:36 -07:00
2e40857dad Fix CTC loss for zero-length targets on GPU (#23298)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18215 at last!

Also sprinkle tests...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
2019-07-31 12:03:45 -07:00
08f7f27c6a Fix named tensor build by enabling tensor.is_pinned and removing support for clone() (#23597)
Summary:
`is_pinned` was moved to native_functions.yaml, disabling it for named
tensors. This PR re-enables its usage for named tensors.

I wrote a named inference rule for torch.clone(), but something happened
to it. Disable it for now so we can get the namedtensor ci to be green.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23597

Test Plan: - run tests [namedtensor ci]

Differential Revision: D16581771

Pulled By: zou3519

fbshipit-source-id: 498018cdc55e269bec80634b8c0a63ba5c72914b
2019-07-31 11:48:40 -07:00
3fa2df7c9a Support custom autograd functions in C++ (#23572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23572

### **(The stack from #23020  was moved into this PR)**

Adding API for custom autograd operations, with user defined forward and backward, [like in python](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd).

The custom operation should be a subclass of Function, with static forward and backward functions. `forward()` can accept any arguments similar to the Python API and `backward()` should accept a variable list as an argument.

Both `forward()` and `backward()` accept an AutogradContext*, which can be used to share data between them.
Variables can be saved in the context using `save_for_backward()`, and other data can be saved in the `saved_data` map in the form of `<std::string, at::IValue>` pairs. Variables saved in forward can be accessed with `get_saved_variables()`.

Example usage:
```
class MyFunction : public Function<MyFunction> {
  public:
  static variable_list forward(AutogradContext *ctx, int n, Variable var) {
     // Save data for backward in context
     ctx->saved_data["n"] = n;
     return {var};
  }

  static variable_list backward(AutogradContext *ctx, variable_list grad_output) {
     // Use data saved in forward
     auto n = ctx->saved_data["n"].toInt();
     return {grad_output[0]*n};
  }
};

```
Then, it can be used with:
```
Variable x;
MyFunction::apply(6, x);
```

AutogradContext also has methods to mark outputs as non-differentiable and to mark inputs as dirty, similar to the [Python API](ff23a02ac4/torch/autograd/function.py (L26)).

Test Plan: Added tests for the custom autograd function API based on test_autograd.py. Currently only the tests for the basic functionality have been added. More tests will be added later.

Differential Revision: D16583428

fbshipit-source-id: 0bd42f19ce37bcd99d3080d16195ad74d40d0413
2019-07-31 11:30:48 -07:00
5e4c24baef Documentation cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23148

Test Plan: Imported from OSS

Differential Revision: D16414202

Pulled By: zafartahirov

fbshipit-source-id: a999be0384a2ff5272dd2f8adcf87547ce6ee9dd
2019-07-31 11:30:44 -07:00
87a75bd605 remove ONNX & Turn on NO_API for mobile build (#23546)
Summary:
### Summary
The iOS build was broken after this PR 👉 [23195](https://github.com/pytorch/pytorch/pull/23195/files) was merged, as two files still have a dependency on ONNX:
- `test.cpp` in `test/cpp/jit`
- `export.cpp` in `torch/csrc/jit`

This PR is to remove ONNX completely from mobile build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23546

Test Plan:
- The `build_ios.sh` finished successfully.
- The `libtorch.a` can be compiled and run on iOS devices.

Differential Revision: D16558236

Pulled By: xta0

fbshipit-source-id: b7ff1db750698cfd5a72d5cb0b9f2f378e315077
2019-07-31 10:42:56 -07:00
9130ab380a fix gemm call for CUDABlas for THCUNN conv, #23545 (#23552)
Summary:
* Swapped `CUBLAS_OP_N` for `'n'`
* added a test

This PR should fix https://github.com/pytorch/pytorch/issues/23545.

Thanks to AlphabetMan for reporting the initial issue in [the forum](https://discuss.pytorch.org/t/cuda-10-1-error-using-transposeconv2d-with-output-padding-1/51414?u=ptrblck), and to ngimel for the guidance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23552

Differential Revision: D16580986

Pulled By: ezyang

fbshipit-source-id: abc0bce1e84d9c9d96d44ae0296951725adc8424
2019-07-31 10:01:36 -07:00
5d130e4232 Allowing batching for det/logdet/slogdet operations (#22909)
Summary:
Changelog:
- Add batching for det / logdet / slogdet operations
- Update derivative computation to support batched inputs (and consequently batched outputs)
- Update docs (a short usage sketch follows)
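
A short usage sketch of the batched behaviour (shapes are arbitrary):

```python
import torch

A = torch.randn(3, 4, 4, requires_grad=True)  # a batch of three 4x4 matrices
d = torch.det(A)                               # shape (3,): one det per matrix
sign, logabsdet = torch.slogdet(A)             # each of shape (3,)
d.sum().backward()                             # batched derivatives work too
```
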
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909

Test Plan:
- Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py`
- Add autograd tests for batched inputs

Differential Revision: D16580988

Pulled By: ezyang

fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a
2019-07-31 10:01:32 -07:00
5b66062f99 Use prerendered KaTeX in docs. (#23376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23376

This uses master version of sphinxcontrib-katex as it only
recently got prerender support.

Fixes #20984

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16582064

Pulled By: ezyang

fbshipit-source-id: 9ef24c5788c19572515ded2db2e8ebfb7a5ed44d
2019-07-31 10:01:28 -07:00
456e66d531 format jit_type.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23564

Test Plan: Imported from OSS

Differential Revision: D16567850

Pulled By: zdevito

fbshipit-source-id: 6e2056b480da3f1ea0dbb6e7240677f7e7a9937e
2019-07-31 09:38:57 -07:00
02d5c62f34 Fix regression in torch.qr (#23591)
Summary:
Changelog:
- Use narrow instead of narrow_copy while returning
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23591

Test Plan:
- All tests should pass to ensure that the change is correct

Fixes https://github.com/pytorch/pytorch/issues/23580

Differential Revision: D16581174

Pulled By: ezyang

fbshipit-source-id: 1b6bf7d338ddd138ea4c6aa6901834dd202ec79c
2019-07-31 09:38:53 -07:00
50ce9e09da Fix typos in .circleci/README.md (#23588)
Summary:
Fix typos in .circleci/README.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23588

Differential Revision: D16581934

Pulled By: ezyang

fbshipit-source-id: 39bf07e8d9d80493e15ecba7e846097ef44a6851
2019-07-31 09:32:38 -07:00
3e0da2ab8e Rename AT_FORALL_SCALAR_TYPES_WITH_COMPLEX to AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_STUBS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23336

Test Plan: Imported from OSS

Differential Revision: D16467982

Pulled By: gchanan

fbshipit-source-id: 004bfc179c7bf963e1132c59af692080156808ab
2019-07-31 08:17:17 -07:00
e324f9a093 Remove AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF, which isn't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22932

Test Plan: Imported from OSS

Differential Revision: D16467978

Pulled By: gchanan

fbshipit-source-id: 1dafde39a63c4109a8bc5fb31b9ffe5071d6dc53
2019-07-31 08:17:13 -07:00
c7248dad63 Update MKL to 2019.4 for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23583

Differential Revision: D16581167

Pulled By: ezyang

fbshipit-source-id: a5b4f65c08d53c9a477093a5558502a4b7b888a4
2019-07-31 07:47:10 -07:00
c5482e33e9 Rename tensor.is_named to has_named, expose has_named to python.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23315

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23315 gh/zou3519/79/head

Imported from OSS

Differential Revision: D16494414

Pulled By: zou3519

fbshipit-source-id: d2d6beb45db9288e5df707b68b6046d783ca9f97
2019-07-31 07:14:07 -07:00
725e41e955 Enable named tensors for arithmetic, clone, and tensor conversion ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23237

Test Plan: Imported from OSS

Differential Revision: D16494416

Pulled By: zou3519

fbshipit-source-id: 29bc390797c99088d50a2b59c3e2402a93562e2c
2019-07-31 07:14:04 -07:00
b417c2d5a7 Refactor the pytorch_doc_push_script to take a branch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23556

Test Plan:
- Run ci

Imported from OSS

Differential Revision: D16563747

Pulled By: zou3519

fbshipit-source-id: 104371b3712c00b073a82e5145090e7bd6fd2d53
2019-07-31 07:02:18 -07:00
2aaeccda55 add a test for inline tracing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23543

Test Plan: Imported from OSS

Differential Revision: D16570826

Pulled By: suo

fbshipit-source-id: 854609b298b31bc0250a1c536daa9ff572fb71d6
2019-07-31 00:06:48 -07:00
9c549dfdc1 make_module: First version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23288

Test Plan: Imported from OSS

Differential Revision: D16455390

Pulled By: zafartahirov

fbshipit-source-id: 4352f0a17cd0382b48502b93e51574cc3acdfdcc
2019-07-30 22:14:44 -07:00
af638ad5d7 pin_memory should not copy on already pinned tensors (#23484)
Summary:
fixes https://github.com/pytorch/pytorch/issues/21076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23484

Differential Revision: D16546264

Pulled By: ezyang

fbshipit-source-id: 8058e0bbc6336751f36b884d71234feef498a982
2019-07-30 21:16:23 -07:00
3fe00f0c90 Fix set_grad for extension backends
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23516

Test Plan: Imported from OSS

Differential Revision: D16546732

Pulled By: li-roy

fbshipit-source-id: bbf9498de98fd807c64862d628da35d0097f2ee0
2019-07-30 20:28:37 -07:00
775d7bd6a1 at::view (#23452)
Summary:
`at::view` accidentally calls clone, but what we want is to create an empty tensor and set its storage.
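
For context, a minimal sketch of the aliasing contract that `view` must preserve:

```python
import torch

x = torch.arange(6)
y = x.view(2, 3)            # must alias x's storage -- no copy
y[0, 0] = 100
assert x[0].item() == 100   # a hidden clone would break this
```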

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23452
ghstack-source-id: 87438096

Differential Revision: D16442756

fbshipit-source-id: 6d5663f82c9bd4e9de8fc846c52992477843af6a
2019-07-30 18:08:04 -07:00
756bdcbca4 Include recursive class compilations in error call stack (#23454)
Summary:
Previously these were left out which would lead to confusing messages,
now it looks something like:

```
torch.jit.frontend.UnsupportedNodeError: import statements aren't
supported
:
at ../test.py:13:9
    def bad_fn(self):
        import pdb
        ~~~~~~ <--- HERE
'__torch__.X' is being compiled since it was called from 'fn'
at ../test.py:16:12
def fn(x):
    return X(10)
           ~~~~ <--- HERE
```

Fixes #23453

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23454

Pulled By: driazati

Differential Revision: D16567930

fbshipit-source-id: 251b6f91f37a2816e06bb4c803f9bc172fa1d91b
2019-07-30 17:29:54 -07:00
941be58b5a remove the confused CPU op (#23575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23575

as title

Reviewed By: skamalas

Differential Revision: D16571878

fbshipit-source-id: f175d1d70f0e96e04da949100985db0e1f936fb9
2019-07-30 17:29:50 -07:00
bc64324da9 Change condition in swap module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23561

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D16570928

Pulled By: jerryzh168

fbshipit-source-id: 70f36f577ac657d015f3d7738819867742088e5a
2019-07-30 17:25:02 -07:00
ab584c738b Move overview to docs/ folder
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23457

Test Plan: Imported from OSS

Differential Revision: D16552603

Pulled By: suo

fbshipit-source-id: 91547f870c563ca78382b8fdd7a42b472ed07ea4
2019-07-30 17:20:01 -07:00
1c86b8a783 add docs for serialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23456

Test Plan: Imported from OSS

Differential Revision: D16552602

Pulled By: suo

fbshipit-source-id: 41e333af97e43fcef2b7f6e02c36a805ceb64573
2019-07-30 17:19:57 -07:00
0a04513367 Remove old Type based backend extensions (#22009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22009
ghimport-source-id: e481b64707434a1abdc382fd80bd70f165540711

Test Plan: Imported from OSS

Differential Revision: D15914755

Pulled By: li-roy

fbshipit-source-id: 9230b8b234f71a5d865bf6bca93347c68c349ff7
2019-07-30 14:07:46 -07:00
3cc7da3a7d Revert D16561561: [pytorch][PR] Remove preprocessing of CFLAGS, CPPFLAGS, and LDFLAGS in Python scripts.
Differential Revision:
D16561561

Original commit changeset: 962a27a2b0a1

fbshipit-source-id: 82ed08e5599ddbb9ed96352ac4572aa73df65aac
2019-07-30 13:28:19 -07:00
649fa8e5c8 add log stmts to peephole.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23279

Differential Revision: D16519245

Pulled By: Krovatkin

fbshipit-source-id: 50c49d890c0acac8259b3c367d183a1aa7cf6859
2019-07-30 13:16:16 -07:00
9dea86f86b Make ProfiledTensorType hashable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23116

Differential Revision: D16519748

Pulled By: Krovatkin

fbshipit-source-id: 25090678d82d5dc9ca0a48aef45eeb62b8ac8d45
2019-07-30 13:11:06 -07:00
2d238d090c Add Cast Op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23548

Reviewed By: yinghai

Differential Revision: D16355170

fbshipit-source-id: 72de08b16251f55165977736e686075bca08c24e
2019-07-30 12:51:03 -07:00
776b6b6bcd Cleanup interface of inlineCallTo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23539

Test Plan: Imported from OSS

Differential Revision: D16555365

Pulled By: ZolotukhinM

fbshipit-source-id: 6cfcde7a7600315e73e083284c80f876509489a5
2019-07-30 11:26:31 -07:00
be3d27589f Added torch.autograd.profiler.record_function() as context manager. (#23428)
Summary:
Added torch.autograd.profiler.record_function() as a context manager to annotate a block of Python code during profiling.
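
A minimal usage sketch (the label "my_block" is arbitrary):

```python
import torch
from torch.autograd import profiler

with profiler.profile() as prof:
    with profiler.record_function("my_block"):   # annotates this region
        y = torch.mm(torch.randn(64, 64), torch.randn(64, 64))
print(prof.key_averages().table(sort_by="cpu_time_total"))
```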

Fixes https://github.com/pytorch/pytorch/issues/19422 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23428

Differential Revision: D16560771

Pulled By: soumith

fbshipit-source-id: 3923130f7647a36a84dbbe28cc59d216d395d3f9
2019-07-30 11:10:01 -07:00
7364aa796d skip nn.Identity in add_observer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23500

Test Plan:
e2e test in quantizing resnext 101

Imported from OSS

Differential Revision: D16550190

Pulled By: jerryzh168

fbshipit-source-id: 6128d7c3419235152b43739fcc5cade34342ba3d
2019-07-30 11:00:36 -07:00
5b4ac841c9 Quantized Average Pool kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23143

Test Plan: Imported from OSS

Differential Revision: D16406281

Pulled By: zafartahirov

fbshipit-source-id: dcd8b58a0ef32b3dcc3337c282c59b4e52091516
2019-07-30 10:51:25 -07:00
401fbb0088 Port resize_as_ and clone from TH to Aten (#23027)
Summary:
API operators are now routed to `at::native::resize_as_*_` and `at::native::clone` accordingly.
The internal `THTensor_(resizeAs)`, `THCTensor_(resizeAs)`, `THTensor_(newClone)` and `THCTensor_(newClone)` remain to support older TH code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23027

Differential Revision: D16362304

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c1e8516da685f3fdea632ff791d143f27aeebeb
2019-07-30 10:40:27 -07:00
e7abff0778 Delete re_worker_requirements 2019-07-30 13:02:20 -04:00
b3a9a7a9b9 Rename gels to lstsq (#23460)
Summary:
Changelog:
- Rename `gels` to `lstsq`
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to discourage its use (a usage sketch follows)
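
A minimal sketch of the rename (shapes are arbitrary):

```python
import torch

A = torch.randn(5, 3)
b = torch.randn(5, 2)

x, _ = torch.lstsq(b, A)      # new name
x_old, _ = torch.gels(b, A)   # deprecated alias; emits a warning
```
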
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460

Test Plan: - All tests should pass to confirm that the patch is correct

Differential Revision: D16547834

Pulled By: colesbury

fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496
2019-07-30 09:56:04 -07:00
cfe9400996 Remove preprocessing of CFLAGS, CPPFLAGS, and LDFLAGS in Python scripts. (#23528)
Summary:
After https://github.com/pytorch/pytorch/issues/23455, there is no need for this preprocessing in Python scripts.
These variables are automatically processed in CMake (also, CPPFLAGS here
probably should have been CXXFLAGS).

Reference:

- https://cmake.org/cmake/help/v3.15/envvar/CFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/CXXFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/LDFLAGS.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23528

Differential Revision: D16561561

Pulled By: ezyang

fbshipit-source-id: 962a27a2b0a18db0f95477ad067a2611e4128187
2019-07-30 08:07:36 -07:00
fd61cc9ebc Moved at::assert_no_internal_overlap to TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22917

Differential Revision: D16521429

Pulled By: pbelevich

fbshipit-source-id: 80ae583c6486d6948431b79e1452902bdf2cfbc3
2019-07-30 07:47:33 -07:00
4b78ce1ba4 Clean cmake infrastructure up (#23527)
Summary:
Only check for cmake dependencies we directly depend on (e.g., hipsparse but not rocsparse)

Use cmake targets for ROCm where possible.

While there, update the docker CI build infrastructure to only pull in packages by name that we directly depend on (anticipating the demise of, e.g., miopengemm). I do not anticipate a docker rebuild being necessary at this stage, as the changes are somewhat cosmetic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23527

Differential Revision: D16561010

Pulled By: ezyang

fbshipit-source-id: 87cd9d8a15a74caf9baca85a3e840e9d19ad5d9f
2019-07-30 07:26:48 -07:00
437a8b3eed Named inference rule for copy_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23229

Test Plan: Imported from OSS

Differential Revision: D16494413

Pulled By: zou3519

fbshipit-source-id: 4acb85e5a4ad09bf5f7cbb84cc8d4ceac0cd9967
2019-07-30 07:17:34 -07:00
16da355b30 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
LARGE: 66
MEDIUM: 649
XLARGE: 1

Updated actions:
From LARGE to MEDIUM: 18
From LARGE to XLARGE: 2
From MEDIUM to LARGE: 20
From XLARGE to LARGE: 1

Differential Revision: D16559356

fbshipit-source-id: a51ef034265649314661ab0e283089a069a20437
2019-07-30 02:53:11 -07:00
4e59055c4d optimize matmul memory usage for certain cases (#23433)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23433

Differential Revision: D16524135

Pulled By: ailzhang

fbshipit-source-id: e7684fec60c9b9db9a09f8ac157b13c8dde1bdd2
2019-07-29 22:35:45 -07:00
7b081e5d1e Improve error message for changing tensor metadata after .data or .detach() (#23504)
Summary:
When a user tries to change the metadata of a tensor created from `.data` or `.detach()`, we currently show an error message "<function_name> is not allowed on Tensor created from .data or .detach()". However, this error message doesn't suggest what the right fix should look like. This PR improves the error message.

Closes https://github.com/pytorch/pytorch/issues/23393.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23504

Differential Revision: D16547415

Pulled By: yf225

fbshipit-source-id: 37f4a0385442e2b0966386fb14d3d938ecf4230c
2019-07-29 22:25:14 -07:00
db1e9b1d6c Fix a few clang warnings.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23524

Differential Revision: D16549562

fbshipit-source-id: 58351fc2858d495b135023626116f6f565c8e9b1
2019-07-29 22:08:50 -07:00
30bc19d751 dictKeys and dictItems ops on typed dicts return typed lists (#23270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23270
ghstack-source-id: 87389530

Differential Revision: D16448942

fbshipit-source-id: e6b578f0e97776112259d7ea38e143e4716ec273
2019-07-29 20:00:34 -07:00
c8817f9436 fix default value for script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23542

Test Plan: Imported from OSS

Differential Revision: D16557122

Pulled By: suo

fbshipit-source-id: c86578aa2c55f44ed5d573d33874a82244df3d09
2019-07-29 19:51:26 -07:00
6314af6e57 Revert D16526027: [jit] Include recursive class compilations in error call stack
Differential Revision:
D16526027

Original commit changeset: 109f2968430d

fbshipit-source-id: c27252540ec6b7da60739eb7dcc8b1650672c226
2019-07-29 19:02:39 -07:00
6574f6167c Revert D16554694: [jit] add a test for inline tracing
Differential Revision:
D16554694

Original commit changeset: 0fae4458f18c

fbshipit-source-id: 08aa0c292fa5b2dbdd0d1f0e59f531416edef760
2019-07-29 18:57:06 -07:00
265b498de2 add a test for inline tracing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23535

Test Plan: Imported from OSS

Differential Revision: D16554694

Pulled By: suo

fbshipit-source-id: 0fae4458f18c06ffbd484905ad7836dce9ce69cc
2019-07-29 18:05:15 -07:00
52b95fd4be Include recursive class compilations in error call stack (#23454)
Summary:
Previously these were left out which would lead to confusing messages,
now it looks something like:

```
torch.jit.frontend.UnsupportedNodeError: import statements aren't
supported
:
at ../test.py:13:9
    def bad_fn(self):
        import pdb
        ~~~~~~ <--- HERE
'__torch__.X' is being compiled since it was called from 'fn'
at ../test.py:16:12
def fn(x):
    return X(10)
           ~~~~ <--- HERE
```

Fixes #23453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23454

Pulled By: driazati

Differential Revision: D16526027

fbshipit-source-id: 109f2968430dbf51ee91b1b3409badfd557d19a4
2019-07-29 18:00:05 -07:00
696642ae8d Change docs to use recursive script API (#21612)
Summary:
Use the recursive script API in the existing docs

TODO:
* Migration guide for 1.1 -> 1.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21612

Pulled By: driazati

Differential Revision: D16553734

fbshipit-source-id: fb6be81a950224390bd5d19b9b3de2d97b3dc515
2019-07-29 17:51:22 -07:00
bfee46f8e2 Update argument list for non-fbgemm path for qconv_prepack (#23521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23521

non-fbgemm path should have the same arguments as fbgemm path.

Reviewed By: jianyuh

Differential Revision: D16547637

fbshipit-source-id: bb00d725fb968cbee32defb8facd2799a7e79bb4
2019-07-29 17:41:11 -07:00
65a89472c4 Put all modules in the global Python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23154

Test Plan: Imported from OSS

Differential Revision: D16441913

Pulled By: suo

fbshipit-source-id: a79f2c3e06a33cbd79b2e3333f16c069f356f451
2019-07-29 16:38:20 -07:00
e366af7d87 Add TORCH_CHECK to disable sub for bool tensors (#23519)
Summary:
This resolves two issues in one shot:

- sub shouldn't be available for the bool type.
- When sub is applied to an unsupported type, the current error message
  shows "add_cpu/add_cuda is not implemented for [type]". It should say
  "sub_cpu/sub_cuda" instead (a repro sketch follows the list).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23519

Differential Revision: D16548770

Pulled By: izdeby

fbshipit-source-id: fe404a2a97b8d11bd180ec41364bf8e68414fb15
2019-07-29 16:28:35 -07:00
3c986dff77 introduce auto_set to simplify benchmarking the backward path of operators (#23276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276

This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example:

```
...
self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set())
self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set())
...
```

In this way, the benchmark will generate three different test cases.
1. input_one requires grad
2. input_two requires grad
3. both inputs require grad

Here is a sample output:
```
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwdall
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 863.744

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd1
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 727.915

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd2
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 687.626
```

Reviewed By: zheng-xq

Differential Revision: D16450355

fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305
2019-07-29 15:58:41 -07:00
41dfe7204b Threading and CPU Inference note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23417

Test Plan:
cd docs; make html

Imported from OSS

Differential Revision: D16523781

Pulled By: ilia-cher

fbshipit-source-id: d6c09e8a85d39e6185bbdc4b312fea44fcdfff06
2019-07-29 15:45:49 -07:00
f4eb93f7bc Support pack_padded_sequence and pad_packed_sequence
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23249

Test Plan: Imported from OSS

Differential Revision: D16466587

Pulled By: wanchaol

fbshipit-source-id: a721da01b2da0ef90cac80b77f1285102e3b1118
2019-07-29 15:36:47 -07:00
c384fbf4c8 support torch._C._get_tracing_state in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23248

Test Plan: Imported from OSS

Differential Revision: D16466588

Pulled By: wanchaol

fbshipit-source-id: 3c3d5dec2cea2f9cb080eadaef457cc62ac3fbe0
2019-07-29 15:05:50 -07:00
e1f8985973 Specify onnxruntime version to install for CI tests (#23517)
Summary:
No real change on the CI since currently the default latest is 0.4.0. houseroad bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23517

Differential Revision: D16550375

Pulled By: bddppq

fbshipit-source-id: a669b8af678c79c4d6909300b28458fe6b7cd30c
2019-07-29 14:58:15 -07:00
c779eff579 support torch.as_tensor in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23247

Test Plan: Imported from OSS

Differential Revision: D16466590

Pulled By: wanchaol

fbshipit-source-id: cf52721eacd177d9040564790382db13a9fcc2fe
2019-07-29 14:38:22 -07:00
3a568c9a2b CI: install clang-tidy (#23518)
Summary:
Install clang-tidy (from LLVM 8) for the `clang_tidy` job.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23518

Differential Revision: D16549621

Pulled By: ezyang

fbshipit-source-id: b1d20641380cdfdb0589249770b98163528fa69f
2019-07-29 14:28:26 -07:00
a8edc2b5d2 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22926

Differential Revision: D16546369

Pulled By: colesbury

fbshipit-source-id: 56f7ef4476e586dee19366fdb720085d1c2f2027
2019-07-29 13:47:05 -07:00
9219a37c12 avoid Include the same header file twice (#23418)
Summary:
Avoid including the same header file twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23418

Differential Revision: D16546422

Pulled By: colesbury

fbshipit-source-id: 5cd868cce73d9199ced9b6f2f6f57bf42e5a5d5b
2019-07-29 13:34:11 -07:00
dec4eacae4 fix fbcode weak ordering (#23511)
Summary:
There is an internal fbcode assert that fails if I do not add these checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23511

Differential Revision: D16545606

Pulled By: eellison

fbshipit-source-id: cd3a799850bae8f052f9d81c1e4a2678fda19317
2019-07-29 13:14:39 -07:00
0c9979dd7d Fix TestCuda.test_events_wait (#23520)
Summary:
The PyTorch test suite sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called,
which in turn calls CudaMemoryLeakCheck, which in turn calls
initialize_cuda_context_rng, which executes torch.randn
on each device, launching a kernel on each device.

Since the kernel may not finish on device 1, the assertion
self.assertTrue(s1.query()) fails.

The fix is to insert

        torch.cuda.synchronize(d0)
        torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish before the real
test begins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23520

Differential Revision: D16547701

Pulled By: soumith

fbshipit-source-id: 42ad369f909d534e15555493d08e9bb99dd64b6a
2019-07-29 13:09:41 -07:00
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
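
A minimal usage sketch (dataset and sizes are arbitrary; when workers are spawned, run this under an `if __name__ == "__main__"` guard):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(8).float())
loader = DataLoader(ds, batch_size=2, num_workers=2,
                    multiprocessing_context="spawn")   # the new argument
for (batch,) in loader:
    print(batch)
```
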
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
2019-07-29 12:58:40 -07:00
56664c2c65 Untap caskroom/homebrew-cask in attempt to unbreak OS X builds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23514

Test Plan: Imported from OSS

Differential Revision: D16546841

Pulled By: ezyang

fbshipit-source-id: 96d2e988cb0dddfeec0174875761dfa26f25a8c1
2019-07-29 12:45:01 -07:00
31f1928096 add sorting policy to ChunkDataset (#23053)
Summary:
Add a sorting policy to ChunkDataset.

This is considered an advanced parameter for developers who want to apply a 'sorting policy' to the chunk data before it is sampled into minibatches.

Unlike the collate method, this policy is applied at the chunk level instead of the minibatch level. When a chunk of data is loaded (multiple chunks if cross_chunk_shuffle_count_ is greater than 1), the policy applies to the full loaded data. It is useful when developers want to perform some pre-processing (like bucketing) on the chunk data before the example sampler samples it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23053

Differential Revision: D16537692

Pulled By: colesbury

fbshipit-source-id: cd21ed40ab787a18b8c6dd304e5b806a7a45e6ba
2019-07-29 12:34:02 -07:00
a356276d79 add note to Contribution Guide around recently released research (#23513)
Summary:
Thanks to adefazio for the feedback; adding a note to the Contribution Guide so that folks don't start working on code without checking with the maintainers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23513

Differential Revision: D16546685

Pulled By: soumith

fbshipit-source-id: 1ee8ade963703c88374aedecb8c9e5ed39d7722d
2019-07-29 12:24:59 -07:00
06762b4721 Fix distributions.Categorical.sample bug from .view() (#23328)
Summary:
This modernizes distributions code by replacing a few uses of `.contiguous().view()` with `.reshape()`, fixing a sample bug in the `Categorical` distribution.

The bug is exercised by the following test:
```py
batch_shape = (1, 2, 1, 3, 1)
sample_shape = (4,)
cardinality = 2
logits = torch.randn(batch_shape + (cardinality,))
dist.Categorical(logits=logits).sample(sample_shape)
# RuntimeError: invalid argument 2: view size is not compatible with
#   input tensor's size and stride (at least one dimension spans across
#   two contiguous subspaces). Call .contiguous() before .view().
#   at ../aten/src/TH/generic/THTensor.cpp:203
```
I have verified this works locally, but I have not added this as a regression test because it is unlikely to regress (the code is now simpler).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23328

Differential Revision: D16510678

Pulled By: colesbury

fbshipit-source-id: c125c1a37d21d185132e8e8b65241c86ad8ad04b
2019-07-29 12:09:50 -07:00
be644d822b fixes #20178 (#23297)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20178
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23297

Differential Revision: D16497552

Pulled By: VitalyFedyunin

fbshipit-source-id: 386933b15c27d02351f042be71b153bc9439004d
2019-07-29 12:04:44 -07:00
236149edc5 Make randperm works properly on non-contiguous tensors. (#23043)
Summary:
Close https://github.com/pytorch/pytorch/issues/22710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23043

Differential Revision: D16446340

Pulled By: VitalyFedyunin

fbshipit-source-id: 1760af310fee71b369e1aaaf96546277058611c9
2019-07-29 11:59:04 -07:00
d6ff78fd00 fix an over-indented return in trace_module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23358

Differential Revision: D16519010

Pulled By: Krovatkin

fbshipit-source-id: a7e4225b70e915d91c74874e3eca9bcb87baf84c
2019-07-29 11:15:55 -07:00
505fa83b2f Implement named inference rule for mul
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23193

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23193 gh/zou3519/75/head

Imported from OSS

Differential Revision: D16494401

Pulled By: zou3519

fbshipit-source-id: 0e2395d7de39158ec51feed5da0389715ec52600
2019-07-29 09:58:18 -07:00
d3fcb4ccd3 Try another version of apt/dpkg killing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23499

Test Plan: Imported from OSS

Differential Revision: D16542875

Pulled By: ezyang

fbshipit-source-id: 05aa97f2d61e4fc00a819768448944f85701cab8
2019-07-29 09:13:24 -07:00
8ada7c9920 Remove two CMAKE_ build options from additional_options. (#23451)
Summary:
Following up 915261c8bef85e3b845a0384d3fb55e55707b609
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23451

Differential Revision: D16542303

Pulled By: ezyang

fbshipit-source-id: 1406c311c198eb237f85d6d8f1f0d58626be8257
2019-07-29 08:13:59 -07:00
09ba4df031 Whether MKLDNN should be built under native arch should respect USE_NATIVE_ARCH (#23445)
Summary:
Currently there is no way to build MKLDNN with optimizations beyond SSE4. This commit makes the MKLDNN build respect USE_NATIVE_ARCH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23445

Differential Revision: D16542275

Pulled By: ezyang

fbshipit-source-id: 550976531d6a52db9128c0e3d4589a33715feee2
2019-07-29 08:13:56 -07:00
b335f3910f Remove redundant MSVC_Z7_OVERRIDE processing and combine "/EHa" flag setup (#23455)
Summary:
- MSVC_Z7_OVERRIDE is already handled in CMakeLists.txt. There is no need to process it once more in the Python scripts.
- Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used.
- Move the setting of the "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares for the removal of the redundant cflags setup in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455

Differential Revision: D16542274

Pulled By: ezyang

fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac
2019-07-29 08:08:47 -07:00
81e46d4f78 Fix build issue. CUDA may be installed in $CUDA_HOME/lib on macOS. (#23491)
Summary:
Closes gh-16955.
Closes https://github.com/pytorch/vision/issues/977

On Linux both `lib64` and `lib` may be present (symlinked). The reports
seem to all be about macOS, but it seems like this is also possibly more
robust on Linux and can't hurt. So not treating platforms differently.

Note that Eigen has a similar check in its CMake:

```
if(CUDA_64_BIT_DEVICE_CODE AND (EXISTS "${CUDA_TOOLKIT_ROOT_DIR}/lib64"))
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib64")
else()
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib")
endif()
```

There may be other issues for building from source on macOS, can't test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23491

Differential Revision: D16538973

Pulled By: soumith

fbshipit-source-id: cc309347b7d16e718e06878d3824d0a6e40b1019
2019-07-29 08:08:43 -07:00
97f129bf0a Let set_rng_state and get_rng_state accept string parameter (#23448)
Summary:
Currently, set_rng_state and get_rng_state do not accept a string as their device parameter. This commit lets them accept strings.
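
A minimal sketch, assuming a CUDA device is present:

```python
import torch

if torch.cuda.is_available():
    state = torch.cuda.get_rng_state("cuda:0")   # a plain string now works
    torch.cuda.set_rng_state(state, "cuda:0")
```
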
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23448

Differential Revision: D16527172

Pulled By: soumith

fbshipit-source-id: 8f9a2129979706e16877cc110f104770fbbe952c
2019-07-29 08:08:39 -07:00
7a82066282 Update PyTorch Docker image to 323. (#23389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23389

Test Plan: Imported from OSS

Differential Revision: D16541971

Pulled By: ezyang

fbshipit-source-id: 2b7e483f4d6eedef7f5c140ffc0fac21fecd179b
2019-07-29 07:29:54 -07:00
f546a3b8d8 fixing documentation, issue 22697 (#23268)
Summary:
As fmassa commented :

> Agree, it should probably be weight, start, end
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23268

Differential Revision: D16493403

Pulled By: zou3519

fbshipit-source-id: 51ed07f6f7abdbd41dc323570aed41d804fa9c1b
2019-07-29 07:24:49 -07:00
19858f7cc6 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 981
LARGE: 56

Updated actions:
From MEDIUM to LARGE: 10
From LARGE to MEDIUM: 3
From LARGE to XLARGE: 1

Differential Revision: D16532427

fbshipit-source-id: c58bf59e6c571627b3994f8cdfa79758fb85892b
2019-07-29 04:37:23 -07:00
91d28026f8 Remove unused cuBLAS driver functions for getrs (#23375)
Summary:
Changelog:
- Remove getrs driver functions from THCBlas{.h/.cpp}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23375

Test Plan: - Build to pass to confirm no callsites were missed.

Differential Revision: D16539079

Pulled By: soumith

fbshipit-source-id: b5c285a2d36714ddf3393337eec7d85b1eaf3f51
2019-07-28 21:29:18 -07:00
54c280863c Add some compiler flags for building cpp extensions on Windows (#23472)
Summary:
(1) Add `COMMON_MSVC_FLAGS` to the flags in the ninja codepath
(2) Add `/EHsc` to `COMMON_MSVC_FLAG`
(3) Remove `-fPIC` and `-std=c++11` from the flags in the windows codepath
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23472

Differential Revision: D16532993

Pulled By: soumith

fbshipit-source-id: bc2d983f5f8b4eae9c7385bf170f155679e92e87
2019-07-28 20:33:18 -07:00
ef6356133e Revert D16428208: [pytorch][PR] only scatter in forward if multi-device per process
Differential Revision:
D16428208

Original commit changeset: eaa3876b2b95

fbshipit-source-id: 9db3bc86bf419dd06fdaaff434f72b92ecb5a427
2019-07-27 22:41:20 -07:00
64e4152064 Clarify that torch.device without an index will always represent the current device (#23468)
Summary:
Per discussion in https://github.com/pytorch/pytorch/issues/23448
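
A minimal sketch of the clarified semantics (assumes at least two GPUs):

```python
import torch

d = torch.device("cuda")     # no index: always means the *current* device
torch.cuda.set_device(1)
x = torch.zeros(2, device=d)
print(x.device)              # cuda:1, not cuda:0
```
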
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23468

Differential Revision: D16532950

Pulled By: soumith

fbshipit-source-id: 48c97060aaf55f1d7589afab42c6cd623d71a9a7
2019-07-27 06:49:52 -07:00
ffef0e03b7 Enabling GPU device runs for operators (#23461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23461

Enabling GPU device runs for production operator shapes.

Reviewed By: xw285cornell, mingzhe09088

Differential Revision: D16526928

fbshipit-source-id: 46657963f4b0bc43d14205ccf1b63d588657e388
2019-07-26 18:53:40 -07:00
23e526e6ff Fix SourceRange comparison
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23341

Test Plan: Imported from OSS

Differential Revision: D16505398

Pulled By: jamesr66a

fbshipit-source-id: 0bf6a1a054c7749c0a3334654d5746dd9f5dee96
2019-07-26 18:08:43 -07:00
3497891c14 add sorted keyword for lists and dicts (#23274)
Summary:
Add the `sorted` keyword to the JIT for lists and dicts. This desugars to a list copy and a call to `list.sort()`. Since we don't have interfaces yet, I implement it in terms of `list.sort()`; when we do, we can revisit implementing this op differently.
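
A minimal sketch of the new keyword in script:

```python
import torch
from typing import List

@torch.jit.script
def sort_copy(xs: List[int]) -> List[int]:
    return sorted(xs)        # copies the list, then calls list.sort()

print(sort_copy([3, 1, 2]))  # [1, 2, 3]
```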

The test fails because of a fix to specialized lists, which is landing here: https://github.com/pytorch/pytorch/pull/23267

Ignore the first commit because it is formatting only; please use clang-format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23274

Differential Revision: D16527323

Pulled By: eellison

fbshipit-source-id: aed8faef23cb790b9af036cd6c1b9b1d7066345d
2019-07-26 17:44:15 -07:00
f0ebf769de allow accepting empty input to the benchmark (#23462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23462

as title

Reviewed By: hl475

Differential Revision: D16527176

fbshipit-source-id: 7a8ff4f3c6122ae7b3205e0b446fec06fd95eedc
2019-07-26 17:30:42 -07:00
522cca5040 Support IntList in Dict's shalloEquals (#23205)
Summary:
Required when comparing IntList type of default values

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23205
ghstack-source-id: 87208341

Reviewed By: zrphercule

Differential Revision: D16433809

fbshipit-source-id: 3f60d67d708129be31198161423d819108468077
2019-07-26 17:30:38 -07:00
d6d7a5f075 only scatter in forward if multi-device per process (#22384)
Summary:
Scatter is unnecessary when only one device is used, and it breaks on some custom data structures like namedtuple, so we would like to avoid it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22384

Differential Revision: D16428208

Pulled By: soumith

fbshipit-source-id: eaa3876b2b95c1006ccaaacdb62f54c5280e730c
2019-07-26 17:30:34 -07:00
e1ae3a75c8 gate module::save logic on mobile (#23415)
Summary:
This is part of the effort to shrink OSS libtorch mobile build size.
We shouldn't need Module::save function on mobile - it depends on
csrc/jit/export.cpp which then depends on ONNX. By gating these two
methods we can avoid these dependencies for libtorch mobile.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23415
ghstack-source-id: 87288228

Reviewed By: dreiss

Differential Revision: D16511143

fbshipit-source-id: fd031f91fcf9b7be54cbe1436506965af94ab537
2019-07-26 17:23:38 -07:00
23f963e4a8 Update distributed.rst (#23289)
Summary:
Different backends are supported since https://github.com/pytorch/pytorch/pull/18595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23289

Differential Revision: D16528229

Pulled By: soumith

fbshipit-source-id: 57753e84c015817661ba30835278ee3a899aa2d0
2019-07-26 16:55:52 -07:00
ca76c82ce3 Add early returns to JIT (#19179)
Summary:
Add early returns to JIT with minimal changes to compiler.cpp and an IR->IR pass that will transform the graph so that there is only one return value.

In compiler.cpp, record when a block will exit so that the following example works:
```
if cond:
    a = torch.zeros([2])
else:
    return 2
a += 2
...
```
To match block outputs with values that will not be used, like in the above example with `a`, I add a Bottom Type that subtypes everything else. This allows shape propagation to continue to work, and makes it so that we don't need many extra nodes filling up the graph.

The IR transform currently doesn't work on Loops; I didn't add that to this PR to avoid too much complexity, but will add it in a stacked PR (it should be very little extra code). The IR transform is commented at the top of the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19179

Differential Revision: D16519819

Pulled By: eellison

fbshipit-source-id: 322a27f69966d1fd074ebe723c3e948b458b0e68
2019-07-26 16:42:43 -07:00
9223fa1c46 Add support to serialize qtensor in JIT. (#23356)
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428

Differential Revision: D16473237

fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
2019-07-26 15:52:15 -07:00
9dad13e1f0 Revert "Add fbgemm_qlinear_dynamic op (#23104)"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23449

Test Plan: Imported from OSS

Differential Revision: D16524768

Pulled By: ezyang

fbshipit-source-id: 9eb01b021011d1172317b5adb774c10c42ac2b86
2019-07-26 15:02:33 -07:00
953459f29e Dont allow conversions with QInt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22931

Test Plan: Imported from OSS

Differential Revision: D16467985

Pulled By: gchanan

fbshipit-source-id: 3925fc96a641e66b92fa65c542a2a23190c915a5
2019-07-26 14:45:14 -07:00
190d255d2e Add FLOAT_MODULE to quantized conv (#23414)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23414
ghstack-source-id: 87225586

Differential Revision: D16511055

fbshipit-source-id: c617733f60cfe38f4791e35e57e9551f2b5d8c09
2019-07-26 14:02:20 -07:00
83d6c6be07 ONNX export for index_select (#21866)
Summary:
ONNX export for index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21866

Reviewed By: zrphercule

Differential Revision: D16471345

Pulled By: houseroad

fbshipit-source-id: 745c23ba8a3223b5ec59b924df7358a36a92518c
2019-07-26 13:56:15 -07:00
74f8094ea5 Rename threading build options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23407

Test Plan:
USE_CUDA=0 ATEN_THREADING=TBB USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB
BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python
setup.py develop install --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D16522538

Pulled By: ilia-cher

fbshipit-source-id: 75c4761d93a7f5936f28e4c5eedcd27d8490d0c5
2019-07-26 13:09:14 -07:00
aae48748f2 Avoid unnecessary tensor clone in Cloneable (#20995)
Summary:
As pointed out by SsnL in https://github.com/pytorch/pytorch/issues/20910, when the clone destination is different from the module's device,
`Cloneable` currently calls `clone()` and then `to()` on every parameter and buffer, making the first clone unnecessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20995

Differential Revision: D15517353

Pulled By: mrshenli

fbshipit-source-id: 6b6dc01560540a63845663f863dea0a948021fa5
2019-07-26 12:46:42 -07:00
53182e53f0 fix observer name in the benchmark output (#23443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23443

as title

Reviewed By: hl475

Differential Revision: D16520962

fbshipit-source-id: 7a0ccbece487837c204f242d2a5c6f69b32cbc8c
2019-07-26 12:20:41 -07:00
828c08b4c7 allow passing a list of operators to benchmark (#23442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442

Rename the argument from `operator` to `operators`, which can take a list of operators to test.

Reviewed By: hl475

Differential Revision: D16520779

fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
2019-07-26 12:20:36 -07:00
0bc90194fb Catch and print exception traceback in parallel_apply() workers (#18055)
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.

This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.

Before:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
    raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

After:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
    ''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
  ...
  File "../models/foo.py", line 319, in bar
    baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055

Differential Revision: D16444972

Pulled By: zhangguanheng66

fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
2019-07-26 11:41:22 -07:00
7499fe72e9 remove c2 tests from benchmark_all_test (#23437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23437

as title

Reviewed By: hl475

Differential Revision: D16519770

fbshipit-source-id: 63fc269e18c264d399e25f44b03f81fc3ae01113
2019-07-26 11:12:53 -07:00
e5e2face8f Change handling of DataParallel in ONNX exporter (#23365)
Summary:
Don't automatically unwrap the top-layer DataParallel for users. Instead, we provide useful error information and tell users what action to take.
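
A hedged illustration of the action the error message asks users to take — unwrap the `DataParallel` container manually before exporting:
```
import torch

model = torch.nn.DataParallel(torch.nn.Linear(4, 2))
dummy = torch.randn(1, 4)
# export the wrapped module, not the DataParallel container
torch.onnx.export(model.module, dummy, "linear.onnx")
```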
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23365

Reviewed By: zrphercule

Differential Revision: D16514273

Pulled By: houseroad

fbshipit-source-id: f552de5c53fb44807e9d9ad62126c98873ed106e
2019-07-26 11:12:49 -07:00
c8c5e11fba Support variadic returns in Schema's operator<< (#23204)
Summary:
old: prim::PythonOp(...) ->
new: prim::PythonOp(...) -> ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23204
ghstack-source-id: 87208343

Reviewed By: zrphercule

Differential Revision: D16433592

fbshipit-source-id: 36cbb329188f112e09c3b1708a8090781b830dfe
2019-07-26 10:58:00 -07:00
34f53564b4 Don't warn when using conda compilers with utils.cpp_extension (#23396)
Summary:
The conda compilers are gcc/c++ 7.3.0, but have custom version strings
for clarity:

    x86_64-conda_cos6-linux-gnu-cc
    x86_64-conda_cos6-linux-gnu-c++

Using these compilers to build a C++ or CUDA extension now gives this warning (unnecessarily):

```
                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (/home/rgommers/anaconda3/envs/pytorch-nightly/bin/x86_64-conda_cos6-linux-gnu-c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux.
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23396

Differential Revision: D16500637

Pulled By: soumith

fbshipit-source-id: 5b2fc3593e22e9a7d07dc2c0456dbb4934ffddb2
2019-07-26 10:17:14 -07:00
47a54295ee Add fbgemm_qlinear_dynamic op (#23104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23104

ghstack-source-id: 87247148

As suggested in https://github.com/pytorch/pytorch/pull/22891, we will add an overload for ```torch.fbgemm_linear_int8_weight``` (the dynamically quantized version of the linear function) that takes a PackedLinearWeight as input and has nearly the same signature as the regular aten::linear.
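
A hedged sketch of the dynamic int8 linear math (illustrative per-tensor symmetric quantization only; not the FBGEMM kernel or the new overload's signature):
```
import torch

def dynamic_int8_linear(x, w_fp32, bias):
    # quantize the weight to int8 on the fly, compute, then rescale
    scale = w_fp32.abs().max() / 127.0
    w_int8 = torch.clamp((w_fp32 / scale).round(), -128, 127)
    return x @ (w_int8 * scale).t() + bias

y = dynamic_int8_linear(torch.randn(8, 16), torch.randn(4, 16), torch.randn(4))
```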

Differential Revision: D16381552

fbshipit-source-id: 1ccc4174fd02c546eee328940ac4b0da48fc85e8
2019-07-26 10:11:56 -07:00
b7984fa8a7 Remove cases of AT_FORALL_SCALAR_TYPES_EXCEPT_HALF.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22890

Test Plan: Imported from OSS

Differential Revision: D16467980

Pulled By: gchanan

fbshipit-source-id: 93ddbd041b7f65cafe8520b095289f14ad6d667f
2019-07-26 09:58:35 -07:00
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
c8a50a26d2 Named inference rule for torch.prod
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23106

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D16419175

Pulled By: zou3519

fbshipit-source-id: beb9ef838525c1ea7d7839cb9b8d68028fb4917f
2019-07-26 08:50:45 -07:00
9817d7e16b Implement named inference rule for torch.sum
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23081

Test Plan:
- New tests [namedtensor ci]

Imported from OSS

Differential Revision: D16419174

Pulled By: zou3519

fbshipit-source-id: 8679f77f121664d0398d7f062a53c0fa37482481
2019-07-26 08:50:40 -07:00
4104e80eae qconv+relu and qlinear+relu modules (#23410)
Summary:
adding qconv+relu and qlinear+relu modules in nn/_intrinsic/quantized
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23410

Test Plan:
Extended tests to test these new modules as well

buck test mode/dev caffe2/test:quantized -- 'test_linear_api'  --print-passing-details
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
      ✓ caffe2/test:quantized - test_linear_api (test_nn_quantized.ModuleAPITest) 4.055 1/1 (passed)
Test output:
> test_linear_api (test_nn_quantized.ModuleAPITest)
> test API functionality for nn.quantized.linear and nn._intrinsic.quantized.linear_relu ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 4.056s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
Summary (total time 10.66s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantized -- 'test_conv_api'  --print-passing-details
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.QuantizedConvTest) 5.195 1/2 (passed)
Test output:
> test_conv_api (test_quantized_conv.QuantizedConvTest)
> Tests the correctness of the conv functional. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.195s
>
> OK
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 10.616 2/2 (passed)
Test output:
> test_conv_api (test_nn_quantized.ModuleAPITest)
> Tests the correctness of the conv module. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 10.616s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
Summary (total time 17.31s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16505333

Pulled By: dskhudia

fbshipit-source-id: 04f45cd0e76dc55f4694d558b913ab2958b7d727
2019-07-26 08:50:36 -07:00
fb8725fdbd Update doc about subsequent builds: options can be changed in build/CMakeCache.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23331

Test Plan: Imported from OSS

Differential Revision: D16517622

Pulled By: ezyang

fbshipit-source-id: d2d15b8bb2599b3b9abb7a1aa6bc91bfc0d8e5d0
2019-07-26 08:50:32 -07:00
0b4c0b95e9 For second-time build, let build_type be inferred from CMakeCache.txt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23323

Test Plan: Imported from OSS

Differential Revision: D16517621

Pulled By: ezyang

fbshipit-source-id: 22984df214d01246a7868980e148936698940ea8
2019-07-26 08:50:28 -07:00
beb109f6f1 Enable cpp api test in multigpu-test.sh (#23380)
Summary:
blocking https://github.com/pytorch/pytorch/issues/20995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23380

Differential Revision: D16517013

Pulled By: mrshenli

fbshipit-source-id: 3f44ecf0e8d1e235165f2ce4396795ca38e2d837
2019-07-26 07:44:07 -07:00
46224ef89e Update ONNX docs (#23185)
Summary:
This is still a work in progress.

There are several more items to add to complete this doc, including

- [x] LHS indexing, index assignments.
- [x] Tensor List.
- [x] ~Shape/Type propagation.~
- [x] FAQs

Please review and share your thoughts; feel free to add anything that you think should be included as well. houseroad spandantiwari lara-hdr neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23185

Differential Revision: D16459647

Pulled By: houseroad

fbshipit-source-id: b401c005f848d957541ba3b00e00c93ac2f4609b
2019-07-26 00:14:54 -07:00
7a0ae0079f export sort to onnx
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21913

Differential Revision: D15982801

Pulled By: houseroad

fbshipit-source-id: 96dbd738c557478fffd48000db7263ae1f9754f5
2019-07-26 00:02:20 -07:00
1c00e0fc3f Added a flatten module (#22245)
Summary:
https://github.com/pytorch/pytorch/issues/2118

I'm not sure I'm doing it correctly, so I'll add tests if we decide that it's roughly correct.
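
A hedged usage sketch (assuming the module flattens every dimension except the batch dimension by default):
```
import torch

flatten = torch.nn.Flatten()
x = torch.randn(32, 3, 4, 4)
print(flatten(x).shape)  # torch.Size([32, 48])
```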
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22245

Differential Revision: D16508957

Pulled By: Chillee

fbshipit-source-id: a8dc7af999ba698c921006889f71cb1bc5a59d50
2019-07-25 22:48:52 -07:00
5b0484d977 Fix forwarding of arguments into kernel function (#23412)
Summary:
They should be forwarded by their actual type, not their rvalue reference.
This looked like perfect forwarding but actually wasn't.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23412
ghstack-source-id: 87214575

Reviewed By: dzhulgakov

Differential Revision: D16507872

fbshipit-source-id: 2b20a37df83067dd53e917fe87407ad687bb147c
2019-07-25 22:00:40 -07:00
3516f3c235 handle exit from init method (#21211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211

There are cases where the `init` method used to create inputs can exit with an error. When this happens, that specific input should be skipped.

Reviewed By: zheng-xq

Differential Revision: D15466410

fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7
2019-07-25 21:41:06 -07:00
dd79d45c5a Added torch.bitwise_not docstr (#23397)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/23311
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23397

Differential Revision: D16505107

Pulled By: pbelevich

fbshipit-source-id: 8d515fc27e253469393941c8da23d8e0510e64df
2019-07-25 18:32:58 -07:00
58a3e4f71f Automatic update of fbcode/onnx to 28ca699b69b5a31892619defca2391044a9a6052 (#23404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23404

Previous import was 707064980b9825b8705b9d1c9aad34d8b022d5dd

Included changes:
- **[28ca699b](https://github.com/onnx/onnx/commit/28ca699b)**: Member Company logo guidelines (#2196) <Prasanth Pulavarthi>
- **[47acb06a](https://github.com/onnx/onnx/commit/47acb06a)**: remove link to outdated issue for contributions wanted (#2186) <Prasanth Pulavarthi>
- **[168519f6](https://github.com/onnx/onnx/commit/168519f6)**: Create sigs.md (#2103) <Prasanth Pulavarthi>
- **[b9320746](https://github.com/onnx/onnx/commit/b9320746)**: mintor format update (#2180) <Prasanth Pulavarthi>
- **[65b8e0f9](https://github.com/onnx/onnx/commit/65b8e0f9)**: add more types support for Equal op (#2176) <Ke Zhang>
- **[dc5e62a9](https://github.com/onnx/onnx/commit/dc5e62a9)**: Update AddNewOP document. (#2172) <Emad Barsoum>
- **[bae8b530](https://github.com/onnx/onnx/commit/bae8b530)**: Add missing space (#2150) <Takeshi Watanabe>
- **[5952b7f5](https://github.com/onnx/onnx/commit/5952b7f5)**: python api example typo fix (#2155) <LeicongLi>
- **[904cb842](https://github.com/onnx/onnx/commit/904cb842)**: Fix errors in RoiAlign shape inference code (#2167) <G. Ramalingam>

Reviewed By: zrphercule

Differential Revision: D16502373

fbshipit-source-id: 81f1e8f0db6828fd551089ae2e0be65153739532
2019-07-25 18:26:04 -07:00
bd54608bd2 fused qconv2d+relu kernel (#23353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23353

Adding support for fused qconv2d + relu

Reviewed By: jianyuh

Differential Revision: D16473318

fbshipit-source-id: cd3c3476a21ffe946dbd9812e833b957c0fd206c
2019-07-25 17:55:47 -07:00
6a8c2758d5 Add better performing versions for groupwise and depthwise convolutions (#22869)
Summary:
Groupwise and depthwise convolutions become faster with this diff
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22869

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qconv'  --print-passing-details

```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/562950091484224
      ✓ caffe2/test:quantized - test_qconv (test_quantized.TestQuantizedConv) 2.731 1/2 (passed)
Test output:
> test_qconv (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 2.732s
>
> OK
      ✓ caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQuantizedConv) 5.187 2/2 (passed)
Test output:
> test_qconv_unpack (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.188s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/562950091484224
Summary (total time 15.66s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

```

buck test mode/dev caffe2/test:quantized -- 'test_conv_api'
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649676010406
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 0.040 1/2 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 5.402 2/2 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649676010406
Summary (total time 11.83s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16264144

Pulled By: dskhudia

fbshipit-source-id: 32fa43e5c3d97c8aaa6e0858327a2ac0aef8df5c
2019-07-25 17:55:43 -07:00
2409e6a475 C++ at::Tensor, torch::tensor constructors should not accept QInts.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22889

Test Plan: Imported from OSS

Differential Revision: D16467984

Pulled By: gchanan

fbshipit-source-id: 6e2b1bf2a6a8dbc60138cd437b9cf814a0fc297d
2019-07-25 16:30:25 -07:00
0e3a359a38 Align the operator<< for Argument with FunctionSchema parser (#23203)
Summary:
Align the Argument's operator<< with parser,
additional support:
1) List size
2) real default value
3) Alias information

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23203
ghstack-source-id: 87118985

Reviewed By: zrphercule

Differential Revision: D16433188

fbshipit-source-id: aea5711f93feacd94d1732e2f0d61218a31a0c5c
2019-07-25 15:28:17 -07:00
83b0fbc38d Remove TensorIterator::Builder (#23329)
Summary:
The builder pattern doesn't seem to work well with return-value-optimization.
This saves ~100 ns in the construction of TensorIterator::binary_op.

```
import torch
x = torch.rand(1)
y = torch.rand(1)
z = torch.rand(1)
%timeit torch.add(x, y, out=z)  # ~1.76 us vs ~1.88 us on my machine
```

cc resistor zheng-xq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23329

Differential Revision: D16495070

Pulled By: VitalyFedyunin

fbshipit-source-id: 8ce116075fa4c7149dabfcdfa25885c1187c8e2f
2019-07-25 15:15:49 -07:00
2cfe949d45 DEPRECATE_MESSAGE doesn't actually get expanded; inline it at both sites.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23379

Test Plan: Imported from OSS

Differential Revision: D16495278

Pulled By: ezyang

fbshipit-source-id: 596438fbf3285d6eee7b5d27a014f87b6c261cf1
2019-07-25 14:26:00 -07:00
b755bc1e31 fix importing for module defs that are named "foo.bar"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23367

Test Plan: Imported from OSS

Differential Revision: D16478637

Pulled By: suo

fbshipit-source-id: 30c6e7bfe377ef35d8c39e2d31615075ca0a6a19
2019-07-25 14:07:56 -07:00
b22adeb007 Fix error message for a wrong fork CUDA (#23322)
Summary:
Re-land https://github.com/pytorch/pytorch/pull/23030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23322

Differential Revision: D16469442

Pulled By: zhangguanheng66

fbshipit-source-id: 70b63ab6265efa3f289111ef0ce46bb3c0d353bc
2019-07-25 12:58:14 -07:00
d18529eb93 Fix upload path for macos binaries (#23386)
Summary:
peterjc123  will this affect windows too?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23386

Differential Revision: D16499443

Pulled By: pjh5

fbshipit-source-id: a3bec32d16f2cd06a097082deae0b020dd8bc5ac
2019-07-25 12:48:04 -07:00
7ee62d3d91 Fix the iOS build (#23293)
Summary:
The legacy iOS build script (`build_ios.sh`) is still working, but the output is in caffe2, not PyTorch. To enable the PyTorch iOS build, we can set the value of `BUILD_CAFFE2_MOBILE` to `NO`, and turn on another cmake arg - `INTERN_BUILD_MOBILE` that ljk53 has created for Android.

There is a trivial issue in `used_kernel.cpp` that causes a compile error when running `build_ios.sh`, as it uses a `system` API that has been deprecated since iOS 11. The fix below bypasses this file since it's not needed on mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23293

Test Plan:
The `build_ios.sh` completed successfully, and all the generated static libraries can be compiled and linked successfully on iOS devices.

### Build script

```shell
./scripts/build_ios.sh \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

Differential Revision: D16456100

Pulled By: xta0

fbshipit-source-id: 38c73e1e3a0c219a38ddc28b31acc181690f34e8
2019-07-25 12:41:20 -07:00
071536f895 Fix the operator== for Argument (#23202)
Summary:
type() returns a shared pointer; we should compare its contents, not the pointer itself.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23202
ghstack-source-id: 87125582

Reviewed By: zrphercule

Differential Revision: D16432634

fbshipit-source-id: 639e730dcdc1cec02f280efeea53019b36d9ae37
2019-07-25 11:59:28 -07:00
bbc53bffef AliasAnalysisKind::CONSERVATIVE/FROM_SCHEMA (#22175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22175

- Rename AliasAnalysisKind::DEFAULT to AliasAnalysisKind::CONSERVATIVE
- Introduce AliasAnalysisKind::FROM_SCHEMA that means the alias annotations of the schema should be honored
- Introduce AliasAnalysisKind::INTERNAL_SPECIAL_CASE to be able to run assertions that internal special cased ops are treated correctly

- aten:: and prim:: ops are not treated as special cases anymore, but just use AliasAnalysisKind::FROM_SCHEMA
- There's a set of assertions to ensure that aten:: and prim:: ops are all correctly set up to use AliasAnalysisKind::FROM_SCHEMA. Once this PR lands and passes all tests, we will remove those assertions and open up for the possibility of different AliasAnalysisKind settings for aten:: and prim:: ops

Differential Revision: D15929595

fbshipit-source-id: 7c6a9d4d29e13b8c9a856062cd6fb3f8a46a2e0d
2019-07-25 11:53:51 -07:00
b9202d459a Polish torch::Dict (#23344)
Summary:
torch::List recently received some polishing that now also is done for Dict. This should be done before the PyTorch 1.2 release because of backwards compatibility.

- Dict is just a reference type, so "const Dict" should have the same capabilities as "Dict"; constness is not guaranteed in any way.
- DictIterator gets comparison operators <, <=, >, >=

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23344
ghstack-source-id: 87170304

Differential Revision: D16468800

fbshipit-source-id: 2978c3b9cdcfb2cfb3f26516b15bd455d9a48ba9
2019-07-25 11:35:36 -07:00
722f80a07d Align String str() with the format in FunctionSchema (#23201)
Summary:
old: string
new: str

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23201
ghstack-source-id: 87125580

Reviewed By: zrphercule

Differential Revision: D16432340

fbshipit-source-id: 56bc7e8efbc2276315f464958cf38704f75dd06e
2019-07-25 11:31:00 -07:00
7c383ba4a0 Remove superfluous check (#23370)
Summary:
This check is not needed. Even if it were, the assignment is clobbered anyway.

Closes #23300.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23370
ghstack-source-id: 87157671

Differential Revision: D16485329

fbshipit-source-id: 8ccac79e81f5e0d0d20099d550411c161f58c233
2019-07-25 11:26:16 -07:00
39de49c7ec Fix a tiny bug and refactor (#22808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22808

- Use ```size_to_dim_```.
- ```mod``` is not in the scope. Should be ```module```.

Reviewed By: mingzhe09088

Differential Revision: D16225799

fbshipit-source-id: 9a263227d2d508eefdfddfee15fd0822819de946
2019-07-25 11:26:12 -07:00
39fd264799 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23381

Differential Revision: D16496327

Pulled By: pjh5

fbshipit-source-id: 529029544a5f8c8106bcb7cebdc71aee33e3b86c
2019-07-25 10:39:37 -07:00
ed316c0ca0 Align Dict str() with the format in FunctionSchema (#23200)
Summary:
Old: Dict[int, str]
New: Dict(int, str)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23200
ghstack-source-id: 87125581

Reviewed By: zrphercule

Differential Revision: D16432159

fbshipit-source-id: a3dc5fa397697a53e78290d25e19589f757c1eb8
2019-07-25 10:27:41 -07:00
f477cab2dc Add type checks in _intrinsics.modules.fused (#23361)
Summary:
recreated

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23361
ghstack-source-id: 87142339

Reviewed By: zafartahirov

Differential Revision: D16455500

fbshipit-source-id: ab2c9d10c7c025ae77f5b80f28e6bd261620a5f7
2019-07-25 09:49:01 -07:00
25e06618c8 Support parsing variadic return schema (#23199)
Summary:
All cases should be prim ops, but let's support it anyway. The parser will expect a variadic return schema to look like prim::PythonOp(...) -> ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23199
ghstack-source-id: 87113845

Differential Revision: D16431635

fbshipit-source-id: 798b6957ce5d800f7fcf981c86fdcb009cd77a78
2019-07-25 09:39:49 -07:00
cf50249bde Disable fusion of grad_sum_to_size (#23372)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833

grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.

Chillee actually did most of the work tracking this down to the fusion of grad_sum_to_size, pinging me when he had found the cause. Thank you!

About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the set of cases where fusing grad_sum_to_size is actually beneficial is much smaller than when it was initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think that is a case we could handle with refined logic.
- Keeping it would add complexity in checking when to merge fusion groups to the complexities that this PR removes.
- The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372

Differential Revision: D16489930

Pulled By: soumith

fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
2019-07-25 08:55:33 -07:00
82545ecc71 Specify build dir as a global variable in BUILD_DIR in the build system.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23318

Test Plan: Imported from OSS

Differential Revision: D16493987

Pulled By: ezyang

fbshipit-source-id: 497e9dd924280f61dde095b4f2b50f5402d9da97
2019-07-25 07:19:47 -07:00
915261c8be Let users pass CMake-specific options starting with CMAKE_ to CMake. (#22776)
Summary:
This should make it more convenient to follow https://github.com/pytorch/pytorch/issues/8433's suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22776

Differential Revision: D16493553

Pulled By: ezyang

fbshipit-source-id: 852f4779e70f84a4c9f7bab4c2ae4927248ffc93
2019-07-25 07:19:44 -07:00
f91b19c2aa Do not explicitly set USE_FBGEMM in tools/setup_helpers/cmake.py (#23314)
Summary:
Instead, defer its default value to CMakeLists.txt

NO_FBGEMM has already been handled in tools/setup_helpers/env.py
(although deprecated)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23314

Differential Revision: D16493580

Pulled By: ezyang

fbshipit-source-id: 7255eb1df5e8a6dd0362507d68da0986a9ed46e2
2019-07-25 07:11:52 -07:00
ba6f65cf33 Add document of functions nn.init.ones_/zeros_ (#23145)
Summary:
Functions `nn.init.ones_` and `nn.init.zeros_` were not documented. As mentioned in https://github.com/pytorch/pytorch/issues/9886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23145

Differential Revision: D16427108

Pulled By: soumith

fbshipit-source-id: 4fac31e79717a436411ef5e107a829b403e576c9
2019-07-25 06:09:50 -07:00
252710262f (#22775)
Summary:
Pass FusionCallback and Symbol to recursive GraphFuser calls. This ensures
consistent fusion in nested Blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22775

Differential Revision: D16439979

Pulled By: soumith

fbshipit-source-id: 18d4b13f52b03708b8580c73f75450adbb672ac1
2019-07-25 05:54:03 -07:00
0c79753c0d Improve documentation for torch.enable_grad , torch.no_grad and torch.set_grad_enabled (#23310)
Summary:
Modified documentation for `torch.enable_grad`, `torch.no_grad` and `torch.set_grad_enabled`.

Fixes https://github.com/pytorch/pytorch/issues/19189
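
A hedged usage example of the three documented APIs:
```
import torch

x = torch.ones(2, requires_grad=True)
with torch.no_grad():
    y = x * 2                  # y.requires_grad is False
with torch.enable_grad():
    z = x * 2                  # gradient tracking is on again
torch.set_grad_enabled(False)  # the function form toggles globally
print(y.requires_grad, z.requires_grad)  # False True
```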
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23310

Differential Revision: D16489626

Pulled By: soumith

fbshipit-source-id: f0926e4f51ffd97521e67bee3a16ad954458247a
2019-07-25 05:48:33 -07:00
2938299de1 Fix lint failure introduced in #23346
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23371

Differential Revision: D16489985

Pulled By: pietern

fbshipit-source-id: 914048563bbe7bf5ab897c6f12f4a1bb4bff30e1
2019-07-25 05:17:15 -07:00
0842624d50 Fix upload issue with linux libtorch nightlies (#23368)
Summary:
This is a small fix on top of gh-23348, which fixed the libtorch
nightly build timeouts.

For the latest nighly build (25 July), see
https://circleci.com/workflow-run/33d0a24a-b77c-4a8f-9ecd-5646146ce684
The only failures are these uploads, which is because `aws s3 cp`
can only deal with one file at a time. The only way to make it do
multiple files at once is:
```
aws s3 cp . "$s3_dir" --exclude "*"  --include "libtorch-*.zip" --recursive --acl public-read
```
which is much more verbose. Executing one `cp` per file should be fine,
and this is also what's done in `binary_macos_upload.sh`.

Closes gh-23039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23368

Differential Revision: D16488853

Pulled By: soumith

fbshipit-source-id: 6dc04b4de2f6cd2de5ae9ad57a6e980f56896498
2019-07-25 04:52:43 -07:00
95e822622b Enhance interpretation of GLOO_SOCKET_IFNAME (#22978)
Summary:
With this change you can now list multiple interfaces separated by
commas. ProcessGroupGloo creates a single Gloo context for every device
in the list (a context represents a connection to every other
rank). For every collective that is called, it selects a context
in round-robin fashion. The number of worker threads responsible for
executing the collectives is set to twice the number of devices.

If you have a single physical interface, and wish to employ increased
parallelism, you can also specify
`GLOO_SOCKET_IFNAME=eth0,eth0,eth0,eth0`.  This makes ProcessGroupGloo
use 4 connections per rank, 4 I/O threads, and 8 worker threads
responsible for executing the collectives.
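
A hedged sketch of how this is consumed (standard `torch.distributed` setup; only the `GLOO_SOCKET_IFNAME` value comes from this change):
```
import os
import torch
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
# four logical devices over one physical NIC, as described above
os.environ["GLOO_SOCKET_IFNAME"] = "eth0,eth0,eth0,eth0"
dist.init_process_group("gloo", init_method="env://", rank=0, world_size=1)
dist.all_reduce(torch.ones(4))  # collectives round-robin over the contexts
```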

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22978
ghstack-source-id: 87006270

Differential Revision: D16339962

fbshipit-source-id: 9aa1dc93d8e131c1714db349b0cbe57e9e7266f1
2019-07-25 04:52:38 -07:00
1dd4d55565 Improve FindMKLDNN.cmake to avoid binary compatibility issue in MKL-DNN (#23292)
Summary:
An illegal instruction is encountered in the pre-built MKL-DNN package. https://github.com/pytorch/pytorch/issues/23231
To avoid such binary compatibility issues, the HostOpts option in MKL-DNN is disabled so that MKL-DNN is built for a generic arch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23292

Differential Revision: D16488773

Pulled By: soumith

fbshipit-source-id: 9e13c76fb9cb9338103cb767d7463c10891d294a
2019-07-25 04:42:26 -07:00
e8ad167211 Remove usage of FOLDER constant in test_distributed.py (#23223)
Summary:
This is step 1 in trying to get rid of constants that are set prior to
executing the test runner. All setup logic should be concentrated in
the setupClass() function of the TestCase subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23223
ghstack-source-id: 87005260

Reviewed By: zhaojuanmao

Differential Revision: D16439147

fbshipit-source-id: 7a929ad4b1c8e368e33d1165becbd4d91220882c
2019-07-25 02:55:30 -07:00
711be82951 Make optimize a thread_local flag
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23170

Test Plan: Imported from OSS

Differential Revision: D16441912

Pulled By: suo

fbshipit-source-id: a33485178a329d54e41e364c4f14950f88481c55
2019-07-24 23:09:21 -07:00
b3980f46a2 Replace uint8 with int8 in Linear and LSTM quantization path (#23347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23347

This diff replaces uint8 with int8 to match the underlying kernel implementation. When we do int8 quantization, we are computing with uint8 (input activation) * int8 (weight) -> uint8 (output activation). The weight is quantized into int8.

Reviewed By: jianyuh

Differential Revision: D16469435

fbshipit-source-id: a697655b0e97833fc601e5980970aec4dba53c39
2019-07-24 22:25:12 -07:00
fba325be34 Try kill -9ing apt-get (#23354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23354

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16474254

Pulled By: ezyang

fbshipit-source-id: 0dd7ce02e1aa1a42a24d2af066ebd0ac5206c9a0
2019-07-24 19:27:10 -07:00
ff3cc795c8 Build binaries with local version string specifying CUDA version (#23325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23325

Fixes #19990

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16473826

Pulled By: ezyang

fbshipit-source-id: 466db2c22fabd7b574f0a08aec67a18318ddb431
2019-07-24 18:32:32 -07:00
cf0f3556f6 Enabled cumsum and cumprod for bool tensors (#23346)
Summary:
```
a = torch.tensor([[True, False, True],
                  [False, False, False],
                  [True, True, True]])

>>> torch.cumsum(a, 0)
tensor([[1, 0, 1],
        [1, 0, 1],
        [2, 1, 2]])

>>> torch.cumsum(a, 1)
tensor([[1, 1, 2],
        [0, 0, 0],
        [1, 2, 3]])
```
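
For completeness, the analogous `cumprod` result on the same tensor (worked out by hand, not taken from the PR):
```
>>> torch.cumprod(a, 0)
tensor([[1, 0, 1],
        [0, 0, 0],
        [0, 0, 0]])
```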

Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23346

Differential Revision: D16469393

Pulled By: izdeby

fbshipit-source-id: b55f3ca0588f9961a771def40f6ef58932021e1a
2019-07-24 18:16:01 -07:00
c9312e1a8b Open source 3D depthwise conv (#23164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23164

For the open-source CSN model.

Reviewed By: weiyaowang

Differential Revision: D16412312

fbshipit-source-id: bb4e7748e697271563f974ca05878f8832d83653
2019-07-24 17:56:56 -07:00
73dee44ec5 Specifying libtorch variant in libtorch builds (#23348)
Summary:
This should fix all libtorch timeout issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23348

Differential Revision: D16472782

Pulled By: pjh5

fbshipit-source-id: b1f3a783d044eb881f6e8755e0c891093e93c93e
2019-07-24 17:31:43 -07:00
3297d8e203 Switch keys to be sequential and stable in pickle serialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23280

Test Plan: Imported from OSS

Differential Revision: D16452816

Pulled By: zdevito

fbshipit-source-id: e143780b8e834298a575ac76d49576df94fbe27b
2019-07-24 17:13:51 -07:00
93da1030df Fix pickler bug where it would not load if no tensors were saved
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23263

Test Plan: Imported from OSS

Differential Revision: D16446928

Pulled By: zdevito

fbshipit-source-id: f70f86b28c3901a97b65b4d7654e39dc6e1aab6a
2019-07-24 17:13:46 -07:00
7922b5057d Memoize storages in pickler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23262

Test Plan: Imported from OSS

Differential Revision: D16446927

Pulled By: zdevito

fbshipit-source-id: 92d26f64ff6269b1deef821edae31745158b5137
2019-07-24 17:13:42 -07:00
71a047c3e3 Unwrap DataParallel automatically (#23334)
Summary:
Handle DataParallel for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23334

Differential Revision: D16467844

Pulled By: houseroad

fbshipit-source-id: 696aeada437c6c0612ac4ef9c4d51e3386625de0
2019-07-24 16:29:48 -07:00
c23ba35009 Skip QNNpack tests on ppc64le (where support is not enabled) (#23343)
Summary:
Proposed PR for
https://github.com/pytorch/pytorch/issues/23342

Disables execution of QNNpack tests if IS_PPC.
Basically this parallels the existing skipping of tests for IS_WINDOWS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23343

Differential Revision: D16469218

Pulled By: soumith

fbshipit-source-id: 80b651d00e5d413e359cf418f79e20d74cd9c8e1
2019-07-24 15:24:00 -07:00
22c169fb9c Improve the error message for ONNX export (#23317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23317

Print out the kind type when failing to export

Reviewed By: zrphercule

Differential Revision: D16462641

fbshipit-source-id: 27157c0bd597362f90ac8cfb33e1808bac0ec48b
2019-07-24 15:03:05 -07:00
87d3f66506 max_pool_with_indices: return valid indices if all input elements are -inf (#23161)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23161

Differential Revision: D16442672

Pulled By: ezyang

fbshipit-source-id: 8c2ee13acd73954c7307720c01c732f460266a63
2019-07-24 14:51:39 -07:00
b7d90332ea add notes about overshoot in bicubic mode (#23321)
Summary:
fix https://github.com/pytorch/pytorch/issues/21044

Bicubic interpolation can cause overshoot.

OpenCV keeps the result dtype aligned with the input dtype:
- If input is uint8, the result is clamped [0, 255]
- If input is float, the result is unclamped.

In PyTorch's case, we only accept float input, so we'll keep the result unclamped, and add some notes so that users can explicitly call `torch.clamp()` when necessary.
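
A hedged example of the explicit clamp suggested above:
```
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 4, 4)  # float input in [0, 1]
y = F.interpolate(x, scale_factor=2, mode="bicubic", align_corners=False)
y = y.clamp(0.0, 1.0)       # bicubic can overshoot the input range
```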
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23321

Differential Revision: D16464796

Pulled By: ailzhang

fbshipit-source-id: 177915e525d1f54c2209e277cf73e40699ed1acd
2019-07-24 14:46:37 -07:00
d522b3ca58 BlackBoxPredictor OSS part N: ThreadLocalPtr, InferenceGraph (#23257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23257

Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off by various refactors. Given
that we don't plan to make any other significant investments into
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.

Reviewed By: zrphercule

Differential Revision: D16428124

fbshipit-source-id: b35deada5c015cd97b91ae12a7ea4aac53bd14b8
2019-07-24 14:35:30 -07:00
2915d53096 Move OptionalType wrapping out of constants.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23234

Pulled By: driazati

Differential Revision: D16460880

fbshipit-source-id: d4e6b747615dbfe73a92ce571d3b2aaae7179f1b
2019-07-24 14:35:26 -07:00
48ca64dbf7 Better error for compiling a module type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23312

Pulled By: driazati

Differential Revision: D16461299

fbshipit-source-id: 11e56c44d561c3fbf70a96c22c5fd494eea0cf19
2019-07-24 14:24:50 -07:00
d6dcec37b6 Add docs about prod ecosystem features (#23010)
Summary:
Covering fleet-wide profiling, API logging, etc.

It's my first time writing rst, so suggestions are definitely welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23010

Differential Revision: D16456721

Pulled By: dzhulgakov

fbshipit-source-id: 3d3018f41499d04db0dca865bb3a9652d8cdf90a
2019-07-24 14:15:33 -07:00
87482bb15a Remove torch::autograd::Node::get_shared_ptr()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23307

Test Plan: Imported from OSS

Differential Revision: D16460972

fbshipit-source-id: 0678627e05dd4c69c4dafa6b717db5cd1a531f56
2019-07-24 13:50:47 -07:00
8fdbe1e10b Support LSTM with FP16 weight (#23291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23291

This diff implements LSTM with FP16 weights based on FBGEMM.

At a high level, here are the steps:
1. Quantize and pack weight in every layer of LSTM
2. Pass the weights from step 1 to the ATen `quantized_lstm` function, which does matrix multiplication with the FP16 weight. The dtypes involved in the matmul are:
Y = X * W + B
(Y: fp32, X: fp32, W: fp16, B: fp32)
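
A hedged illustration of that mixed-precision matmul (the real kernel keeps W in FP16 inside FBGEMM; this only shows the dtypes involved):
```
import torch

x = torch.randn(8, 16)          # fp32 activation
w = torch.randn(4, 16).half()   # fp16 weight
b = torch.randn(4)              # fp32 bias
y = x @ w.float().t() + b       # fp32 output: y.dtype == torch.float32
```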

Reviewed By: jianyuh

Differential Revision: D16389595

fbshipit-source-id: c26ae4e153c667a941f4af64e9d07fc251403cee
2019-07-24 12:40:11 -07:00
1f608d09cf Revert D16440000: [pytorch][PR] Re-land "Fix error message for a wrong fork CUDA"
Differential Revision:
D16440000

Original commit changeset: e05683275522

fbshipit-source-id: b688f24c1e6d3d8f63c2d415262a3f0ab1b85914
2019-07-24 12:05:36 -07:00
1a9edfcd36 Prevent xla-build from clobbering pytorch_linux_trusty_py3_6_gcc5_4_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23304

Test Plan: Imported from OSS

Differential Revision: D16459166

Pulled By: zou3519

fbshipit-source-id: 8f5c35ebf1fe6e86705e7fb4fff4c720bcd8f97b
2019-07-24 11:58:44 -07:00
0c0ffccbb6 deepCopy also copies type information of lists (#23271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23271
ghstack-source-id: 87088503

Differential Revision: D16449220

fbshipit-source-id: 551b7cef8f6d0d2d5a56b24ddbe2e0bb2c0c3dbe
2019-07-24 11:53:01 -07:00
895e79adf1 Revert D16441000: Switch from KaTeX to imgmath for documentation rendering.
Differential Revision:
D16441000

Original commit changeset: c1ab557cb816

fbshipit-source-id: cbfec2ca648b614b291debd6b3e215db9fbeb57b
2019-07-24 11:43:17 -07:00
94711d7471 Quantized conv avoid functional usage (#22733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22733

This refactor changes the conv module to avoid the usage of the functional ops.

Reviewed By: jerryzh168

Differential Revision: D15835572

fbshipit-source-id: f2294cd708fbe8372eb3a15cc60d83777d4f7029
2019-07-24 11:43:12 -07:00
67179d71f7 Reduce number of processes spawned for gloo_test.TestCase.test_forked_cw (#23221)
Summary:
It used to be run with comm_size=8, which caused flaky results in a
stress run. The flakiness was caused by too many listening sockets
being created by Gloo context initialization (8 processes times 7
sockets times 20-way concurrency, plus TIME_WAIT).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23221
ghstack-source-id: 86995596

Reviewed By: d4l3k

Differential Revision: D16437834

fbshipit-source-id: 998d0e2b087c0ab15eca64e308059c35e1b51e7b
2019-07-24 11:31:20 -07:00
3ed79f4b6c Fix argument names in torch doc (#22973)
Summary:
I manually went through all functions in `torch.*` and corrected any mismatch between the arguments mentioned in the docs and the ones actually taken by the function. This fixes https://github.com/pytorch/pytorch/issues/8698.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22973

Differential Revision: D16419602

Pulled By: yf225

fbshipit-source-id: 5562c9b0b95a0759abee41f967c45efacf2267c2
2019-07-24 11:22:45 -07:00
eb51131fb4 Revert D16423217: [pytorch][PR] Update sleef to master, fixes #20535
Differential Revision:
D16423217

Original commit changeset: 587de3f10e83

fbshipit-source-id: 466e56eab73ce669cc179d08b7f39d2c8b0ffb34
2019-07-24 11:10:15 -07:00
803af9988c Fix the problem in parseOctal and throw exception if use \xhh to specify hex value (#23198)
Summary:
Follow the rules in:
1) https://docs.python.org/2.0/ref/strings.html
2) https://en.cppreference.com/w/cpp/language/escape
We didn't find anything about the format \h.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23198
ghstack-source-id: 86986691

Reviewed By: zrphercule

Differential Revision: D16431215

fbshipit-source-id: 7b342708d1984e08b3cbbcf6d487623f1dc63a14
2019-07-24 10:41:59 -07:00
b9a7fc529a Suppress warnings in tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23264

Pulled By: driazati

Differential Revision: D16449965

fbshipit-source-id: ff7d6ddf2275e5f44883a19b6a24c6beaa42ccf4
2019-07-24 10:36:46 -07:00
200cb836f1 Enabled 'add_cuda' for bool and fixed alpha scalar bug (#23044)
Summary:
Enabled 'add_cuda' for bool
Tested via unit tests

Fixed https://github.com/pytorch/pytorch/issues/22431 #22430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23044

Differential Revision: D16368095

Pulled By: izdeby

fbshipit-source-id: 033d28095ff1c5df4078905c52782cf4cf9ed6b0
2019-07-24 10:31:34 -07:00
fbf28b5458 Change TensorIterator to be stack allocated, using named return value optimization to elide copies.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22519

Differential Revision: D16451460

fbshipit-source-id: 6ca6ae2fdf1af5a2f792b42e55279413971b3c46
2019-07-24 10:22:19 -07:00
7203612f85 Update sleef to master, fixes #20535 (#23168)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20535

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23168

Differential Revision: D16423217

Pulled By: ezyang

fbshipit-source-id: 587de3f10e839b94f51f673741b5fda8849e32f6
2019-07-24 08:18:14 -07:00
fd1d06e317 Let Python build scripts accept both CMAKE_BUILD_TYPE and the oldschool DEBUG and REL_WITH_DEB_INFO variables. (#22875)
Summary:
Currently the build type is decided by the environment variables DEBUG
and REL_WITH_DEB_INFO. This commit also lets CMAKE_BUILD_TYPE be
effective. This makes the interface more consistent with CMake. This
also prepares https://github.com/pytorch/pytorch/issues/22776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22875

Differential Revision: D16281663

Pulled By: ezyang

fbshipit-source-id: 952f92aad85ff59f1c7abe8256eca8a4a0936026
2019-07-24 08:07:47 -07:00
aa660b8eb7 Re-land "Fix error message for a wrong fork CUDA" (#23209)
Summary:
Re-land https://github.com/pytorch/pytorch/pull/23030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23209

Differential Revision: D16440000

Pulled By: zhangguanheng66

fbshipit-source-id: e05683275522835a33d5a7e6d76b7e94774e4d98
2019-07-24 07:01:04 -07:00
4cd726c7b3 Update ROCm CI to python3.6 (#23088)
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322 .

Given that Python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on Python 3.5+, we'd like to update the ROCm CI across the board to Python 3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Added a pattern-match skip for anything but the ROCm CI, compared to #22322, for the Python find step in the PyTorch build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088

Differential Revision: D16448261

Pulled By: bddppq

fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
2019-07-23 23:07:45 -07:00
91bef6c168 format sugared_value & compiler.cpp (#23283)
Summary:
There are a lot of formatting changes which make other diffs to these PRs noisy & hard to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23283

Differential Revision: D16453590

Pulled By: eellison

fbshipit-source-id: 97b4bf1dbbbfb09c44c57402f61ea27287060044
2019-07-23 22:29:22 -07:00
bc15a20db9 Minor refactor: propagating messages in TestCase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23146

Test Plan: Imported from OSS

Differential Revision: D16413801

Pulled By: zafartahirov

fbshipit-source-id: 8009a7afe77e127ddd220fb71c6c272b0d44c733
2019-07-23 21:18:44 -07:00
8a77098247 Make Module::register_module / register_parameter / register_buffer public (#23196)
Summary:
In Python, the `register_module` / `register_parameter` / `register_buffer` methods in `nn.Module` are public. This PR makes those APIs public for the C++ `nn::Module` as well. Closes https://github.com/pytorch/pytorch/issues/23140.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23196

Differential Revision: D16440239

Pulled By: yf225

fbshipit-source-id: e0eff6e1db592961fba891ec417dc74fa765e968
2019-07-23 21:18:41 -07:00
24601daa12 Adding check for a single batch in adaptive_avg_pool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23137

Test Plan: Imported from OSS

Differential Revision: D16403804

Pulled By: zafartahirov

fbshipit-source-id: df79a8c768ffabeceb4c0044c967a623c5885484
2019-07-23 21:11:06 -07:00
e7a9b0d62f Rename torch::autograd::Function to torch::autograd::Node
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23269

Test Plan: Imported from OSS

Differential Revision: D16454878

fbshipit-source-id: b1e840fc2d3901955280d141e5ad6efd5e9d66af
2019-07-23 20:52:22 -07:00
0ab19d66ee Port lu_solve to ATen (#22379)
Summary:
Changelog:
- Port TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Remove TH/THC implementations
- Update doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22379

Test Plan: - Added new tests in test_torch.py (port to test_cuda.py exists)

Differential Revision: D16089645

Pulled By: zou3519

fbshipit-source-id: dc8561aadacacb23e80c375b4fec687df2b6bbc8
2019-07-23 19:11:35 -07:00
2197bee3da add cudnn.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23265

Differential Revision: D16453611

Pulled By: bddppq

fbshipit-source-id: b49e01b6d019781097ec5072cdc6d37a2988bfbe
2019-07-23 18:09:23 -07:00
a936a90391 caffe2/caffe2/fb/operators/cc_amrc: drop SIMD OpenMP vectorization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23235

Reviewed By: ajtulloch

Differential Revision: D16384612

Pulled By: luciang

fbshipit-source-id: a4c8257c6d3e151ba99167a152ad824b0dde7671
2019-07-23 17:25:00 -07:00
7ed9622fdf Read number of workspaces from argument in recurrent_network_op (#23272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23272

We see significant performance improvements by limiting concurrency
at the Caffe2 level on mobile. This diff enables setting the number of Caffe2
workspaces used during RNN inference.

Reviewed By: akyrola

Differential Revision: D16448611

fbshipit-source-id: 28abaddb4ea60bacb084ceb28cb7a4d1e67ccc17
2019-07-23 17:19:40 -07:00
a35136dd73 Add support for onnx tensor index export (#21716)
Summary:
Support exporting
* Standard tensor indexing like
```
x = torch.ones(4, 5)
ind = torch.tensor([0, 1])

return x[ind]
```
* [Advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing) like
```
x = torch.ones(4,5,6,7,8)
ind1 = torch.tensor([0, 1])
ind2 = torch.tensor([[3], [2]])
ind3 = torch.tensor([[2, 2], [4, 5]])

return x[2:4, ind1, None, ind2, ind3, :]
```
It would be ideal if ONNX could natively support indexing in future opsets, but for opset <= 10 it will always need this kind of workaround.

There are still various limitations, such as not supporting advanced indexing with negative indices, not supporting mask indices of rank > 1, etc. My feeling is that these are less common cases that require great effort to support using the current opset, and it's better not to make the index export more cumbersome than it already is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21716

Reviewed By: zrphercule

Differential Revision: D15902199

Pulled By: houseroad

fbshipit-source-id: 5f1cc687fc9f97da18732f6a2c9dfe8f6fdb34a6
2019-07-23 17:11:28 -07:00
1de44a6f54 fix specialized list from dict keys (#23267)
Summary:
Previously we weren't specializing the list returned from `dict.keys()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23267

Differential Revision: D16448512

Pulled By: eellison

fbshipit-source-id: fcd2a37ac680bdf90219b099a94aa36a80f4067c
2019-07-23 17:02:19 -07:00
a6ccd62a81 BlackBoxPredictor OSS part 5: glow transforms
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off by various refactors. Given
that we don't plan to make any other significant investments into
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion of moving the whole thing to OSS.

Reviewed By: bertmaher

Differential Revision: D16367134

fbshipit-source-id: fc6bacc1be3ff6336beb57cdad58168d3a2b8c28
2019-07-23 16:39:23 -07:00
bdb1e1305d exclude some caffe2 modules from libtorch mobile build (#20000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20000
ghimport-source-id: f47773ef1c6849cd0c0e65080400416c6b370d39

Test Plan:
- verified libtorch mobile library builds and links successfully;

Imported from OSS

Differential Revision: D15169024

Pulled By: ljk53

fbshipit-source-id: 20ac89c6e7053239c93e51f00c5c5dc3595bea74
2019-07-23 16:20:27 -07:00
1c0309a9a9 make OMP_NUM_THREADS default in launch.py (#22501)
Summary:
Per https://github.com/pytorch/pytorch/issues/22260, by default the number of OpenMP threads spawned equals the number of available cores; for multi-process data-parallel cases, too many threads may be spawned and could overload the CPU, resulting in performance regression.

So set OMP_NUM_THREADS = number of CPU processors / number of processes by default, to neither overload nor waste CPU threads.
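
A minimal sketch of the described default (hedged; not the exact launch.py implementation, and `nproc_per_node` here is a hypothetical value):
```
import multiprocessing
import os

nproc_per_node = 2  # hypothetical per-node process count
if "OMP_NUM_THREADS" not in os.environ and nproc_per_node > 1:
    omp_threads = max(1, multiprocessing.cpu_count() // nproc_per_node)
    os.environ["OMP_NUM_THREADS"] = str(omp_threads)
```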
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22501

Test Plan:
1. without and with this change, example codes result in same result
      python ~/local/fbsource-fbcode/fbcode/caffe2/torch/distributed/launch.py --nproc_per_node=2 pytorch/examples/yanlizhao/distributed_launch_example.py

  Setting OMP_NUM_THREADS environment variable for each process to be: 24, which
  is max(1, num_cpus / num_processes), you can further tune the variable for optimal performance in your application if needed.
  final loss =  tensor(0.5211, device='cuda:0', grad_fn=<MseLossBackward>)

Differential Revision: D16092225

Pulled By: zhaojuanmao

fbshipit-source-id: b792a4c27a7ffae40e4a59e96669209c6a85e27f
2019-07-23 16:14:24 -07:00
058645acb1 Fusion and _intrinsic modules (#23003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003

torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu

Fusion function to combine specific modules: (conv, bn) and (conv, bn, relu).
In all cases, modules are replaced in place. The first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Supports both training and eval. For training, the modules are "fused" with a sequential container. This is to allow for further module swaps for quantization-aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.

TODO: Add tests for _intrinsic modules.

Conv BN fusion code is based on DsKhudia's implementation
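
A hedged sketch of the standard eval-time conv-BN folding math that such a fusion relies on (not the code from this diff):
```
import torch

def fold_conv_bn(conv, bn):
    # W' = W * gamma / std;  b' = (b - mean) * gamma / std + beta
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std
    b = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    conv.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv.bias = torch.nn.Parameter((b - bn.running_mean) * scale + bn.bias)
    return conv
```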

Differential Revision: D16199720

fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
2019-07-23 14:54:19 -07:00
7b229342ca Renamed CosineAnnealingLr to CosineAnnealingLR (#23242)
Summary:
fixing https://github.com/pytorch/pytorch/issues/23160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23242

Differential Revision: D16443348

Pulled By: pbelevich

fbshipit-source-id: af0edf4e841e04a8016c98bfee72696581f3f070
2019-07-23 14:54:15 -07:00
8d4956fd02 hook up dropout sparse with replacement operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23183

Reviewed By: ffjiang

Differential Revision: D16428262

fbshipit-source-id: 0d6e17d15c898629bbd2826441f2c9701a78b0bd
2019-07-23 14:34:25 -07:00
6f01d13728 Implement dropout with replacement for id list features. (#22880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22880

Implement sparse dropout with replacement value.
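
A hedged sketch of the assumed semantics (dropped ids are replaced by a fixed replacement value instead of being removed):
```
import torch

def dropout_with_replacement(ids, ratio, replacement_id):
    mask = torch.rand(ids.shape) < ratio
    return torch.where(mask, torch.full_like(ids, replacement_id), ids)

ids = torch.tensor([3, 14, 15, 92, 65])
print(dropout_with_replacement(ids, 0.5, 0))  # ~half the ids become 0
```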

Reviewed By: xianjiec

Differential Revision: D16267012

fbshipit-source-id: 8c4878230f61bb3ac333291e2c6aaf2fbdc5f9ce
2019-07-23 14:34:21 -07:00
e0f632c58b pickler.cpp: respect __getstate__/__setstate__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23190

Test Plan: Imported from OSS

Differential Revision: D16431553

Pulled By: zdevito

fbshipit-source-id: 680ea1507c12727fd17aedb3067f522cf490e306
2019-07-23 14:27:51 -07:00
bae10db522 Incorporating arguments to pull production operators and adding device type. (#23197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23197

Incorporating arguments to pull production operators and adding device type.

Reviewed By: mingzhe09088

Differential Revision: D16387263

fbshipit-source-id: e20ed82225eb1e4b7ab1756ec157967b055d85bf
2019-07-23 13:43:26 -07:00
d8220b0599 add simple inheritance support to AST
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23109

Test Plan: Imported from OSS

Differential Revision: D16441914

Pulled By: suo

fbshipit-source-id: 18a57762d376759b98c18bc160eacbcc99f78ee9
2019-07-23 12:21:27 -07:00
017870a633 kill module_lookup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23097

Test Plan: Imported from OSS

Differential Revision: D16383329

Pulled By: suo

fbshipit-source-id: 282f8bac2245d584b66139daf4e5ea7b2b317295
2019-07-23 12:21:23 -07:00
2a37740a86 make RHS of assignment optional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23033

Test Plan: Imported from OSS

Differential Revision: D16383330

Pulled By: suo

fbshipit-source-id: 63c55fae06f0cd534eb5053f91a773431ad052d4
2019-07-23 12:21:19 -07:00
3be0a2b4be Parse all stmts in class defs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23031

Test Plan: Imported from OSS

Differential Revision: D16383327

Pulled By: suo

fbshipit-source-id: 6485109a66e653b7f26d30b91a97af8d71594e22
2019-07-23 12:21:15 -07:00
0dabaad819 Add Module::replace_module to C++ api (#22546)
Summary:
This adds a replace_module method to the C++ API. It is needed to be able to replace modules in an existing model.

The primary use case I am aware of is to enable finetuning of models.
Given that finetuning is fairly popular these days, I think it would be good to facilitate this in the C++ API as well.

This has been reported by Jean-Christophe Lombardo on the [forums](https://discuss.pytorch.org/t/finetuning-a-model-on-multiple-gpu-in-c/49195).
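
A hedged Python analogy of the finetuning use case (the PR itself targets the C++ API; the `torchvision` model here is an assumption for illustration):
```
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
# replace the classifier head for a 10-class finetuning task
model.fc = torch.nn.Linear(model.fc.in_features, 10)
```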
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22546

Differential Revision: D16440289

Pulled By: yf225

fbshipit-source-id: c136f914b8fc5c0f1975d877ea817fda5c851cda
2019-07-23 11:50:06 -07:00
f112c522af LinearReLU module (#23022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23022

will be tested in later diffs.

Added LinearReLU module for qat, allows conversion from torch.nn._intrisic.LinearReLU to torch.nn._intrinsic.qat.LinearReLU

Reviewed By: zafartahirov

Differential Revision: D16286800

fbshipit-source-id: 84cce3551d46e649781b9b6107d4076e10e51018
2019-07-23 11:17:25 -07:00
192dd8faf1 Set correct list type in pybind_utils (#23188)
Summary:
-

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23188
ghstack-source-id: 87008828

Differential Revision: D16430911

fbshipit-source-id: 9d9d29bf42402e0fff323dfd0ed65fcfd5564fd3
2019-07-23 10:52:38 -07:00
19be7ece15 Fix erase_number_types test (#23181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23181

We can't run dead code elimination after erasing number types because dce relies on graph invariants that erase_number_types breaks.

Reviewed By: houseroad

Differential Revision: D16427819

fbshipit-source-id: d1b98a74d2558b14d4be692219691149689a93d8
2019-07-23 10:23:10 -07:00
e56f11b750 Fix onnx export (#23180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23180

This pass needs to be run later because it breaks jit graph invariants and the lower_all_tuples pass still needs a valid jit graph.

Reviewed By: houseroad

Differential Revision: D16427680

fbshipit-source-id: 427c7e74c59a3d7d62f2855ed626cf6258107509
2019-07-23 10:23:06 -07:00
60afcabc6f DictConstruct sets correct types (#23171)
Summary:
-

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23171
ghstack-source-id: 87009037

Differential Revision: D16423640

fbshipit-source-id: 0f4f9b12759b8a9defaae775e33e2b0af9bb7791
2019-07-23 10:23:01 -07:00
67aede98c3 Exclude unused onnx targets (#23195)
Summary:
e.g. onnxifi_dummy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23195

Differential Revision: D16441493

Pulled By: bddppq

fbshipit-source-id: 76816e7a7c73f60f3c7abea10fbdbf086cea0476
2019-07-23 10:22:57 -07:00
9d03133c14 ListConstruct sets correct element type (#23189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23189
ghstack-source-id: 86971099

Differential Revision: D16430987

fbshipit-source-id: 9af255075b670e6f811e1a9d104f2738a38e9515
2019-07-23 10:14:35 -07:00
2073cc73f8 Use concrete types in jit test for generic lists (#23192)
Summary:
Creating an untyped generic list is deprecated; we always want type information to be present.

This fixes test cases and removes one that used lists with ambiguous types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23192
ghstack-source-id: 86972891

Differential Revision: D16431482

fbshipit-source-id: 4ca5cd142118a3f0a4dcb8cd77383127c54abb29
2019-07-23 10:04:12 -07:00
21f52ce0d4 Remove trailing semicolon from TORCH_CHECK macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22339

Test Plan: Imported from OSS

Differential Revision: D16182743

Pulled By: ezyang

fbshipit-source-id: 3c4ac0abe49ce83901bd5b07279a135857035f80
2019-07-23 09:58:50 -07:00
174f7a586f Switch from KaTeX to imgmath for documentation rendering.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23025

Test Plan: Imported from OSS

Differential Revision: D16441000

Pulled By: ezyang

fbshipit-source-id: c1ab557cb8163e9c69585c32d237c076582a6d73
2019-07-23 09:44:37 -07:00
792d527746 Fix typos in comments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23130

Differential Revision: D16402755

Pulled By: ezyang

fbshipit-source-id: 8bf9767c0012aed8ad91289bbaf2d979f130d728
2019-07-23 09:44:33 -07:00
60c46dd4df Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930)
Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and the CUDA home directory.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code, which searched for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu`, but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`; see https://unix.stackexchange.com/a/226180/38242 ).

 ---

Regarding for relevant issues:

- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 did not change NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 Versioned library detection is added, but the order is reversed: the unversioned library is now preferred. This is because unversioned library names are normally symlinks to versioned libraries and preferred by users, and local installations by users are often unversioned. As the documentation of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
2019-07-23 08:45:51 -07:00
e4b75c6580 Fix typo in dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23132

Differential Revision: D16402759

Pulled By: ezyang

fbshipit-source-id: 9500570f6b7492a67a2af853bfb63a5667e6b7b5
2019-07-23 08:45:47 -07:00
45d3f495ef Add document of function torch.as_strided (#22842)
Summary:
Documentation of `torch.as_strided` and `Tensor.as_strided` is missing. As mentioned in https://github.com/pytorch/pytorch/issues/9886
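For reference, a minimal usage sketch of the function being documented (the example values are my own):
```python
import torch

x = torch.arange(1., 10.)  # 9 elements in storage

# View the same storage as a 3x3 matrix: rows advance 3 elements, columns 1.
y = torch.as_strided(x, size=(3, 3), stride=(3, 1))

# Overlapping windows: each row starts one element later in storage.
z = torch.as_strided(x, size=(3, 3), stride=(1, 1))
```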
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22842

Differential Revision: D16254106

Pulled By: soumith

fbshipit-source-id: dee142483fb9ef7bea84bd44a970b6eccdcdc471
2019-07-23 06:06:00 -07:00
c9e62f6988 Update nccl to 2.4.8-1 (#23186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23186

Differential Revision: D16438723

Pulled By: soumith

fbshipit-source-id: ff4f5b9c7383b92e5cf2053a87caf2ac11be7aeb
2019-07-23 05:35:32 -07:00
9a6ae5c0b1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: aab8ded966d718befb664a6e968eedc6bbe7cb5e
2019-07-22 22:47:52 -07:00
d7448c7812 quantized conv module (#23178)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23178
ghstack-source-id: 86973164

Differential Revision: D16426871

fbshipit-source-id: a2ebb38997acfeb61b7dfd6b11dd8ee9b3a7a8ed
2019-07-22 20:47:40 -07:00
f3a37278cc ConvReLU2d module (#23008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23008

Added ConvReLU2d module to convert from nn._intrinsic.ConvReLU2d to nn._intrinsic.qat.ConvReLU2d

Differential Revision: D16286670

fbshipit-source-id: 2903d825175911c0095497369f313bf2a2eb3833
2019-07-22 20:47:36 -07:00
eb5137a5d1 Export torch.arange to ONNX (#22601)
Summary:
Some overlap with https://github.com/pytorch/pytorch/pull/21716 regarding caffe2 nonzero. Will rebase the other one accordingly whichever gets merged first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22601

Reviewed By: zrphercule

Differential Revision: D16224660

Pulled By: houseroad

fbshipit-source-id: dbfd1b8776cb626601e0bf83b3fcca291806e653
2019-07-22 20:30:39 -07:00
06d11f0434 Revert D16368004: [pytorch][PR] Fix error message for a wrong fork CUDA
Differential Revision:
D16368004

Original commit changeset: 44b6977790ce

fbshipit-source-id: c81a232bd52219e56a19c64650c4b6dedeb167cb
2019-07-22 18:46:48 -07:00
3861520603 Verify flatten works for quantized Tensor (#23121)
Summary:
Added a test in `test_torch.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23121
ghstack-source-id: 86983227

Differential Revision: D16391409

fbshipit-source-id: 04e72b2f753a0a6ddbf58d55b794e443b18a2156
2019-07-22 18:34:25 -07:00
a24f6c13a3 Fix broken indexing when using None and ellipses indexing together (#22905)
Summary:
https://github.com/pytorch/pytorch/issues/20153

I believe you need 2 passes for this. Take this example
```python
@torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None].shape
```
which results in `[10, 9, 8, 7, 6, 1, 1]`
vs
```python
@torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None, :].shape
```
which results in `[10, 9, 8, 7, 1, 1, 6]`
After only processing `x[..., None, None` we don't know whether we should be creating a new dimension at the end of the dimension list or somewhere in the middle. What we do depends on the elements to the right of it.

Thus, I do 2 passes - one to collect all the dimensions that the index operations operate on, and another that executes the index operations.

This still doesn't work for an ellipsis index followed by a tensor index, but it wasn't working previously either.
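A minimal sketch (plain Python, with a hypothetical helper name) of what the first pass has to compute: how many dimensions the ellipsis absorbs, which depends on the indices on both sides of it.
```python
# Hypothetical helper illustrating the first pass: `None` inserts a new
# dimension and consumes none, while ints/slices each consume one real dim.
def ellipsis_span(indices, ndim):
    consumed = sum(1 for idx in indices
                   if idx is not Ellipsis and idx is not None)
    return ndim - consumed

ellipsis_span([Ellipsis, None, None], ndim=5)               # 5: Nones land at the end
ellipsis_span([Ellipsis, None, None, slice(None)], ndim=5)  # 4: Nones land in the middle
```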
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22905

Differential Revision: D16433558

Pulled By: Chillee

fbshipit-source-id: c1b303cb97b1af8b6e405bad33495ef3b4c27c4a
2019-07-22 18:11:23 -07:00
648f10be16 Fix load op to return the shape info as before when loading multiple blobs (#23182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23182

This fixes the issue seen in D16390551

Changing the load op to take in a shapes vector would require changes in lots of places (almost all usages of the load op).
Instead, this is a small and safe change: behavior is unchanged when loading multiple blobs, and when loading a single blob without shape information.

If you are loading just one blob and the shape information is provided, then this returns the right shape info back.

For all other cases, behavior is unchanged from before we introduced the issue.

This fixes the issue reported by Andrey in D16229465

Reviewed By: boryiingsu

Differential Revision: D16428140

fbshipit-source-id: 8ef6705ab2efb346819489e1f166e23269f7ef8a
2019-07-22 15:53:40 -07:00
1c574458b0 nn_quantized test (#23169)
Summary:
- scale/zero_point in quantized modules should be Tensor
- fix conv module permutation API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23169
ghstack-source-id: 86956383

Reviewed By: zafartahirov

Differential Revision: D16423570

fbshipit-source-id: d29498e07bdd8f71a33b4e16e089f80847bbca6d
2019-07-22 15:53:36 -07:00
e08f8f45ff Turning on fbgemm for nightlies (#22784)
Summary:
fbgemm requires AVX512, which requires a more recent compiler, so this also switches all the nightlies from devtoolset3 to devtoolset7. Since CUDA 9.0 doesn't support devtoolset7, we also switch from CUDA 9.0 to CUDA 9.2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22784

Differential Revision: D16428165

Pulled By: pjh5

fbshipit-source-id: c1af3729d8edce88a96fa9069d4c5a1808c25f99
2019-07-22 15:09:11 -07:00
a6e45a69a8 Fix error message for a wrong fork CUDA (#23030)
Summary:
Fix https://github.com/pytorch/pytorch/issues/17357
Unblock 1.2 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23030

Differential Revision: D16368004

Pulled By: zhangguanheng66

fbshipit-source-id: 44b6977790ce768efa4777bae41d4b26dae5f288
2019-07-22 15:04:32 -07:00
3ca7c0ffdb Add get_accessed_features function to ModelLayer class (#23036)
Summary:
We need a way to get a complete list of features that are used in training a model.  One way to do this is to make it possible to get the list of features used in each Model Layer.  Then, once the model is complete, we can go through the layers and aggregate the features.

I've introduced a function to expose that information here, get_accessed_features, and implemented it in the FeatureSparseToDense layer to start with.

I've tried to include the minimum amount of information to make this useful, while making it easy to integrate into the variety of model layers.  This is, for example, why AccessedFeatures does not contain feature_names, which is not always present in a model layer.  I debated whether or not to include feature_type, but I think it's useful enough, and easy enough to figure out in a model layer, that it's worth including.
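A hypothetical aggregation sketch of the intended end state; `model.layers`, the list return type, and the `feature_ids` field are my assumptions for illustration, not the actual API:
```python
# Hypothetical: union the features reported by each model layer once every
# layer implements get_accessed_features().
accessed = []
for layer in model.layers:  # assumed attribute holding ModelLayer instances
    accessed.extend(layer.get_accessed_features())

all_feature_ids = sorted({fid for af in accessed for fid in af.feature_ids})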
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23036

Test Plan:
Added a unit test to verify the behavior of get_accessed_features in FeatureSparseToDense.

aml_dper2-fblearner-flow-integration-tests failed due to a known issue D16355865
aml_dper3-fblearner-flow-integration-tests failed due to a known issue T47197113

I verified that no tests in the integration tests failed due to issues other than those known ones.

DPER2 canaries: https://fburl.com/fblearner/1217voga

Reviewed By: volkhin

Differential Revision: D16365380

Pulled By: kevinwilfong

fbshipit-source-id: 2dbb4d832628180336533f29f7d917cbad171950
2019-07-22 15:04:28 -07:00
ff23a02ac4 Pin numba to 0.44.0 to fix Windows CI.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23176

Test Plan: Imported from OSS

Differential Revision: D16426873

Pulled By: ezyang

fbshipit-source-id: 10d800db78416137504c396711dc45109f6f5ca4
2019-07-22 14:59:15 -07:00
b6d06d5496 Remove empty THCThreadLocal{.h/.cpp} (#23157)
Summary:
These files were removed from the build process and cleaned in https://github.com/pytorch/pytorch/pull/9735.

Closes https://github.com/pytorch/pytorch/issues/22572
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23157

Differential Revision: D16426819

Pulled By: soumith

fbshipit-source-id: aa01aec9fe0e3af456ba8b75ae85d0b1df2a8ed9
2019-07-22 14:59:11 -07:00
fdfc676eb6 Invert ownership between PyFunction and THPFunction.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22983

Test Plan: Imported from OSS

Differential Revision: D16422209

Pulled By: ezyang

fbshipit-source-id: d6e41a1606484fbbd7a95a547b83a4199151be68
2019-07-22 14:13:14 -07:00
ae5b52086e Support converting Python number to IValue in pybind_utils.h (#22817)
Summary:
I ran into the following error when trying to pass a Python int as an arg to `torch::jit::createStackForSchema`, and I think it is due to the missing support for `NumberType` in [toIValue](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/pybind_utils.h#L448).

> RuntimeError: Missing cases in toIValue for type: Scalar! File a bug report. (toIValue at ../torch/csrc/jit/pybind_utils.h:449)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22817

Differential Revision: D16276006

Pulled By: mrshenli

fbshipit-source-id: 7f63519bb37219445e836ec1f51ca4f98bf52c44
2019-07-22 14:01:30 -07:00
2becbd3faa BlackBoxPredictor OSS part 4: Open-source other transforms (#23099)
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments into
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion that the whole thing should be moved to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23099

Test Plan:
salex@devvm4218:caffe2 { (fcdaf96|HISTEDIT)}$ submit_canary --q tw_adindexer_canary_on_canary_tier && submit_canary --q tw_adfinder_canary_on_canary_tier && submit_canary prospector_repl
ay_canary
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851419/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717789681292057
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GBYe_ANnNNBnbWsDAAAAAABJPvJBbjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851536/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717806884923980
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GArl_QPncP7tc30IAAAAAACfza93bjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851661/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717823090263325
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GNcyAwRrfFd0MIUIAAAAAABLOINibjEQAAAz

Differential Revision: D16288332

Pulled By: salexspb

fbshipit-source-id: 95899dede6b11a2ae14703b9aaea8e1a677f0aaa
2019-07-22 13:53:43 -07:00
27031dccb2 Updating producer_version in exported ONNX models to pytorch 1.2. (#23120)
Summary:
Bumping up the producer_version in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23120

Reviewed By: zrphercule

Differential Revision: D16420917

Pulled By: houseroad

fbshipit-source-id: 6686b10523c102e924ecaf96fd3231240b4219a9
2019-07-22 13:45:39 -07:00
7e31c02afe Fixed deprecated use of yaml.load
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22985

Test Plan: Imported from OSS

Differential Revision: D16425112

Pulled By: Chillee

fbshipit-source-id: ef0c764c3fd2518b9284d9a20e84d677ebd8f277
2019-07-22 13:25:27 -07:00
76291829ba Refactor named inference rule for reductions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23075

Test Plan: Imported from OSS

Differential Revision: D16419173

Pulled By: zou3519

fbshipit-source-id: 187639b563336f935e5f06351dd0b680de1aadfd
2019-07-22 13:12:03 -07:00
b4b51ed5ec Implement tensor.size(Dimname), tensor.stride(Dimname)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22989

Test Plan: Imported from OSS

Differential Revision: D16364437

Pulled By: zou3519

fbshipit-source-id: 393a93fecac27b5d3b1a7f7692590d8fd5e95a5d
2019-07-22 13:11:59 -07:00
965b97f5f0 Bidirectional GRU and LSTM C++ API forward fix (#22850)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/17998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22850

Differential Revision: D16420854

Pulled By: pbelevich

fbshipit-source-id: 76f38be40d8479fb9cafba92939cea61d81fd336
2019-07-22 12:59:47 -07:00
e5797e9350 Revert D16390551: Fix load op to return the shape info as before when loading multiple blobs
Differential Revision:
D16390551

Original commit changeset: 1055b481a7a9

fbshipit-source-id: ea50a71e3d446a74bd04d9945710cc4ccee63c87
2019-07-22 12:48:14 -07:00
fcdfc35d1c Support get/setstate with no args (#23119)
Summary:
`pickle` supports this, and a lot of the quantized use cases for get/set
state follow this pattern.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23119

Pulled By: driazati

Differential Revision: D16391234

fbshipit-source-id: 9f63e0a1679daa61b17aa64b5995e2be23b07b50
2019-07-22 12:32:29 -07:00
858d4a6a04 Cleanup API and remove 'experimental' warning (#23000)
Summary:
This fixes ASAN test issues with https://github.com/pytorch/pytorch/pull/21786 seen at https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2212325/output/105/0?file=true and lands it again.

This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the user) and removes the "experimental" warning in prep for our 1.2 release.

We also don't need the additional PyTorch version checks now that we are in the codebase itself.

cc yf225, lanpa, natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23000

Reviewed By: sanekmelnikov

Differential Revision: D16349734

Pulled By: orionr

fbshipit-source-id: 604a9cad56868a55e08b509a0c6f42b84f68de95
2019-07-22 12:10:05 -07:00
fad3031b5c Fix type hints for None constants (#23029)
Summary:
The type hint was being ignored when emitting `None` constants; this also de-dups some testing code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23029

Pulled By: driazati

Differential Revision: D16364572

fbshipit-source-id: 64f3abd3e37ee49c209480a85ed4f1b8802e5d93
2019-07-22 11:55:05 -07:00
2891784a72 Resolve with closed over variables instead of stack frame (#22270)
Summary:
Previously we looked at the stack frame of the function that called
`script` to resolve variables. This doesn't work if someone calls script
with a function defined somewhere else that references captured
variables. We already have a mechanism to look at the closed-over
variables for a function, so this changes the `rcb` to use that.
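A minimal sketch of the case this fixes, assuming a module-level variable captured by the function:
```python
import torch

scale = 2.0  # defined where f is defined, not where script() is called

def f(x):
    return x * scale

# Scripting now resolves `scale` via f's own globals/closure rather than
# the stack frame of whoever happens to call torch.jit.script.
scripted = torch.jit.script(f)
scripted(torch.ones(2))  # tensor([2., 2.])
```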

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22270

Pulled By: driazati

Differential Revision: D16391346

fbshipit-source-id: ad9b314ae86c249251b106079e76a5d7cf6c04c2
2019-07-22 11:44:36 -07:00
fd90b967b2 Fix load op to return the shape info as before when loading multiple blobs (#23166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23166

Changing the load op to take in shapes vector needs changes in lots of places (almost all usages of load op).
Instead this is a small and safe change where the behavior is unchanged if we are loading multiple blobs and when loading a single blob without shape information.

If you are loading just one blob and the shape information is provided, then this returns the right shape info back.

For all other cases, behavior is unchanged as before we introduced the issue.

This fixes the issue reported by Andrey in D16229465

Reviewed By: boryiingsu

Differential Revision: D16390551

fbshipit-source-id: 1055b481a7a9e83021209e59f38a7cc0b49003cf
2019-07-22 11:27:59 -07:00
82db5dceb6 Added running via throughput benchmark options. (#23077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23077

The difference between running from Python and running via the benchmark is small if
the forward method's loop is long enough (1000 iterations in this case).

Reviewed By: mingzhe09088

Differential Revision: D16122343

fbshipit-source-id: 5c1d1b98ae82c996baf9d42bcd04995e2ba60c78
2019-07-22 11:27:55 -07:00
2ba516d5b6 Added add op framework overhead benchmark for C2 (#23078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23078

C2 benchmark.

Reviewed By: mingzhe09088

Differential Revision: D16122337

fbshipit-source-id: bf56e60c6e60eda2be2938d9f613708a4bc1669a
2019-07-22 11:27:50 -07:00
0621068cdc Add simple add op based framework overhead benchmark. (#23076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23076

Tracing-based and non-tracing-based variants added.

Reviewed By: mingzhe09088

Differential Revision: D16097280

fbshipit-source-id: 3a137092f7ccc3dd2d29d95e10178ec89d3ce892
2019-07-22 11:27:45 -07:00
4223e2f9e9 fix qat tests (#23124)
Summary:
Missed instantiating observers in Linear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23124
ghstack-source-id: 86886705

Reviewed By: raghuramank100

Differential Revision: D16401066

fbshipit-source-id: f9f0f359caeca855c62192d13261a33eef57715a
2019-07-22 10:28:35 -07:00
8bc28cc898 Remove cuda free mutex (#23040)
Summary:
Revision of https://github.com/pytorch/pytorch/issues/22173 to address CI failure after merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23040

Differential Revision: D16366872

Pulled By: mrshenli

fbshipit-source-id: 747b6ecf2dc195c25f82b8f732ae9ff52cd3a394
2019-07-22 07:58:29 -07:00
22f7c9e31b (#23105)
Summary:
Fixed a [bug](https://github.com/pytorch/pytorch/issues/22992) where passing a result tensor into masked_select would not work with a bool mask.
Tested via unit tests.
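A minimal repro sketch of the fixed call pattern:
```python
import torch

x = torch.randn(4)
mask = x > 0                 # bool mask
result = torch.empty(0)      # preallocated result tensor
torch.masked_select(x, mask, out=result)  # previously failed for bool masks
```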
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23105

Differential Revision: D16386676

Pulled By: izdeby

fbshipit-source-id: 93a1e9bfbc916c8a8eaa149a70a5553f3711f53e
2019-07-22 07:49:30 -07:00
aeee49d51d Revert "Temporarily skip mypy-0.720 to unbreak master type checks"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23095

Test Plan: Imported from OSS

Differential Revision: D16383149

Pulled By: zou3519

fbshipit-source-id: ca6bdfe0f51f6bdbd4d95142a880f3902f60676d
2019-07-22 06:54:22 -07:00
b8c8977be7 Update ScatterWeightedSum Op (#23087)
Summary:
Update the ScatterWeightedSum op for the case where there is only one weighted X updating a slice of Y, which is usually the case when the op is used for gradient updates. The change removes the copy overhead, yielding a significant operator performance improvement:

- 25-50% improvement on CUDA, depending on input configuration

- ~50% improvement on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23087

Differential Revision: D16385194

Pulled By: bddppq

fbshipit-source-id: 3189e892940fb9c26305269eb0d47479b9b71af0
2019-07-21 22:21:40 -07:00
ff8cb9f622 hipify: do not overwrite files that stay the same (#23112)
Summary:
This is a small patch that avoids overwriting unchanged files, to help a bit with incremental builds.
It is not as incremental as one might like, given that one has to pass `--out-of-place-only` to avoid the patching step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23112

Differential Revision: D16402623

Pulled By: bddppq

fbshipit-source-id: 531ce0078bc716ae31bd92c5248080ef02a065b9
2019-07-21 22:00:53 -07:00
2ac9abf759 Fix memory leak in Adam, Adagrad, RMSProp (#23125)
Summary:
As reported in LaurentMazare/tch-rs#76, memory grows when weight_decay is used with Adam. This applies the same fix as https://github.com/pytorch/pytorch/issues/23007 to Adam, Adagrad and RMSProp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23125

Differential Revision: D16402421

Pulled By: soumith

fbshipit-source-id: 59eb4bd81b8bd9e1a5f7c068ed841f70a4c38a80
2019-07-21 10:06:18 -07:00
96b6797fc0 improve enforce in cross_entroy_op (#23062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23062

as title

Reviewed By: xianjiec, BIT-silence

Differential Revision: D16374601

fbshipit-source-id: 62219c6abde311ebc8a0e6a03cfb517d80bb52b5
2019-07-21 00:07:58 -07:00
3e66385002 Lint fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23135

Test Plan: Imported from OSS

Differential Revision: D16403272

Pulled By: zafartahirov

fbshipit-source-id: 31f9eb11216c494a8327bcb5dc37e47a77611e2b
2019-07-20 21:46:18 -07:00
963707c5ea MaxPool2d in the torch (#22765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22765

the pooling signature is the same as the non-quantized one. Adding it to the native_functions.yaml

Reviewed By: jerryzh168

Differential Revision: D16102608

fbshipit-source-id: 7627ad8f02a231f488b74d1a245b853f89d9c419
2019-07-20 21:41:09 -07:00
cf3e6478ad Concat with out (#22408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22408

Quantized Concatenation with out argument

Reviewed By: jianyuh

Differential Revision: D16061526

fbshipit-source-id: 61487cf87763665df19feb8e678da72fd66e8740
2019-07-20 16:13:14 -07:00
05f088ec22 make jit logging visible, so it can be used in a TVM compiler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23041

Differential Revision: D16402934

Pulled By: Krovatkin

fbshipit-source-id: 715f9821809527e94bd7f01f1680db046c888e6c
2019-07-20 14:37:49 -07:00
bb9119f67d Use set -x to help investigate doc push errors (#23111)
Summary:
I couldn't find any verbosity options in the [`docker pull` command docs](https://docs.docker.com/engine/reference/commandline/pull/), but
`docker pull` [got a `--quiet` option](https://github.com/docker/cli/pull/882) in a recent version (not sure if we're using that version), and `--quiet` for `docker push` [is forthcoming](https://github.com/docker/cli/pull/1221).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23111

Differential Revision: D16402993

Pulled By: kostmo

fbshipit-source-id: 52f77b11b839d28f8cf1ecb58518ca69632d7fbe
2019-07-20 12:36:05 -07:00
a62c687445 Remove unused atomics detection code. (#23089)
Summary:
USE_{C11,MSC,GCC}_ATOMICS are not used in PyTorch or submodules. Now we remove their underlying detection code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23089

Differential Revision: D16402750

Pulled By: ezyang

fbshipit-source-id: fde84b958eb0b5b4d3f0406acefa92ab30ea43be
2019-07-20 10:52:53 -07:00
4e5f70089f fix indexing for more than 65535 elems in non-indexed first dim (#23123)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22843, also adds test from https://github.com/pytorch/pytorch/issues/23102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23123

Differential Revision: D16402422

Pulled By: soumith

fbshipit-source-id: aa7a79159ed947be03ce3725ec8abcf5246a60bf
2019-07-20 06:17:43 -07:00
6791f395f9 support at::view and at::reshape for quantized tensor (#23046)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23046
ghstack-source-id: 86840501

Differential Revision: D16368897

fbshipit-source-id: 9da232c11f21af5f850cd9545e56996a81791d00
2019-07-19 23:34:04 -07:00
a03205ed66 Move THTensor_compute_stride to ATen (#23045)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23045
ghstack-source-id: 86842517

Differential Revision: D16368860

fbshipit-source-id: 8970a73758afadbc9a6a3e263cdcfe5e2fd9cc0d
2019-07-19 23:14:11 -07:00
0d8324b18a Add fused modules in nn._intrinsic (#23085)
Summary:
Using nn.Sequential to represent fused modules

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23085
ghstack-source-id: 86883096

Differential Revision: D16379521

fbshipit-source-id: 57d67cb947de8665bd758848595a4a000366153a
2019-07-19 23:04:25 -07:00
47af41fe72 Quantized concatenation (+fused relu). (#21749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21749

This is the first version without "requantization"

Reviewed By: jerryzh168

Differential Revision: D15807940

fbshipit-source-id: 19bb0482abed8ed9d1521a3fa1f15bda8e6a6a7c
2019-07-19 22:23:41 -07:00
9f4df63c2c Moving np function to test area
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23118

Test Plan: Imported from OSS

Differential Revision: D16400634

Pulled By: zafartahirov

fbshipit-source-id: 44872fdf64b20a6b67e5176042fe58c8c2359738
2019-07-19 22:11:21 -07:00
77353636de Conv module (#23084)
Summary:
Added Conv module for qat

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23084
ghstack-source-id: 86862445

Differential Revision: D16379417

fbshipit-source-id: 742cc8b8e0f132070ca4943a1c2e3db60c2b5bdc
2019-07-19 18:49:52 -07:00
b964bdb53a Fbgemm fp16 tensor support (#23101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23101

Support for
- Shape inference
- Tensor info extraction

Reviewed By: zrphercule

Differential Revision: D16345251

fbshipit-source-id: 53ef674b5b1581e6267e6d2070e34355280dae79
2019-07-19 17:08:03 -07:00
2a8d5a132c Fix workspace destruction ordering (#23096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096

Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destroyed first.

Reviewed By: ajyu

Differential Revision: D16382987

fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
2019-07-19 16:49:50 -07:00
79c4f83fbe Include module names in recursive error stacks (#22921)
Summary:
Following on to #22280, this adds module names so they're included in
the call stacks of an error message (e.g. so it appears as `M.forward`
instead of `forward`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22921

Pulled By: driazati

Differential Revision: D16287925

fbshipit-source-id: 6f31d72caa87ba2dc527805d36f7d62eb94c0808
2019-07-19 16:09:14 -07:00
7cc029cb75 Quantization aware training in eager mode (#23082)
Summary:
Add support for quantization aware training in eager mode

Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant to weight, we need to swap the float modules that has weight with different qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
    * Previously we were thinking about modifying the weight in a forward_pre hook and changing it back in a forward hook:

        def forward_pre_hook(self, input):
            self.float_weight = self.weight
            self.weight = self.fake_quantize(self.float_weight)

        def forward_hook(self, input):
            self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function

## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650

Differential Revision: D16379374

fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
2019-07-19 14:57:25 -07:00
c09e92255c Add initial support for serializing classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22953

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D16340214

Pulled By: zdevito

fbshipit-source-id: 70fb1968eca34e14492e0d2be52e28b27813f821
2019-07-19 14:51:59 -07:00
6334edc2d8 BlackBoxPredictor OSS: open-source NQL and custom transforms (#22877)
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.

This specific diff:

There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare the production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments into
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion that the whole thing should be moved to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22877

Test Plan:
did a bunch of unit tests locally and now

waitforsandcaslte

AdFinder canary:
https://our.intern.facebook.com/intern/ads/canary/419623727275650390

adindexer:
https://our.intern.facebook.com/intern/ads/canary/419623750891549182

prospector:
https://our.intern.facebook.com/intern/ads/canary/419644899887610977
https://our.intern.facebook.com/intern/ads/canary/419645123742738405

Differential Revision: D16267765

Pulled By: salexspb

fbshipit-source-id: 776a1cd5415e0695eae28254b3f155e7a9bd8c2b
2019-07-19 14:37:56 -07:00
f2f3e8ad8c fix overspecializing constants in compilation (#22816)
Summary:
When we specialize the tensor type of constants in compilation it causes all sorts of problems.

Fix for https://github.com/pytorch/pytorch/issues/22809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22816

Differential Revision: D16384094

Pulled By: eellison

fbshipit-source-id: f33c00d92d87108749d09bf037a6e74c5d9adaa2
2019-07-19 14:19:49 -07:00
a302821c5d Adding more binary documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23093

Differential Revision: D16384838

Pulled By: pjh5

fbshipit-source-id: 0ce91c2f3f0ec8f5c026622f27039b36c42a81d4
2019-07-19 14:06:34 -07:00
Jie
a28ffaf350 (#22827)
Summary:
1. Fix out of range memory access for reduction on all dimensions for non-packed
tensor.

2. Enabling launch config that maps block width to reduction on fastest striding
dimension. This mapping was previously only active when reducing on fastest
striding dimension of packed tensor, which is not necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22827

Differential Revision: D16271897

Pulled By: zdevito

fbshipit-source-id: 20763f6cf9a58e44ffc0e7ec27724dfec8fe2c5d
2019-07-19 13:38:17 -07:00
818828e8a8 Only import PIL when needed (#23023)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22389

In most cases we only import `PIL` methods when we need them, but we missed a spot.

cc lanpa natalialunova sanekmelnikov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23023

Reviewed By: sanekmelnikov

Differential Revision: D16373492

Pulled By: orionr

fbshipit-source-id: b08bf8a9b5a861390eadf62eda21ac055777180f
2019-07-19 13:30:43 -07:00
mal
44493a623e Pass variable_list of inputs to _wrap_outputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23037

Test Plan: Imported from OSS

Differential Revision: D16380071

fbshipit-source-id: ae3333c02ef8a3c09b95bec7b8e92ce649553615
2019-07-19 12:31:23 -07:00
2ee0f0bc3a add break continue to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23091

Differential Revision: D16382604

Pulled By: eellison

fbshipit-source-id: 47432d844c811ecd87ad97155e835b07ae8056cc
2019-07-19 12:17:00 -07:00
6dfecc7e01 Remove deprecated linear algebra functions (and methods) (#22841)
Summary:
Changelog:
- Removed the following linear algebra functions in PyTorch in favor of the renamed operations
  - `btrifact` (use `lu` instead)
  - `btrifact_with_info` (use `lu` with `get_infos=True` instead)
  - `btrisolve` (use `lu_solve` instead)
  - `btriunpack` (use `lu_unpack` instead)
  - `gesv` (use `solve` instead)
  - `pstrf` (use `cholesky` instead)
  - `potrf` (use `cholesky` instead)
  - `potri` (use `cholesky_inverse` instead)
  - `potrs` (use `cholesky_solve` instead)
  - `trtrs` (use `triangular_solve` instead)

- Removed dead code after the removal of `pstrf`
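A quick migration sketch for the renamed operations (inputs are illustrative):
```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 1)

x, LU = torch.solve(b, A)                        # was: gesv
LU, pivots = torch.lu(A)                         # was: btrifact
LU, pivots, infos = torch.lu(A, get_infos=True)  # was: btrifact_with_info
x = torch.lu_solve(b, LU, pivots)                # was: btrisolve
P, L_, U_ = torch.lu_unpack(LU, pivots)          # was: btriunpack

S = A @ A.t() + 3 * torch.eye(3)                 # positive definite
L = torch.cholesky(S)                            # was: potrf
x = torch.cholesky_solve(b, L)                   # was: potrs
Sinv = torch.cholesky_inverse(L)                 # was: potri
x, _ = torch.triangular_solve(b, L, upper=False) # was: trtrs
```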
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22841

Test Plan:
- All existing tests should pass to verify that the removal is clean

Closes https://github.com/pytorch/pytorch/issues/22832

Differential Revision: D16346184

Pulled By: zou3519

fbshipit-source-id: f748d16ed7609c028de6adcbc28684d5a1af0678
2019-07-19 11:43:06 -07:00
61a683c212 Delete aten/src/ATen/out.txt (#23050)
Summary:
A file introduced in 5c0e0589509540fc991a88ffc48e96cc76fd799d, probably by mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23050

Differential Revision: D16379947

Pulled By: ezyang

fbshipit-source-id: b7fa8995028e180603d7830b6f170a7a57310385
2019-07-19 10:27:59 -07:00
5417ddbdae Fix get_all_math_dtypes for device='cuda' retuning None (#23028)
Summary:
This PR fixes the invalid None return when calling get_all_math_dtypes(device='cuda').

The issue came from `list.append`, which returns None, being used as `return dtypes.append(...)`.
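A minimal illustration of the bug pattern and the fix (the dtype list is abbreviated for illustration):
```python
import torch

def get_all_math_dtypes_buggy(device):
    dtypes = [torch.float32, torch.float64, torch.int32, torch.int64]
    if device == 'cuda':
        return dtypes.append(torch.float16)  # append returns None!
    return dtypes

def get_all_math_dtypes_fixed(device):
    dtypes = [torch.float32, torch.float64, torch.int32, torch.int64]
    if device == 'cuda':
        dtypes.append(torch.float16)
    return dtypes  # always return the list itself
```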
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23028

Differential Revision: D16362732

Pulled By: colesbury

fbshipit-source-id: 0bbc30a0c663749d768159f1bc37b99f7263297b
2019-07-19 09:29:16 -07:00
84c2c89e2c Revert D16199356: [qat] Quantization aware training in eager mode
Differential Revision:
D16199356

Original commit changeset: 62aeaf47c12c

fbshipit-source-id: d06a96b0a617ae38029ffb246173ec065454b666
2019-07-19 03:18:48 -07:00
f19aa12ae5 Revert D16274792: [qat] Conv module
Differential Revision:
D16274792

Original commit changeset: 1da10194123b

fbshipit-source-id: 71b34774b463f2350289bd39b8cfd798e095ffa5
2019-07-19 03:18:45 -07:00
c362e72d4a Revert D16349133: [quant] Add fused modules in nn._intrinsic
Differential Revision:
D16349133

Original commit changeset: 04d862ac4a0d

fbshipit-source-id: d96d9d98e9b29fddf93d4106621752abb00947eb
2019-07-19 03:18:41 -07:00
2401a05aae Revert D16373996: [fix] conv module missing return
Differential Revision:
D16373996

Original commit changeset: 1ec85d23c9dd

fbshipit-source-id: e507db59405aa240d20f132c3d6df323b241a542
2019-07-19 03:06:39 -07:00
25f0dc3490 BERT CPU performance optimization: use mkldnn for nn.Linear() when input is dense layout (#21851)
Summary:
This PR aims at improving BERT performance on CPU by using `mkldnn` inner product for `nn.Linear()`.
The current logic uses `mkldnn` only when the `input` tensor has mkldnn layout. This PR loosens this condition: `mkldnn` will be used for `nn.Linear()` when the `input` tensor has dense layout. The aten tensor is viewed in place in `mkldnn` without an additional memory copy.
1. when `input.dim() >= 3`, it is viewed as a 2d tensor, e.g. `[T, N, C]` is treated as `[TN, C]`;
2. when `input` is not contiguous, it is copied so as to be contiguous, since the `mkldnn` inner product can't handle non-contiguous memory.
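A minimal illustration of the 2d view described above (the sizes are illustrative):
```python
import torch

linear = torch.nn.Linear(16, 32)
x = torch.randn(5, 8, 16)            # [T, N, C], contiguous

x2d = x.view(-1, x.size(-1))         # [T*N, C]: a view, no copy
y = linear(x2d).view(5, 8, 32)       # back to [T, N, C_out]

xt = x.transpose(0, 1)               # [N, T, C], non-contiguous
xt2d = xt.contiguous().view(-1, 16)  # copy needed before the 2d view
```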

With this PR, BERT on `glue/MRPC` inference (batch size = 1) on Xeon 6148 single socket (20 cores@2.5GHz) improves by `44%`:

1. before (unit: iterations/sec):
```bash
408/408 [00:24<00:00, 16.69it/s]
```
2. after (unit: iterations/sec):
```bash
408/408 [00:16<00:00, 24.06it/s]
```

The latency correspondingly drops from `59.92 ms` to `41.56 ms`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21851

Differential Revision: D16056334

Pulled By: dzhulgakov

fbshipit-source-id: 9b70ed58323b5e2f3f4e3ebacc766a74a8b68a8a
2019-07-19 00:54:29 -07:00
12ac9171db fix error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22982

Differential Revision: D16356464

Pulled By: soumith

fbshipit-source-id: 3ddd5de4cf5c000dcf5b2faed39283dc715cba25
2019-07-18 23:38:55 -07:00
cdfdeb74af conv module missing return (#23058)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23058
ghstack-source-id: 86807313

Reviewed By: jianyuh

Differential Revision: D16373996

fbshipit-source-id: 1ec85d23c9ddd9975bc32f6c5d30cde04eb1109e
2019-07-18 22:24:56 -07:00
6601978012 Use ProfiledTensorType in peephole.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22767

Differential Revision: D16342954

Pulled By: Krovatkin

fbshipit-source-id: a577ea942ff4bab6ae15f14d6ba04a68675c70aa
2019-07-18 21:49:45 -07:00
d153b0b58b Updating submodules
Reviewed By: yns88

fbshipit-source-id: 87bb7a817dea65783436d6d6dfbbd492724d20a7
2019-07-18 20:43:55 -07:00
23badc60f3 Fix TBB build for older versions of cmake
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23038

Test Plan:
with-proxy pip install --upgrade cmake==3.11.0
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.13.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.6.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

Imported from OSS

Differential Revision: D16365699

Pulled By: ilia-cher

fbshipit-source-id: cbf779dff63e4e186d9b4c2fc21539a24ce0d5a2
2019-07-18 20:12:26 -07:00
e57b682abf Add fused modules in nn._intrinsic (#22999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22999

Using nn.Sequential to represent fused modules

Reviewed By: zafartahirov

Differential Revision: D16349133

fbshipit-source-id: 04d862ac4a0d20e83dc9d6de6b7d0d0c26bdedfd
2019-07-18 18:58:11 -07:00
12d9d768b8 Conv module (#22899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22899

Added Conv module for qat

Reviewed By: zafartahirov

Differential Revision: D16274792

fbshipit-source-id: 1da10194123b2759a6a35c60d1c2d2c0b569ccdc
2019-07-18 18:58:07 -07:00
65ef671d11 Quantization aware training in eager mode (#22732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732

Add support for quantization aware training in eager mode

Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant to weight, we need to swap the float modules that has weight with different qat modules, e.g. Conv → torch.nn.qat.Conv , ConvBn → torch.nn._intrinsic.qat.ConvBn
```
    * Previously we were thinking about modifying the weight in a forward_pre hook and changing it back in a forward hook:

        def forward_pre_hook(self, input):
            self.float_weight = self.weight
            self.weight = self.fake_quantize(self.float_weight)

        def forward_hook(self, input):
            self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we will need to keep two copies of weight in this case, so it’s probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function

## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules are changed in prepare step.

Reviewed By: zafartahirov

Differential Revision: D16199356

fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
2019-07-18 18:58:03 -07:00
8dfbbf7bf2 Add nn.qat.Linear (#22714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22714

We need this module for add fake_quant for weight

Reviewed By: zafartahirov

Differential Revision: D16193585

fbshipit-source-id: ed6c04ecf574ca1fe1dcded22c225da05976f7a3
2019-07-18 18:27:27 -07:00
b6011c3caf Update torchvision in CI. (#22754)
Summary:
To include dea1afbf5e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22754

Differential Revision: D16366676

Pulled By: zhangguanheng66

fbshipit-source-id: abfcb785973f9caa2a5aa1154fa689bbba8ff2dd
2019-07-18 18:22:24 -07:00
358e0d3d44 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 3998c7b50a8b0377e8f1748a8dbd3b7d2afc99a4
2019-07-18 16:36:25 -07:00
9897ec4701 Recursively compile class types (#22475)
Summary:
Try to compile class types encountered in recursive script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22475

Pulled By: driazati

Differential Revision: D16340717

fbshipit-source-id: 5e1a46db517be2412f57156efbc4eb3347b01a8a
2019-07-18 15:43:16 -07:00
425d28c30a Reapply: optimize topk on cpu using parallel and partial sort (#19736) (#22865)
Summary:
https://github.com/pytorch/pytorch/issues/19736 was reverted as it was suspected of breaking master; trying to reapply.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22865

Differential Revision: D16265457

Pulled By: VitalyFedyunin

fbshipit-source-id: 784bd6405471f15a8a49ebd0f3e98160d7d0679e
2019-07-18 14:15:54 -07:00
c1c4014bba Add warning for legacy autograd function (#22922)
Summary:
When working on https://github.com/pytorch/pytorch/pull/22762, we discovered that we hadn't actually deprecated legacy autograd functions. This PR puts up the deprecation warning for 1.2, with the goal of removing legacy function support completely in the near future.
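For reference, the non-deprecated (new-style) pattern uses static forward/backward methods plus `Function.apply`; the legacy pattern of defining them as instance methods and calling the Function object directly is what the warning targets. A minimal sketch:
```python
import torch
from torch.autograd import Function

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return 2 * x * grad_out

y = Square.apply(torch.ones(3, requires_grad=True))
y.sum().backward()
```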
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22922

Differential Revision: D16363916

Pulled By: yf225

fbshipit-source-id: 4b554010a3d1f87a3fa45cc1aa29d019c8f1033c
2019-07-18 14:02:17 -07:00
a2b3403962 Mark protobuf include path as system include (#23012)
Summary:
To suppress (many) compiler warnings from protobuf headers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23012

Differential Revision: D16364573

Pulled By: bddppq

fbshipit-source-id: adbc4921e29389131d43e7bcc1e6fcba19450c76
2019-07-18 13:44:39 -07:00
84d892b645 Remove DistributedDataParallelCPU as DDP now supports CPU models (#22864)
Summary:
cc ailzhang aazzolini yifuwang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22864

Differential Revision: D16358011

Pulled By: mrshenli

fbshipit-source-id: 8db2dc035dea03f07a32c749e754f625fda1bf28
2019-07-18 12:50:45 -07:00
a5e6586618 Revert D16357177: [pytorch][PR] Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup.
Differential Revision:
D16357177

Original commit changeset: f4ca9cd46cc6

fbshipit-source-id: 49e66e7e59df6cbc7f5d847bacc07da134067956
2019-07-18 12:28:46 -07:00
11d257e5df Fix SGD memory leak when there is weight_decay (#23007)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/20146. I am working on another PR that adds CPU and CUDA memory leak checking to all C++ API tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23007

Differential Revision: D16358973

Pulled By: yf225

fbshipit-source-id: 5ee7ed4e61e60424031540a633e1fae09d9df171
2019-07-18 12:10:10 -07:00
502766e99e Add the mathematical definition of torch.sign to clarify this is the sgn function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22894
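For reference, the definition in question:
```latex
\operatorname{sgn}(x) =
\begin{cases}
  -1 & \text{if } x < 0, \\
   0 & \text{if } x = 0, \\
   1 & \text{if } x > 0.
\end{cases}
```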

Differential Revision: D16345027

Pulled By: ezyang

fbshipit-source-id: 1421571f1f8764539a35b9060d90ea6075f889d3
2019-07-18 11:45:27 -07:00
662fe699c5 Named inference rules for some initializer fns
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22972

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D16342782

Pulled By: zou3519

fbshipit-source-id: 25277688ab51e1e98af0e19aeb9c79399171d2fb
2019-07-18 10:04:29 -07:00
57cec0a720 Named inference rules for split/chunk
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22971

Test Plan: Imported from OSS

Differential Revision: D16342783

Pulled By: zou3519

fbshipit-source-id: 379edc8eb2f45a82ee8a6320f8285f8f81ea0b1b
2019-07-18 10:04:25 -07:00
6b70217a7e Adding README for binaries to OSS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23001

Differential Revision: D16359617

Pulled By: pjh5

fbshipit-source-id: bfe3f0e1dcb00f34e9362a74227e8a0bb90a8aaf
2019-07-18 10:04:21 -07:00
b91ab177a0 Add support to print QTensor in cpp (#22950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22950

Print quantized tensor by first dequantizing it and then printing. Also print the scale, zero_point. size and type of tensor.

Reviewed By: jerryzh168

Differential Revision: D16286397

fbshipit-source-id: 2d6fb1796e5b329a77c022b18af0a39f6edde0d7
2019-07-18 09:44:20 -07:00
0c091380cc disable non-deterministic cudnn ctcloss (#22977)
Summary:
Associated issue: https://github.com/pytorch/pytorch/issues/21680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22977

Differential Revision: D16357873

Pulled By: nairbv

fbshipit-source-id: 58711bac7d3e8390e868d594dc265ba053a1537c
2019-07-18 08:28:42 -07:00
29347cc9cf Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup. (#22173)
Summary:
There are two mutexes within CUDACachingAllocator that cause a deadlock.  One of the mutexes was added in order to work around the issue of NCCL interacting poorly with cudaFree.  See

- 68ff58d771
- https://github.com/pytorch/pytorch/pull/880

As of NCCL version 2 and its new group start/end APIs, the protection surrounding cudaFree() is no longer needed.  The PyTorch code was updated to use the NCCL2 group start/end API, but the corresponding cuda_free_mutex and its getter getFreeMutex() were not revised.  This PR removes the use of getFreeMutex() when NCCL2 is used, by moving calls to getFreeMutex() into AutoNcclGroup.  That way, depending on the NCCL version used, we either use the mutex or we use the new group APIs.

The race condition is as follows, thanks to skeelyamd:

The deadlock occurs between hip_free_mutex (aka cuda_free_mutex in github) (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L165) and mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L162).

hip_free_mutex is exported from THCCachingAllocator in getFreeMutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L660) and is acquired in ProcessGroupNCCL::collective (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L397), which then calls back into THCCachingAllocator via c10::cuda::CUDACachingAllocator::recordStream (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L416 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L655 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L379).  At this point it acquires mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L384).

This requires hip_free_mutex to be locked before mutex.

However, in free_blocks (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L505) THCCachingAllocator locks hip_free_mutex.  Free_blocks is called from emptyCache (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L328) which locks mutex.

That requires mutex to be locked before hip_free_mutex.

emptyCache and ProcessGroupNCCL::collective must not be executed concurrently, but this is occurring and deadlocking the CPU.

free_blocks is also called by malloc (via cuda_malloc_retry -> free_cached_blocks -> free_blocks), which also locks mutex first, and so malloc must not execute concurrently with ProcessGroupNCCL::collective.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22173

Differential Revision: D16357177

Pulled By: pietern

fbshipit-source-id: f4ca9cd46cc6d5e15290d99577d88be3f4fa8972
2019-07-18 07:31:02 -07:00
14ecf92d42 Slightly improve irfft doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22995

Differential Revision: D16356435

Pulled By: soumith

fbshipit-source-id: f6cfd9990fd79faebfb566704359c866ddf36525
2019-07-18 03:12:49 -07:00
c2df54d6d0 avg_pool2d avg_pool3d for LongTensor (#22433)
Summary:
Generate avg_pool2d/avg_pool3d for LongTensor on CPU.
Also added the divisor_override parameter.
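A small usage sketch (integer support is CPU-only per this PR; the per-window division for integer dtypes presumably truncates):

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.long).reshape(1, 1, 4, 4)

# Integer average pooling on CPU.
y = F.avg_pool2d(x, kernel_size=2)

# divisor_override replaces the default divisor (the window size, 4 here),
# so each 2x2 window is summed and divided by 1, i.e. a sum pool.
z = F.avg_pool2d(x, kernel_size=2, divisor_override=1)
```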
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22433

Differential Revision: D16108809

Pulled By: ifedan

fbshipit-source-id: 8de7ff585a0479702cceafb5ccf9dfea62a9cc50
2019-07-17 19:59:09 -07:00
52bf38007b Remove usage of legacy autograd function (#22925)
Summary:
We are planning to put up a deprecation warning for legacy autograd function in 1.2: https://github.com/pytorch/pytorch/pull/22922. This PR removes all usage of legacy function in PyTorch core and test suite, to prepare for the eventual removal of legacy function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22925

Differential Revision: D16344834

Pulled By: yf225

fbshipit-source-id: 8bf4cca740398835a08b7a290f3058c3e46781ba
2019-07-17 19:50:36 -07:00
29853293d7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 4f801d353ee14ec0bd6fd24830f0e7a4343d67f8
2019-07-17 18:05:09 -07:00
992f3860a3 Quantized relu to native_functions (#22316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22316

Adding the quantized ReLU to native_functions.yaml, as it has the same signature as the non-quantized relu

Reviewed By: jerryzh168

Differential Revision: D16038441

fbshipit-source-id: 1cfbb594eb9bca1b7ec49ca486defcf1908b0d26
2019-07-17 17:31:02 -07:00
e24f18cea0 Revert D15854892: [pytorch][PR] [tensorboard] Cleanup API and remove 'experimental' warning
Differential Revision:
D15854892

Original commit changeset: 06b849882694

fbshipit-source-id: 588edc4616d020a23645f8c8181782c8412c4c6e
2019-07-17 16:45:54 -07:00
a0ef4abeed Add missing comment from #22103 (#22984)
Summary:
One important comment is missing from https://github.com/pytorch/pytorch/issues/22103 (not sure what happened).
This commit makes it up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22984

Differential Revision: D16347044

Pulled By: ezyang

fbshipit-source-id: 0903909a5fb6740b43195136f1a23c28cfb2a02f
2019-07-17 16:21:38 -07:00
442dd7b906 Implement "trimmed lasso" regularization and support all available regularization in a single interface (#22966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22966

We want to implement "trimmed lasso" for feature selection with learnable and regularizable weights. Trimmed lasso is a simple yet powerful improvement over the traditional lasso. More references can be found at https://arxiv.org/abs/1708.04527 and http://proceedings.mlr.press/v97/yun19a.html. For a quick and necessary intro, please refer to pp. 1-3 of the paper at https://arxiv.org/abs/1708.04527.

Given n weights, the traditional lasso sums up all weights' l1 norms. The trimmed lasso takes an input integer k (how many weights you want to select from the n) and only sums over the smallest n - k weights. Given lambda as the regularization constant, the penalty term applies only to the smallest n - k weights, not to the other, larger weights. If lambda becomes larger than a certain threshold, the smallest n - k weights are shrunk to zero; that means those weights are "dropped". With this property, k is the number of weights left after the lasso, which we can easily control.
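For illustration, a minimal PyTorch sketch of the trimmed-lasso penalty described above (the function name and interface are hypothetical, not the interface added in this diff):

```python
import torch

def trimmed_lasso_penalty(w: torch.Tensor, k: int, lam: float) -> torch.Tensor:
    # Sum the l1 norms of only the n - k smallest-magnitude weights, so the
    # k largest weights escape regularization while the rest are shrunk
    # toward zero (and dropped once lam is large enough).
    abs_w = w.abs().flatten()
    smallest, _ = torch.topk(abs_w, abs_w.numel() - k, largest=False)
    return lam * smallest.sum()

w = torch.randn(10, requires_grad=True)
loss = trimmed_lasso_penalty(w, k=3, lam=0.1)
loss.backward()  # gradients flow only into the penalized (smallest) weights
```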

Meanwhile, we further support all available regularizations in a single interface. Currently supported regularizers on weights include no reg, l1, l2, elastic, trimmed l1, elastic with trimmed l1, group l1, and logbarrier.

Differential Revision: D16326492

fbshipit-source-id: 6e1fd75606005d9bc09d6650435c96a7984ba69c
2019-07-17 16:12:31 -07:00
eb76b7a564 Revert D16199862: [pytorch][PR] [ROCm] Update ROCm CI to python3.6
Differential Revision:
D16199862

Original commit changeset: 46ca6029a232

fbshipit-source-id: 2843b919f2655674e39dc764053621994061a12b
2019-07-17 14:26:56 -07:00
796a39ba85 Automatic update of fbcode/onnx to 707064980b9825b8705b9d1c9aad34d8b022d5dd (#22981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22981

Previous import was 806aa863020fa180e57f576cb032ec44ce8ddcca

Included changes:
- **[70706498](https://github.com/onnx/onnx/commit/70706498)**: TensorProto::INT8 & INT16 were missed here (#2164) <ZINEKS>
- **[8218a4ea](https://github.com/onnx/onnx/commit/8218a4ea)**: Fix LabelEncoder's shape inference (#2170) <Wei-Sheng Chin>
- **[0f1a9a1c](https://github.com/onnx/onnx/commit/0f1a9a1c)**: Fixing a unit test in Cumsum Operator (#2157) <Jeff Saremi>
- **[2c03cff0](https://github.com/onnx/onnx/commit/2c03cff0)**: [New Operator] CumSum (#2030) <Jeff Saremi>
- **[220b8300](https://github.com/onnx/onnx/commit/220b8300)**: Fix globalpool output shape (#2147) <daquexian>

Reviewed By: benoitsteiner

Differential Revision: D16341736

fbshipit-source-id: 7e7a2684d8c821991231bfd6558f9f6cb4fb05fb
2019-07-17 14:05:14 -07:00
031b406c38 Update ROCm CI to python3.6 (#22322)
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this expected / can it be skipped?
* For testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322

Differential Revision: D16199862

Pulled By: ezyang

fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
2019-07-17 13:42:30 -07:00
5adba33c01 Use integer floor division for pooling shape computation (#22304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21935 by using the integer floor division that was introduced for convolution shapes in https://github.com/pytorch/pytorch/issues/9640. Without this fix, the pooling operators can produce a 1-element output in cases they shouldn't.
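For reference, a simplified sketch of the shape computation in question (ignoring dilation; illustrative, not the exact C++ code):

```python
def pooling_output_size(input_size, kernel_size, stride, padding=0, ceil_mode=False):
    # Using integer floor division throughout (as #9640 did for convolutions)
    # avoids the float-rounding path that could admit a window lying outside
    # the padded input and yield a spurious 1-element output.
    numerator = input_size + 2 * padding - kernel_size
    if ceil_mode:
        return -(-numerator // stride) + 1  # integer ceiling division
    return numerator // stride + 1
```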

Disclaimer: I couldn't properly test it locally (it's not picking up the modified version for some reason). I'm marking this WIP until I've checked what the CI tools say...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22304

Differential Revision: D16181955

Pulled By: ezyang

fbshipit-source-id: a2405372753572548b40616d1206848b527c8121
2019-07-17 13:23:29 -07:00
332824551c Fix F.one_hot doc signature
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22929

Differential Revision: D16290741

Pulled By: ezyang

fbshipit-source-id: d8b979e64d92b94c5a70bb4ffe2a83042ed6abfc
2019-07-17 13:23:25 -07:00
074afd7143 Remove unneeded IValue copy in unpickler.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22883

Test Plan: Imported from OSS

Differential Revision: D16270330

Pulled By: zdevito

fbshipit-source-id: ffd05b8c6860889d75172a288f339a434af76d45
2019-07-17 11:00:38 -07:00
b6adb568fb Cleanup some logic in pickler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22882

Test Plan: Imported from OSS

Differential Revision: D16270332

Pulled By: zdevito

fbshipit-source-id: 714f293493965b13e471945fde11831a04875604
2019-07-17 11:00:34 -07:00
3c0814ffeb add docs to onnx APIs (#22938)
Summary:
Add docs to onnx APIs, including
  - export
  - export_to_pretty_string
  - is_in_onnx_export

Fix https://github.com/pytorch/pytorch/issues/14698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22938

Differential Revision: D16296182

Pulled By: zhangguanheng66

fbshipit-source-id: 1a1fa769b430db6428e6dfafba5447e6e2a75517
2019-07-17 10:50:41 -07:00
4861527446 Cleanup API and remove 'experimental' warning (#21786)
Summary:
This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the user) and removes the "experimental" warning in prep for our 1.2 release.

We also don't need the additional PyTorch version checks now that we are in the codebase itself.

cc ezyang lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21786

Reviewed By: natalialunova

Differential Revision: D15854892

Pulled By: orionr

fbshipit-source-id: 06b8498826946e578824d4b15c910edb3c2c20c6
2019-07-17 10:34:00 -07:00
2630109727 always restore dlopen flag in dyndep (#22958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22958

When we use `extension_loader.DlopenGuard()` to dyndep or import modules, it sets the `RTLD_GLOBAL` flag and restores the original flags after the `yield`. However, if the module is not there, the `yield` will raise, the flags won't be restored, and that creates all kinds of symbol conflict problems.
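A minimal sketch of the fix's shape: restore the flags in a finally block so they are reset even when the import inside the guard raises (names are illustrative, not the exact caffe2 code):

```python
import sys
import ctypes
from contextlib import contextmanager

@contextmanager
def dlopen_guard():
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | ctypes.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Runs even if the body of the `with` block raised, so the
        # original dlopen flags are always restored.
        sys.setdlopenflags(old_flags)
```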

Reviewed By: bddppq

Differential Revision: D16311949

fbshipit-source-id: 7b9ec6d60423ec5e78cae694b66c2f17493840b0
2019-07-17 10:26:25 -07:00
35b6cdc2eb Rewriting hypothesis_utils (#22830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22830

Separating the tensor generation and the generation of the quantization parameters

- Introducing hypothesis filter `assume_not_overflowing`, which makes sure that the generated tensor and qparams play well with each other. **Note: This is an expensive filter!**
- `qtensor` -> Renamed to `tensor`
- `qtensor_conv` -> Renamed to `tensor_conv2d`
- The tensors don't return the quantization parameters anymore; use `qparams` instead
- The `dtypes` argument is just a quantized dtype now.
- The enforcement for zero_point is predefined as before: if set to `None` the zero_point will be sampled, but the sampling can be overridden with `zero_point_min` and `zero_point_max`
- Scale sampling can also be overridden using `scale_min` and `scale_max`

Reviewed By: jerryzh168

Differential Revision: D16234314

fbshipit-source-id: 5b538a5aa9772b7add4f2ce5eff6fd0decd48f8e
2019-07-17 10:16:13 -07:00
b96610bf5a fix the CI job for onnx (#22946)
Summary:
ONNX uses virtualenv, and PyTorch doesn't. So the `--user` flag is causing problems in the ONNX CI...

Fixing it by moving the flag to PyTorch-only scripts; ninja will be installed in the ONNX CI separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22946

Reviewed By: bddppq

Differential Revision: D16297781

Pulled By: houseroad

fbshipit-source-id: 52991abac61beaf3cfbcc99af5bb1cd27b790485
2019-07-17 09:50:06 -07:00
f72d754877 qlinear operator level benchmark (#22914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22914

Adding op level benchmarking for qlinear operator

Reviewed By: mingzhe09088

Differential Revision: D16285204

fbshipit-source-id: 99b734ddfa0af6aada820cac7b2f38ef7a5868cb
2019-07-17 09:13:17 -07:00
7a99f3987b Update note about tensors on CPU for certain MAGMA functions, eliminate argument in macro (#22618)
Summary:

Changelog:
- Update note about tensors on CPU for the following MAGMA functions
  - magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
  - magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
  - magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618

Test Plan:
- All existing tests should pass to verify that the patch is correct

This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573

Differential Revision: D16286198

Pulled By: zou3519

fbshipit-source-id: a5a6ec829084bdb752ca6006b8795227cbaf63b1
2019-07-17 07:38:23 -07:00
5911cb8e5c Make load() create only one CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22727

Differential Revision: D16197603

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 3eaefe6f229032b109d63a151fe0a20268b5cf56
2019-07-16 20:08:10 -07:00
ec57d9215f Updating submodules
Reviewed By: yns88

fbshipit-source-id: 7eb29a58ff20b8ff0b793a84eb2f00e0a1bbe4b5
2019-07-16 19:53:06 -07:00
7ed82ea461 Added generation of transpose and dilated 2D and 3D for LongTensor (#22594)
Summary:
Added implementations: transpose2D, transpose3D, dilated2D, and dilated3D for LongTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22594

Differential Revision: D16155462

Pulled By: ifedan

fbshipit-source-id: af57330314bc2c3e0a38b9e75105b20030a1f9bb
2019-07-16 18:58:39 -07:00
bcfa023a00 hardshrink_cpu and hardshrink_backward_cpu refactoring with at::native::cpu_kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22459

Differential Revision: D16132625

Pulled By: pbelevich

fbshipit-source-id: d7eb1cd6ed04eba3d0c54feaca1e5ab2836211b5
2019-07-16 18:58:35 -07:00
ef36046ad7 Better error message for using Python builtin_function_or_method (#22935)
Summary:
* better error in `toSugaredValue`
* removes a bunch of periods from error messages; `ErrorReport` already adds a `:` at the end of the message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22935

Pulled By: driazati

Differential Revision: D16291079

fbshipit-source-id: 478724fc7d1ae79093f4ede18553ffeafa2c7964
2019-07-16 16:49:04 -07:00
25b69997c3 Tensorboard Metrics (#22492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22492

Collect metrics about Tensorboard usage
[internal] fbcode/pytorch/tensorboardX/tensorboardX/writer.py
[OSS] fbcode/caffe2/torch/utils/tensorboard/writer.py
Tensorboard Ondemand
https://fb.quip.com/JRvqAKtzgy6z

Reviewed By: dzhulgakov

Differential Revision: D16105544

fbshipit-source-id: de14e6ec781889e367a6eba39fc777f707628263
2019-07-16 16:18:00 -07:00
7793ab0871 More documentation about the pyobj field.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22885

Test Plan: Imported from OSS

Differential Revision: D16283076

Pulled By: ezyang

fbshipit-source-id: 4f6a87d900c4d430eedc90661de89e0f6916347e
2019-07-16 14:47:38 -07:00
cd11109c2e Fix messed up tests for dropout (#22893)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22109.

I've confirmed with suo that this wasn't intentional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22893

Differential Revision: D16288640

Pulled By: Chillee

fbshipit-source-id: 00fd6fe418ecefb304866a723051d0e5451ba4d5
2019-07-16 14:17:11 -07:00
8ced53d62b Correct the check of whether src is defined in copy_. (#22715)
Summary:
(intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22715

Differential Revision: D16205243

Pulled By: ezyang

fbshipit-source-id: 9bf5a7885691d057198ae482259b36c1773457dd
2019-07-16 14:03:43 -07:00
798d5d9771 Revert D16281714: Add sanity checks for NCCL detection.
Differential Revision:
D16281714

Original commit changeset: 396bcbf099bd

fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44
2019-07-16 13:58:27 -07:00
7586ffdc57 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 30926ecb8fabee3f020ae183bb568a11145bcada
2019-07-16 13:37:24 -07:00
01f03d56ee Revert D16283037: Add sanity checks for NCCL detection.
Differential Revision:
D16283037

Original commit changeset: fc09c9443a56

fbshipit-source-id: 30cdf7b1ad91498ee615d018de5571ba36f4383e
2019-07-16 13:20:43 -07:00
7a370dbb41 Enable recursive script mode as the default (#22887)
Summary:
This fixes up the test suite (mostly just adding `ignore` decorations
to tests that need to call Python functions) so that it all passes with
recursive script enabled.

The main user-facing result of this change is that Python functions are
compiled without any decorators, so non-TorchScriptable code must be
decorated with `torch.jit.ignore` (or
`torch.jit.ignore(drop_on_export=True)` to maintain the functionality of
the current `ignore`).

Details can be found in #20939.
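A minimal sketch of the new requirement (assuming the 1.2-era decorator spelling):

```python
import torch

@torch.jit.ignore
def log_stats(x):
    # Arbitrary Python: left uncompiled and executed by the interpreter.
    print("mean:", x.mean().item())

class M(torch.nn.Module):
    def forward(self, x):
        log_stats(x)  # allowed in script only because of @torch.jit.ignore
        return x + 1

m = torch.jit.script(M())  # recursive script compiles forward()
print(m(torch.randn(3)))
```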
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22887

Pulled By: driazati

Differential Revision: D16277608

fbshipit-source-id: 0abd0dc4291cf40651a1719bff813abb2b559640
2019-07-16 13:00:08 -07:00
eaee0c6cd9 Make classtypes hold a weak_ptr to their CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22902

Test Plan: Imported from OSS

Differential Revision: D16278159

Pulled By: suo

fbshipit-source-id: 6aa682e347847e808b44218d38ff1dae66945a07
2019-07-16 12:04:20 -07:00
b6a88b3344 Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22901

Test Plan: Imported from OSS

Differential Revision: D16278160

Pulled By: suo

fbshipit-source-id: f3e7d83b48d5f5b5cb1548ccc5b9bd382a3c411a
2019-07-16 12:04:16 -07:00
c6fe864db3 Add key_padding_mask kwarg to Transformer (#22588)
Summary:
Motivation:
The forward method of MultiheadAttention has a kwarg key_padding_mask. This mask is of shape (N, S), where N is batch and S is sequence length. It is applied prior to the attention softmax, where True values in the mask are set to float('-inf'). This allows you to mask position j from attention for every position i in the input sequence; it's typically used to mask padded inputs, so for a sample in a batch we can make sure no encoder outputs depend on padding inputs.

Currently the Transformer, TransformerEncoder, and TransformerEncoderLayer do not have this kwarg, and only have options for (S, S), (T, T), and (S, T) masks, which are applied equally across the batch for the source input, target output, and target-source memory respectively. These masks can't be used for padding and are instead used for things like subsequent masking in language modeling, by masking the attention of position i to position j.

This diff exposes the key_padding_mask to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods which is ultimately passed to MultiheadAttention forward.

Open question: should we also allow a key_padding_mask for the decoder layer? Padding is usually at the end of each sentence in a batch, and sentences usually decode from left to right, so people typically deal with padding on decoded outputs by just masking those outputs at the loss layer. There might be some scenarios where it's needed, though I don't think they would be common. People can also still just subclass and override the layers. We could also pass the input key_padding_mask to the memory <> decoder attention layer. I'm not sure that's necessary, though, because the output of position i from each attention encoder layer won't depend on any masked positions in the input (even if position i is a masked position itself), so there's not really any point in masking position i again.
Adds the key_padding_mask kwarg to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods.
The standard TransformerEncoderLayer uses a MultiheadAttention layer as self_attn. MultiheadAttention forward method has a key_padding_mask kwarg that allows for masking of values such as padding per sequence in a batch, in contrast to the attn_mask kwarg which is usually of shape (S,S) and applied equally across the batch.

MultiheadAttention calls functional.multi_head_attention_forward, which has the same key_padding_mask kwarg of shape (N,S). Masked (True) values are set to float('-inf').
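A small usage sketch of the padding semantics being forwarded, shown on MultiheadAttention, which already exposes the kwarg (mask dtype conventions may differ by release):

```python
import torch
import torch.nn as nn

S, N, E = 5, 2, 8                      # sequence length, batch, embed dim
attn = nn.MultiheadAttention(embed_dim=E, num_heads=2)
x = torch.randn(S, N, E)               # inputs are sequence-first: (S, N, E)

# (N, S) mask: True marks padded key positions no query may attend to.
key_padding_mask = torch.tensor([
    [False, False, False, True, True],    # sample 0: last two tokens are pad
    [False, False, False, False, False],  # sample 1: no padding
])

out, weights = attn(x, x, x, key_padding_mask=key_padding_mask)
```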
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22588

Test Plan:
buck test mode/dev caffe2/test:nn -- 'test_transformerencoderlayer \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_Transformer_cell \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_transformer_args_check \(test_nn\.TestNN\)'

Differential Revision: D16112263

Pulled By: lucasgadams

fbshipit-source-id: dc4147dd1f89b55a4c94e8c701f16f0ffdc1d1a2
2019-07-16 11:57:22 -07:00
9b9546a498 replace ByteTensor with bool in fill_test (#22913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22913

as title

Reviewed By: hl475

Differential Revision: D16285248

fbshipit-source-id: 78b13d48d547760e59e0e5c8875ab09a3cd24828
2019-07-16 11:51:55 -07:00
31497799b9 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16283037

Pulled By: ezyang

fbshipit-source-id: fc09c9443a568d9af1c78a847282a7d707c49dd6
2019-07-16 11:32:36 -07:00
e2046f8c1d Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16281714

Pulled By: ezyang

fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90
2019-07-16 11:32:32 -07:00
3ea04b59c0 Resolve the doc issue in which two asterisks have weird links. (#22896)
Summary:
Asterisks start emphases in rst. We should either escape them or put them as interpreted text.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22896

Differential Revision: D16282869

Pulled By: zou3519

fbshipit-source-id: 15ec4286434db55fb8357b1a12e6f70ef54f8c66
2019-07-16 11:23:06 -07:00
3f3f5d042a Revert D16227440: [pytorch][PR] Update note about tensors on CPU for certain MAGMA functions, elimina…
Differential Revision:
D16227440

Original commit changeset: 97d5537c5da9

fbshipit-source-id: 2dacfcc821e1fb64466e185efa0f6abd0c9ba526
2019-07-16 11:13:59 -07:00
52de340629 Export torch.masked_fill with onnx::where
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22521

Reviewed By: zrphercule

Differential Revision: D16155168

Pulled By: houseroad

fbshipit-source-id: 5d419f08213324d474b839ba1ae13c799aeee92a
2019-07-16 10:55:30 -07:00
6c997538b7 Unwrap sccache post-build for ROCm compilations. (#22743)
Summary:
The sccache wrapping strategy causes problems for at-runtime kernel
compilation of MIOpen kernels. We therefore - after the builds of
caffe2/pytorch are complete - unwrap sccache again by moving the clang-9
actual binary back into its original place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22743

Differential Revision: D16283329

Pulled By: bddppq

fbshipit-source-id: 4fcdc92be295d5ea9aba75c30e39af1a18a80c13
2019-07-16 10:28:16 -07:00
ba38445cfd Fix alias annotations for dict ops (#22900)
Summary:
Fixes #22553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22900

Pulled By: driazati

Differential Revision: D16277794

fbshipit-source-id: 657f18c50c9a87597ec1a7d568cc532638cfe386
2019-07-16 10:28:12 -07:00
8482efb203 pin_memory malloc now uses existing context if available. (#22229)
Summary:
This is achieved by using `cuDevicePrimaryCtxGetState` to check whether a primary context exists on a device. It is not too slow, as shown by this benchmark of a single call to it on CUDA 10.1, Titan Xp, driver 415.27:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_cuDevicePrimaryCtxGetState        301 ns        301 ns    2319746
```

Commits:

1. Add `CUDAHooks::getDeviceWithPrimaryContext` which returns a device index with primary context (if exists).
    Link `c10/cuda` against `libcuda` for device API calls.
2. Use `getDeviceWithPrimaryContext` to check primary context in `pin_memory`.
    Fix `OptionalDeviceGuard` doc.
3. Refactor `test_cuda_primary_ctx.py` to support multiple tests.
    Add test for this in that file.

Fixes https://github.com/pytorch/pytorch/issues/21081.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22229

Differential Revision: D16170194

Pulled By: zou3519

fbshipit-source-id: 485a45f211b7844c9e69c63f3b3b75194a796c5d
2019-07-16 10:18:30 -07:00
054c7eb0f4 Update note about tensors on CPU for certain MAGMA functions, eliminate argument in macro (#22618)
Summary:

Changelog:
- Update note about tensors on CPU for the following MAGMA functions
  - magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
  - magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
  - magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618

Test Plan:
- All existing tests should pass to verify that the patch is correct

This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573

Differential Revision: D16227440

Pulled By: zou3519

fbshipit-source-id: 97d5537c5da98c0ed3edc4668a09294794fc426b
2019-07-16 10:09:10 -07:00
f8ad65adb1 Fix torch.triu / torch.tril on contiguous tensors with non-default strides (#22730)
Summary:

Changelog:
- Fix behavior of `torch.triu` / `torch.tril` on certain unsqueezed tensors that lead to uninitialized values on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22730

Test Plan:
- Add tests for these cases in test_triu_tril in test_torch

Fixes https://github.com/pytorch/pytorch/issues/22581

Differential Revision: D16222897

Pulled By: zou3519

fbshipit-source-id: b86b060187797e5cd2a7731421dff1ba2b5c9596
2019-07-16 10:09:03 -07:00
0ea8e61f03 For consistent CUDA_HOME behavior (#22845)
Summary:
Align the behavior of `torch.utils.cpp_extension.CUDA_HOME` with that of `tools.setup_helpers.cuda.CUDA_HOME`.

Specifically, I swapped the positions of guess 2 and guess 3 in `torch.utils.cpp_extension.CUDA_HOME`.

Fixes issue https://github.com/pytorch/pytorch/issues/22844
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22845

Differential Revision: D16276241

Pulled By: zou3519

fbshipit-source-id: 3b62b439b2f794a6f3637a5fee58991f430985fe
2019-07-16 09:55:56 -07:00
560d847da6 add benchmark for PT fill_ op (#22867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22867

as title

Reviewed By: hl475

Differential Revision: D16263458

fbshipit-source-id: 55b0e62023c117aaa0c2b9a4d65b234a388f086d
2019-07-16 09:50:41 -07:00
3b1c3996e1 remove RTTI check for TensorImpl shadow copy (#22773)
Summary:
We introduced RTTI in a recent change: https://github.com/pytorch/pytorch/pull/21613

For the internal mobile build we don't enable '-frtti' yet. This diff replaces the
RTTI check with an alternative approach.

According to dzhulgakov we could compare two tensors' type_id directly in most cases -
this is stricter than comparing the TensorImpl subclass type, since the TensorImpl -> type_id
mapping is 1-to-n, but it's more appropriate for this use case.

The only two cases where we can relax the direct type comparison (for legacy reasons) are:
1. CPUTensor <-> CUDATensor;
2. SparseCPUTensor <-> SparseCUDATensor;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22773

Differential Revision: D16277696

Pulled By: ljk53

fbshipit-source-id: 043e264fbacc37b7a11af2046983c70ddb62a599
2019-07-15 23:21:57 -07:00
c5afdd0b55 Revert D16197605: [jit] Make traced fns also go into the global python CU
Differential Revision:
D16197605

Original commit changeset: d32c975486b0

fbshipit-source-id: a00f0490cc23824792f3e745d7b5a003b1a33d20
2019-07-15 22:31:33 -07:00
a326aad816 Revert D16197608: [jit] Make classtypes hold a weak_ptr to their CU
Differential Revision:
D16197608

Original commit changeset: 22250d6f0d24

fbshipit-source-id: 47a8cdeb62b1033252070ecb92906358014b551a
2019-07-15 19:49:41 -07:00
5f05037de6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: a7af5bc022abbfb81af31dbb653e25a3b8d54c4f
2019-07-15 18:07:21 -07:00
94d99f2522 add num_runs flag to the benchmark (#22892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892

Think of num_runs as manually running the binary <num_runs> times; each run executes the operator for many iterations.

Reviewed By: hl475

Differential Revision: D16271597

fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
2019-07-15 17:18:25 -07:00
6ffacd5f02 Use original module's class name for ScriptModules (#22873)
Summary:
Since recursive script creates a ScriptModule from an `nn.Module`,
there are no ties to the original module to pull a type name from, so we
have to explicitly pass it in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22873

Pulled By: driazati

Differential Revision: D16268547

fbshipit-source-id: 902a30e6e36427c6ba7033ded027a29d9dcbc1ee
2019-07-15 15:27:29 -07:00
248336946e remove stray print
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22825

Differential Revision: D16266401

Pulled By: Krovatkin

fbshipit-source-id: 214f90578061aad83eab143381b3c05386edee3d
2019-07-15 14:54:10 -07:00
f7de9be3c0 Add FakeQuantize Module (#21767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21767

Adding FakeQuantize Module
for quantization aware training

Reviewed By: dzhulgakov

Differential Revision: D15728503

fbshipit-source-id: 2a9a6a362812ede3deac42b93dddca35987bd8e6
2019-07-15 14:08:55 -07:00
0cddd3e751 update README (#21312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21312

This diff updates the README of op-bench.

Reviewed By: zheng-xq

Differential Revision: D15612665

fbshipit-source-id: b33119fd4f9d086b03b5e28fbe8a4015b282b15c
2019-07-15 13:34:05 -07:00
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd` (see the usage sketch after this changelog)
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
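A small sketch of the new batched usage (shapes assume the default `some=True`):

```python
import torch

A = torch.randn(4, 5, 3)   # a batch of four 5x3 matrices
U, S, V = torch.svd(A)     # U: (4, 5, 3), S: (4, 3), V: (4, 3, 3)

# Reconstruct each matrix in the batch: A = U diag(S) V^T.
A_rec = U @ torch.diag_embed(S) @ V.transpose(-2, -1)
assert torch.allclose(A, A_rec, atol=1e-5)
```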
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
260b0e8476 Make classtypes hold a weak_ptr to their CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22726

Differential Revision: D16197608

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 22250d6f0d249f61f269afb4fe8e7d1af0be1205
2019-07-15 13:13:16 -07:00
5fc1260e0a Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22725

Differential Revision: D16197605

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: d32c975486b0cb4808687f0aa89325571f2817c4
2019-07-15 13:13:12 -07:00
16aa235f43 _script_compile and _script_class_compile add to the python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22724

Differential Revision: D16197609

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: e12b31f8c8ce14b0968f4ac9445e7d225126b210
2019-07-15 13:13:08 -07:00
f2f80744be Close loophole to create untyped tuples (#22518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22518

-

Reviewed By: dzhulgakov

Differential Revision: D16115216

fbshipit-source-id: 1afae3666f7acd7d7833db8a72168364fed4879d
2019-07-15 11:33:45 -07:00
800f4936f0 Deprecate untyped Lists (#22517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22517

Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.

Reviewed By: dzhulgakov

Differential Revision: D16115214

fbshipit-source-id: 2c8d0e4e375339c699d583995f79c05c59693c3e
2019-07-15 11:33:35 -07:00
bd88fd0793 Added .bfloat16() (#22852)
Summary:
Add conversion method for bfloat16
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22852

Differential Revision: D16256760

Pulled By: izdeby

fbshipit-source-id: 01d75495f9df513a0cdf78791c3eb013ab92bd95
2019-07-15 09:32:18 -07:00
8399197df6 Set up CI with Azure Pipelines (#22839)
Summary:
Introduce Azure Pipelines for the linting checks.  This is meant to be equivalent to the existing Travis linting phase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22839

Differential Revision: D16260376

Pulled By: ezyang

fbshipit-source-id: 1e535c3096358be67a0dad4cd920a92082b2d18e
2019-07-15 06:41:56 -07:00
535c5540bc Back out "Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen"" (#22794)
Summary:
Original commit changeset: 227df3b85316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22794
ghstack-source-id: 86400904

Differential Revision: D16222777

fbshipit-source-id: 0b198ac59e640df0b8204b4ed30f8e822c15fd9a
2019-07-15 06:28:56 -07:00
317cf7c874 Remove tensor_data() call in Python Variable() and nn.Parameter() constructors (#22821)
Summary:
As part of the Variable/Tensor merge, `variable.tensor_data()` should be removed in favor of `variable.detach()`. This PR removes  `tensor_data()` call sites in Python `Variable()` and `nn.Parameter()` constructor paths.

Note that this PR is BC-breaking in the following way:
- For Python `Variable()` constructor:
Previously, in-place updating a tensor after it had been used to create a Variable did not bump the Variable's version counter, which caused the following problem:
```python
t = torch.ones(2, 3)
v = torch.autograd.Variable(t).requires_grad_()
y = v * v
t.add_(1)  # This bumps version counter of `t`
y.sum().backward()  # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create a Variable will also bump the Variable's version counter, thus preserving the correctness of the Variable's version counter.

- For Python `nn.Parameter()` constructor:
Previously, in-place updating a tensor after it had been used to create an nn.Parameter did not bump the nn.Parameter's version counter, which caused the following problem:
```python
t = torch.ones(2, 3)
v = torch.nn.Parameter(t)
y = v * v
t.add_(1)  # This bumps version counter of `t`
y.sum().backward()  # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create an nn.Parameter will also bump the nn.Parameter's version counter, thus preserving the correctness of the nn.Parameter's version counter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22821

Differential Revision: D16258030

Pulled By: yf225

fbshipit-source-id: 9a6d68cea1864893193dbefbb6ef0c1d5ca12d78
2019-07-14 21:09:29 -07:00
14e8fb70a1 Make the signature of fill_out consistent with fill_.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22761

Test Plan: Imported from OSS

Differential Revision: D16257779

Pulled By: ezyang

fbshipit-source-id: b1201500042ae1f4678835da957de1777c1038a3
2019-07-14 19:20:59 -07:00
1c266c2738 Move the body of fill_kernel_impl into fill_kernel_cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22760

Test Plan: Imported from OSS

Differential Revision: D16257782

Pulled By: ezyang

fbshipit-source-id: d214d2d77affd937109b33ca841af76004f85834
2019-07-14 19:20:53 -07:00
fc297b8e83 Move fill and fill_diagonal to Fill.cpp, Fill.h, and FillKernel.{cpp,cu}
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22758

Test Plan: Imported from OSS

Differential Revision: D16257781

Pulled By: ezyang

fbshipit-source-id: 9e5ed06e95ef65b036eb388488faad981f1e8012
2019-07-14 19:20:46 -07:00
815e73bc20 make_variable consumes the Tensor if it only has one reference
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22705

Test Plan: Imported from OSS

Differential Revision: D16192220

Pulled By: jamesr66a

fbshipit-source-id: 9c42bb759077b74a1370d3a2d7114ed3593f333b
2019-07-14 18:36:20 -07:00
b5fa9a340a Temporarily skip mypy-0.720 to unbreak master type checks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22835

Differential Revision: D16239190

Pulled By: bddppq

fbshipit-source-id: e97fd3aae0676de8a06dc9fb498f36ed28dc92c3
2019-07-14 09:49:24 -07:00
1a93b96815 Revert da315a4
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22837

Differential Revision: D16239667

Pulled By: llyfacebook

fbshipit-source-id: 1a625d78d633927129dd2791e65b333b3902f94f
2019-07-13 01:54:20 -07:00
92468f0a6b Revert D16238204: Revert D16224780: [pytorch][PR] [ROCm] MIOpen integration into pytorch RNN operators
Differential Revision:
D16238204

Original commit changeset: a6b5eb3f4820

fbshipit-source-id: bdfae93c522c1ce734ab8dc736ced66411fe50ee
2019-07-12 22:58:50 -07:00
da315a4e2a Revert D16037021: Support GRU module quantization in Pytorch
Differential Revision:
D16037021

Original commit changeset: 71145c67d869

fbshipit-source-id: 33cd2e57eba30ea33cc4f3116732a721c26f6efb
2019-07-12 21:05:34 -07:00
fcfefc3439 Revert D16224780: [pytorch][PR] [ROCm] MIOpen integration into pytorch RNN operators
Differential Revision:
D16224780

Original commit changeset: 331dafbb7689

fbshipit-source-id: a6b5eb3f4820fbb58d4a329aa4c93b40a111ff27
2019-07-12 20:55:05 -07:00
ead1193241 Transfer Learning: Caffe2 load op changes to return shape inference (#22829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22829

Sending out caffe2 load op changes separately since we want pick it to open source.

This change is needed because the shape information of the blobs is determined from the load operator and that shape information is needed in our download_group.

Reviewed By: boryiingsu

Differential Revision: D16229465

fbshipit-source-id: f78b2df9a7f26968d70eca68dde75cd11ab6f7a2
2019-07-12 19:45:13 -07:00
d8c1b86135 Support GRU module quantization in Pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22498

Reviewed By: BIT-silence

Differential Revision: D16037021

fbshipit-source-id: 71145c67d8696e525b686cd3313033e5b6771718
2019-07-12 18:31:08 -07:00
ba9d559a12 Get rid of torch.mean shape analysis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22810

Test Plan: Imported from OSS

Differential Revision: D16226973

Pulled By: jamesr66a

fbshipit-source-id: ad23f48782e8d21788ecae39fc512ff4502716bf
2019-07-12 17:50:10 -07:00
9eb039334f Use Linear Operator with fp16 weights in JIT (#22323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22323

This diff adds an interface to use quantized Linear op in JIT.

Reviewed By: jamesr66a

Differential Revision: D16040724

fbshipit-source-id: 90e90aff9973c96ea076ed6a21ae02c349ee2bcf
2019-07-12 15:59:17 -07:00
573d9e6975 Support Linear operation with fp16 weights in ATen (#22023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22023

This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B with dtypes:
(fp32, fp32, fp16, fp32)

To do that, three steps are needed (a rough usage sketch follows the list):
1. Quantize the weights from fp32 to fp16; this is done using `PackedGemmMatrixFP16` in `fbgemm_pack_gemm_matrix_fp16`
2. Conduct the matrix multiplication with the quantized weights using `cblas_gemm_compute` in `fbgemm_linear_fp16_weight`
3. Add the bias to the result from step 2 and return the final Y
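For orientation, a rough usage sketch of the three steps, assuming the ops are exposed at the `torch` level and that shapes follow the nn.Linear convention (requires a build with FBGEMM; treat the exact signatures as illustrative):

```python
import torch

x = torch.randn(4, 16)   # fp32 activations, (batch, in_features)
w = torch.randn(8, 16)   # fp32 weight, (out_features, in_features)
b = torch.randn(8)       # fp32 bias

# Step 1: quantize/pack the weight to fp16 (PackedGemmMatrixFP16 inside).
packed_w = torch.fbgemm_pack_gemm_matrix_fp16(w)

# Steps 2 + 3: fp16-weight GEMM via cblas_gemm_compute, then the bias add.
y = torch.fbgemm_linear_fp16_weight(x, packed_w, b)   # (4, 8), fp32
```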

Reviewed By: jianyuh

Differential Revision: D15921768

fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d
2019-07-12 15:59:13 -07:00
35ee4bf4e5 Revert D16204820: [pytorch][PR] optimize topk on cpu using parallel and partial sort
Differential Revision:
D16204820

Original commit changeset: ea70562c9149

fbshipit-source-id: c8f8e262c7c681593d243f035bf1f0d84675c9dc
2019-07-12 15:14:06 -07:00
cf2889ad8f add support for breaks and continues (#21692)
Summary:
Add support for breaks and continues in the jit. We do with a Graph transform pre-SSA.

A graph of the form
```
def test():
    while i < 5:
        if i == 3:
            break
        i += 1
        print(i)
```
has the body of the loop transformed to
```
if i == 3:
    did_break = True
else:
    did_break = False
if did_break:
    loop_exit = True
else:
    i += 1
    print(i)
    loop_exit = i < 5
```

I am going to add more tests but I think it is ready for review now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21692

Differential Revision: D16215807

Pulled By: eellison

fbshipit-source-id: 365102f42de4861d9323caaeb39a96de7619a667
2019-07-12 15:02:44 -07:00
b3147bc674 PyTorch export to ONNX Opset 7 and 8 - Cont (#22421)
Summary:
This is an extension to the original PR https://github.com/pytorch/pytorch/pull/21765

1. Increase the coverage of support for different opsets, comments, and blacklisting.
2. Adding backend tests for both caffe2 and onnxruntime on opset 7 and opset 8.
3. Reusing onnx model tests in caffe2 for onnxruntime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22421

Reviewed By: zrphercule

Differential Revision: D16225518

Pulled By: houseroad

fbshipit-source-id: 01ae3eed85111a83a0124e9e95512b80109d6aee
2019-07-12 14:52:48 -07:00
9f8e2c067f MIOpen integration into pytorch RNN operators (#22774)
Summary:
This PR enables PyTorch RNN operators to use the MIOpen engine

ezyang bddppq

cc: lcskrishna iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22774

Differential Revision: D16224780

Pulled By: bddppq

fbshipit-source-id: 331dafbb76892d7390b620a95d8f384d38ee5533
2019-07-12 14:47:48 -07:00
30e03df638 Speeds up fast-path for 1D tensors (#22756)
Summary:
Using PMCTest (https://www.agner.org/optimize/) to measure
TensorIterator construction, this results in ~600 fewer instructions
retired (~300 fewer cycles) for constructing TensorIterator on a 1D
tensor. (Should be roughly ~100 ns, but it's hard to measure that
precisely end-to-end).

```
Before:
     Clock   Core cyc   Instruct       Uops   L1D Miss
      5082       2768       5690       7644          3

After:
     Clock   Core cyc   Instruct       Uops   L1D Miss
      4518       2437       5109       6992          0
```

Note that Instruct is reliable, Core cyc is a little noisy, and Clock
is a little more noisy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22756

Differential Revision: D16207777

Pulled By: VitalyFedyunin

fbshipit-source-id: bcc453a90472d9951a1c123bcb1b7a243fde70ac
2019-07-12 12:33:38 -07:00
02bc06a683 avoid kernel launches for zero-sized tensor inputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22790

Test Plan: Imported from OSS

Differential Revision: D16226168

Pulled By: wanchaol

fbshipit-source-id: 081607c9acc1540c753b080c5f727dc4e8c22acc
2019-07-12 12:24:52 -07:00
b1b65f34a9 Make PythonArgs::tensor and PythonArgs::scalar faster (#22782)
Summary:
Speeds up the common case where Tensor is a torch.Tensor (not a
subclass). This reduces the number of executed instructions for a
torch.add(tensor1, tensor2) by ~328 (should be ~65 ns faster).

Note that most of the PythonArgs accessors are too large to be inlined.
We should move most of them to the cpp file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782

Differential Revision: D16223592

Pulled By: colesbury

fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1
2019-07-12 11:57:29 -07:00
10c14ad17c optimize topk on cpu using parallel and partial sort (#19736)
Summary:
This PR aims at improving `topk()` performance on CPU. This is useful when computing **beam search** in `Transformer` and `BERT`.

Given a tensor x of size `[N, C]` to which we want to apply `x.topk(K)`, the current logic is to **sequentially** loop over the dimension `N` and do a **quick select** along the dimension `C` to find the top K elements.

Performance can be further improved by:

- parallelizing over the dimension `N`
- maybe a faster sorting algorithm for `topk` (after a bunch of experimenting, `std::partial_sort` seems the most promising)

So I compared 3 versions:

1. vanilla: sequential + quick select
2. reference PR https://github.com/pytorch/pytorch/issues/19737: parallel + quick select
3. this PR: parallel + partial sort

with the following benchmark, on `Xeon 8180, 2*28 cores@2.5 GHz`:
```python
import torch
from time import time

num_iters = 1000

def bench_topk(N=8, C=168560, k=10):
    a = torch.randn(N, C)
    # warm up
    for i in range(100):
        torch.topk(a, k)

    t = 0
    for i in range(num_iters):
        a = torch.randn(N, C)
        start = time()
        value, indice = torch.topk(a, k)
        t += time() - start
    print("#[%d, %d] times: %f ms" % (N, C, t / num_iters * 1000))

Ns = [10, 20, 30]
Cs = [10000, 20000, 40000, 80000, 160000, 320000]

for n in Ns:
    for c in Cs:
        bench_topk(N=n, C=c)

```
### vanilla: sequential + quick select
```
#[10, 10000] times: 0.746740 ms
#[10, 20000] times: 1.437399 ms
#[10, 40000] times: 2.832455 ms
#[10, 80000] times: 5.649426 ms
#[10, 160000] times: 11.309466 ms
#[10, 320000] times: 22.798765 ms
#[20, 10000] times: 1.511303 ms
#[20, 20000] times: 2.822024 ms
#[20, 40000] times: 5.564770 ms
#[20, 80000] times: 11.443044 ms
#[20, 160000] times: 22.747731 ms
#[20, 320000] times: 46.234449 ms
#[30, 10000] times: 2.214045 ms
#[30, 20000] times: 4.236179 ms
#[30, 40000] times: 8.418577 ms
#[30, 80000] times: 17.067578 ms
#[30, 160000] times: 33.826214 ms
#[30, 320000] times: 68.109420 ms
```
### reference PR: parallel + quick select
```
#[10, 10000] times: 0.271649 ms
#[10, 20000] times: 0.593016 ms
#[10, 40000] times: 1.133518 ms
#[10, 80000] times: 2.082355 ms
#[10, 160000] times: 4.049928 ms
#[10, 320000] times: 7.321285 ms
#[20, 10000] times: 0.315255 ms
#[20, 20000] times: 0.539054 ms
#[20, 40000] times: 1.000675 ms
#[20, 80000] times: 1.914586 ms
#[20, 160000] times: 4.437122 ms
#[20, 320000] times: 8.822445 ms
#[30, 10000] times: 0.347209 ms
#[30, 20000] times: 0.589947 ms
#[30, 40000] times: 1.102814 ms
#[30, 80000] times: 2.112201 ms
#[30, 160000] times: 5.186837 ms
#[30, 320000] times: 10.523023 ms
```
### this PR: parallel + partial sort
```
#[10, 10000] times: 0.150284 ms
#[10, 20000] times: 0.220089 ms
#[10, 40000] times: 0.521875 ms
#[10, 80000] times: 0.965593 ms
#[10, 160000] times: 2.312356 ms
#[10, 320000] times: 4.759422 ms
#[20, 10000] times: 0.167630 ms
#[20, 20000] times: 0.265607 ms
#[20, 40000] times: 0.471477 ms
#[20, 80000] times: 0.974572 ms
#[20, 160000] times: 3.269645 ms
#[20, 320000] times: 6.538608 ms
#[30, 10000] times: 0.204976 ms
#[30, 20000] times: 0.342833 ms
#[30, 40000] times: 0.589381 ms
#[30, 80000] times: 1.398579 ms
#[30, 160000] times: 3.904077 ms
#[30, 320000] times: 9.681224 ms
```
In summary, `2` is **5x** faster than `vanilla` on average and `3` is **8.6x** faster than `vanilla`.
On `Fairseq Transformer`, the default parameter on dataset `wmt14` would have a `topk` size of `[8, 168560]`, and this operator gets `3x` faster with this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19736

Differential Revision: D16204820

Pulled By: VitalyFedyunin

fbshipit-source-id: ea70562c9149a0d832cf5872a891042ebd74fc63
2019-07-12 11:10:20 -07:00
fc23d7f3bd Speed up TensorIterator::compute_strides a little (#22779)
Summary:
For three 1-D operands, compute_strides now takes 298 instructions instead
of 480. (Saves ~36 ns).  We'll want to make Tensor::sizes(), strides(), and
element_size() trivially inlinable to speed this up more.

(Using PMCTest from https://www.agner.org/optimize/ to measure instructions retired)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22779

Differential Revision: D16223595

Pulled By: colesbury

fbshipit-source-id: e4730755f29a0aea9cbc82c2d376a8e6a0c7bce8
2019-07-12 10:57:32 -07:00
f266a63eeb Initiate checkCuda90Bug warning (#22757)
Summary:
Initiate the checkCuda90Bug warning in THCudaBlas_Sgemm and THCudaBlas_Hgemm.
https://github.com/pytorch/pytorch/pull/22034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22757

Differential Revision: D16223085

Pulled By: zhangguanheng66

fbshipit-source-id: 470c6cbaba16a3cec295993c2673f02008a602a6
2019-07-12 09:55:09 -07:00
ccb28939bf Revert D16222539: [pytorch][PR] Let users pass CMake-specific options starting with CMAKE_ to CMake.
Differential Revision:
D16222539

Original commit changeset: 1cc6e69c85cd

fbshipit-source-id: c79d68976ac1047c54b32c093429b23e9482cd8f
2019-07-12 07:57:57 -07:00
612eed31a9 Let users pass CMake-specific options starting with CMAKE_ to CMake. (#22776)
Summary:
This should make it more convenient to follow https://github.com/pytorch/pytorch/issues/8433's suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22776

Differential Revision: D16222539

Pulled By: ezyang

fbshipit-source-id: 1cc6e69c85cdf0d7f8074653445410d85746847c
2019-07-12 07:28:32 -07:00
7eb0319339 add new tests to benchmark_all_test (#22787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22787

as title

Reviewed By: hl475

Differential Revision: D16219329

fbshipit-source-id: 097ee73e7644d5ca482ad044d0fd2c3e7dc2c10b
2019-07-11 22:50:55 -07:00
1878800f47 make custom op work in OSS environment (#22781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22781

The custom op is required to make the op benchmark work with JIT. Run `python setup.py install` in the pt_extension directory to install it; this step is required.

Reviewed By: hl475

Differential Revision: D16214430

fbshipit-source-id: c9221c532011f9cf0d5453ac8535a6cde65e8376
2019-07-11 21:17:17 -07:00
8ec712da30 Add the support of handle Bias being nullptr for torch.ops.quantized.fbgemm_linear (#22403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22403

- C10 Operator Registration (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/op_registration/op_registration.cpp) supports None type.

- ATen has None Tensor support, e.g., https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml#L1078

Reviewed By: zafartahirov

Differential Revision: D16069522

fbshipit-source-id: 3acaec783fc138ff36b14ffc0582d0764be4ad34
2019-07-11 17:33:08 -07:00
9d11004ee4 Update ONNX constant folding to support opset 10. (#22515)
Summary:
Currently ONNX constant folding (the `do_constant_folding=True` arg in the `torch.onnx.export` API) supports only opset 9 of ONNX. For opset 10, it is a no-op. This change enables ONNX constant folding for opset 10 (a usage sketch follows the list below). Specifically, there are three main changes:
1) Turn on constant folding ONNX pass for opset 10.
2) Update support for opset 10 version of `onnx::Slice` op for backend computation during constant folding.
3) Enable constant folding tests in `test/onnx/test_utility_funs.py` for multiple opsets (9 and 10).
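As mentioned above, a minimal export sketch (the torchvision model is used purely for illustration):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)

# With do_constant_folding=True the exporter pre-computes constant
# subgraphs at export time; after this change that also works for
# opset 10 (including the opset-10 form of onnx::Slice).
torch.onnx.export(model, dummy, "resnet18.onnx",
                  opset_version=10,
                  do_constant_folding=True)
```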
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22515

Reviewed By: zrphercule

Differential Revision: D16189336

Pulled By: houseroad

fbshipit-source-id: 3e2e748a06e4228b69a18c5458ca71491bd13875
2019-07-11 16:29:03 -07:00
291570e085 make CompilationUnit::define return defined functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22723

Test Plan: Imported from OSS

Differential Revision: D16197604

Pulled By: suo

fbshipit-source-id: b22491a58aa9ea476acab06614093ff004291407
2019-07-11 14:55:43 -07:00
de819be93e refactor self to be a class again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22722

Test Plan: Imported from OSS

Differential Revision: D16197607

Pulled By: suo

fbshipit-source-id: b4dd96b3f9cc46b48678aab0ff89afc3666e2185
2019-07-11 14:55:39 -07:00
22d70e0d4b Give functions qualified names
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22721

Test Plan: Imported from OSS

Differential Revision: D16197606

Pulled By: suo

fbshipit-source-id: 94718fcdb0d3b651f16674af3cfd6249ed4533ae
2019-07-11 14:55:34 -07:00
4b48ae4aec Suppress progress bar only for pip install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22708

Test Plan: Imported from OSS

Differential Revision: D16206329

Pulled By: ezyang

fbshipit-source-id: 4ec29e0e9e48a168e88ec716ee8e270c56a38cdb
2019-07-11 13:50:29 -07:00
05d56bd1b6 Remove hard-coded NVRTC specific constant from fuser header
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22699

Test Plan: Imported from OSS

Differential Revision: D16192290

Pulled By: bwasti

fbshipit-source-id: 4dccaf3e6e0151e86d35474c36e1ddb7f2afb5cf
2019-07-11 13:44:25 -07:00
513b7a7a06 assert_no_internal_overlap pass op name by const ref
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22729

Test Plan: Imported from OSS

Differential Revision: D16205448

Pulled By: jamesr66a

fbshipit-source-id: b383c461dd58e8a3d0bfeae43ebfd1e021668f80
2019-07-11 13:38:10 -07:00
9690f8629d Move the storage in empty_cpu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22728

Test Plan: Imported from OSS

Differential Revision: D16205449

Pulled By: jamesr66a

fbshipit-source-id: 6fd198d0d526b5de393e2988906dac2a63064f24
2019-07-11 13:38:07 -07:00
a797815198 bucketize op shape inference (#22716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22716

add shape inference func to bucketize op

Reviewed By: ipiszy

Differential Revision: D16193718

fbshipit-source-id: 6e893356b6408255538545673047dd5124837e70
2019-07-11 12:44:29 -07:00
ac78a86e1d Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen" (#22749)
Summary:
Original commit changeset: add2ee8a8865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22749
ghstack-source-id: 86323899

Differential Revision: D16203552

fbshipit-source-id: 227df3b85316315c15d2cb7b6a5c884096a82e9e
2019-07-11 12:21:21 -07:00
8bdda03ae1 optimize RNN on CPU (#22512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22512

optimize RNN on CPU

Reviewed By: llyfacebook

Differential Revision: D16113360

fbshipit-source-id: 9ee53b3b4bb9b636e7be1ccdf25420e2caa60762
2019-07-11 12:16:27 -07:00
Jie
3135298dde (#22602)
Summary:
1. Update on restricting block.z <= 64, compliant with the CUDA maximum z-dimension of a block;
2. clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22602

Differential Revision: D16203857

Pulled By: ezyang

fbshipit-source-id: 567719ae175681a48eb0f818ca0aba409dca2550
2019-07-11 12:02:58 -07:00
1682d38a25 Improve hypothesis_utils.py for qtensor (#22693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22693

change np.finfo to torch.finfo

Differential Revision: D16185556

fbshipit-source-id: 594f8ba1d6317ac2de47af754a8bd6015d40ea15
2019-07-11 11:56:01 -07:00
3fabb9f105 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22737

Differential Revision: D16200090

Pulled By: bddppq

fbshipit-source-id: 3819716a9b01f073966fc8b420c6a0b8d13232ac
2019-07-11 11:09:24 -07:00
45cf33a731 add fill_diagonal function (#21892)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21892

Differential Revision: D16164678

Pulled By: colesbury

fbshipit-source-id: 85df8ae9b7a6a91b6023fe7295b3a8124e4526ea
2019-07-11 09:20:44 -07:00
89d6e88042 Add environment variables used in CONTRIBUTING example (#22736)
Summary:
Some other environment variables can be added to speed things up for development.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22736

Differential Revision: D16200904

Pulled By: soumith

fbshipit-source-id: 797ef91a863a244a6c96e0adf64d9f9b4c9a9582
2019-07-11 04:15:51 -07:00
5147819f9d enabled MIOpen depthwise convolutions (#22696)
Summary:
They mistakenly got removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22696

Differential Revision: D16191442

Pulled By: bddppq

fbshipit-source-id: 7ceda274c557879e11f84596040efe9e0c9b861f
2019-07-11 00:14:58 -07:00
d21e476dcd Quantized Conv2d Module (#21323)
Summary:
Stack:
- https://github.com/pytorch/pytorch/issues/21808 Quantized conv avoid functional usage ([D15835572](https://our.intern.facebook.com/intern/diff/D15835572/))
- **https://github.com/pytorch/pytorch/issues/21323 Quantized Conv2d Module** ([D15551835](https://our.intern.facebook.com/intern/diff/D15551835/))

Quantized Conv2d Module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21323

Test Plan:
Tests are split into two parts: functional and API.

`buck test mode/dev caffe2/test:quantized -- test_conv_api` : https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491

```
Parsing buck files: finished in 1.4 sec
Building: finished in 4.6 sec (100%) 7136/7136 jobs, 2 updated
  Total time: 6.1 sec
Trace available for this run at /tmp/testpilot.20190703-153023.392592.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 7149de230b9e1cdc7a872bb31fe099f0616dee09 fbpkg e59e6ab0fe8e47a496f915d34555c3ad at Fri Jun 28 12:20:54 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/647/t.par
Discovering tests
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 0.044 1/2 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 5.109 2/2 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491
Summary (total time 9.08s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D15551835

Pulled By: zafartahirov

fbshipit-source-id: 481a7df4b8a88e485437e1596eefb08d5e6766fa
2019-07-10 21:31:24 -07:00
ad634875d0 Mark Unpickler data ptr arg as const
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22690

Differential Revision: D16184299

Pulled By: mrshenli

fbshipit-source-id: 332954028533952dad01df03eca8e95bf6fe67a9
2019-07-10 20:07:13 -07:00
4240220926 Revert D16183577: Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Differential Revision:
D16183577

Original commit changeset: f86838c407db

fbshipit-source-id: bbf53ce52a20b1e90b1fe522d73e558d8044c4ba
2019-07-10 18:29:22 -07:00
1ecc945ab2 Revert D15998762: [jit] Give functions qualified names
Differential Revision:
D15998762

Original commit changeset: bc2b734f626a

fbshipit-source-id: a118cc4e9a34233279e8380529a8d8120a25839d
2019-07-10 16:10:28 -07:00
a1ca32409f Revert D15998758: [jit] refactor self to be a class again
Differential Revision:
D15998758

Original commit changeset: 14bad87bb6e4

fbshipit-source-id: f2c29974d4afc4d8f88a36e9c266e6d5a22a6191
2019-07-10 16:10:24 -07:00
e6eb17303f Revert D16184799: [jit] make CompilationUnit::define return defined functions
Differential Revision:
D16184799

Original commit changeset: 9f77a7ca2223

fbshipit-source-id: a0e08220d924a6ca55bf2f1f77754553d0133595
2019-07-10 16:10:20 -07:00
fffa7200c1 fixing lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22703

Differential Revision: D16188326

Pulled By: izdeby

fbshipit-source-id: 72e6b6f957068c3995010a1b811f24cd2304ff6f
2019-07-10 16:02:21 -07:00
67c634d58e add a comment to native_functions explaining softmax interfaces (#22651)
Summary:
Address the review comment made by gchanan here:

https://github.com/pytorch/pytorch/pull/22456#discussion_r300715866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22651

Differential Revision: D16181828

Pulled By: nairbv

fbshipit-source-id: 0d41a9024c2664298c281e198a997be73e7f8499
2019-07-10 15:34:29 -07:00
0196e0bafb add line numbers to jit_log.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22630

Differential Revision: D16172090

Pulled By: Krovatkin

fbshipit-source-id: 26cdb0077a0bfbf9981e39359472f3251546db53
2019-07-10 15:28:29 -07:00
c49a71f91f make CompilationUnit::define return defined functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22667

Test Plan: Imported from OSS

Differential Revision: D16184799

Pulled By: suo

fbshipit-source-id: 9f77a7ca2223237fbcb4b12a4734b7d334f7be13
2019-07-10 15:19:11 -07:00
ee9c8a75f4 refactor self to be a class again (#22207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22207
ghimport-source-id: 36ee8bd17411a2e220665ad2a27364653061070e

Test Plan: Imported from OSS

Differential Revision: D15998758

Pulled By: suo

fbshipit-source-id: 14bad87bb6e44bf1a43ae86339d8cc7b311c76dd
2019-07-10 15:19:07 -07:00
c0674cebf1 Give functions qualified names (#22206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22206
ghimport-source-id: d453219d907e048f24eb7f63c096b2c300307c83

Test Plan: Imported from OSS

Differential Revision: D15998762

Pulled By: suo

fbshipit-source-id: bc2b734f626ab07f97dc50ddf1b021e8b46de312
2019-07-10 15:19:03 -07:00
86fc417147 Move Quantization Models to common_quantization (#22706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22706

Moved the models used for the quantization tests from test_quantization.py to common_quantization.py.

Reviewed By: jerryzh168

Differential Revision: D16189865

fbshipit-source-id: 409b43454b6b3fe278ac16b1affb9085d6ed6835
2019-07-10 15:05:49 -07:00
ebafa2e15f Turn on USE_DIRECT_NVRTC in fbcode again. (#22685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22685
ghstack-source-id: 86247780

Reviewed By: bddppq

Differential Revision: D16182352

fbshipit-source-id: fc51aa7c1112904b8cccd055dc87e10c836cf2fb
2019-07-10 15:05:45 -07:00
edeb4dbdcb register __getitem__ builtin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22276

Test Plan: Imported from OSS

Differential Revision: D16060595

Pulled By: wanchaol

fbshipit-source-id: e1e27d6be8d62fc1a841860a783aff108980d9d3
2019-07-10 14:53:35 -07:00
368dbb9ab3 Fix a FIXME in test_nn (#22675)
Summary:
https://github.com/pytorch/pytorch/issues/17262 is already resolved, so this should pass now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22675

Differential Revision: D16188003

Pulled By: zou3519

fbshipit-source-id: 32693229a0590b274ed1bf76b815f17e77c2d3ea
2019-07-10 13:12:50 -07:00
00df49c984 Fix Trace inlining of graphs with optional inputs (#22686)
Summary:
Previously in tracing, when we called a script function, we would inline the graph and set the graph input types to the types the graph was invoked with.

This breaks for optional arguments invoked with None since we rely on None being set to Optional[T] in schema matching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22686

Differential Revision: D16186372

Pulled By: eellison

fbshipit-source-id: e25c807c63527bf442eb8b31122d50689c7822f5
2019-07-10 12:57:06 -07:00
3e3e6ee335 Add common_quantized test case utilities (#22694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22694

Move quantization and quantized utility functions for testing to common_quantized.py and common_quantization.py. Additionally, add a quantized test case base class which contains common methods for checking the results of quantization on modules. As a consequence of the move, fixed the imports at the top of test_quantized.py and test_quantization.py to use the new utilities.

Reviewed By: jerryzh168

Differential Revision: D16172012

fbshipit-source-id: 329166af5555fc829f26bf1383d682c25c01a7d9
2019-07-10 12:23:36 -07:00
7750cae722 Refactor and improve randperm tests.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22121

Test Plan: Imported from OSS

Differential Revision: D16153794

Pulled By: li-roy

fbshipit-source-id: 4dbfa6cfcc79f6d431918a6646664215fa9ea0b9
2019-07-10 12:23:33 -07:00
32709af8f4 Swap detection order in randperm_out_cuda to avoid unnecessary conversion from float when the input is small.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22103

Test Plan: Imported from OSS

Differential Revision: D16153585

Pulled By: li-roy

fbshipit-source-id: 0801b91e7b352c8de8fdfbe929be85d69182b8da
2019-07-10 12:23:29 -07:00
0f7c3710dd Support Half type in randperm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22102

Test Plan: Imported from OSS

Differential Revision: D16153586

Pulled By: li-roy

fbshipit-source-id: d58e3dbc5da893005f4eaf521a28b0d752274eff
2019-07-10 12:23:25 -07:00
9c4c9c3af0 Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22326
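A minimal sketch of what the delegation means for users (example values assumed, not from this PR):
```python
import torch

x = torch.tensor([0, 1, 2], dtype=torch.int32)
assert torch.equal(~x, x.bitwise_not())  # ~ now dispatches to bitwise_not()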

Test Plan: Imported from OSS

Differential Revision: D16183577

Pulled By: colesbury

fbshipit-source-id: f86838c407db4ded9ce70998bf1ab1ffd75b3b58
2019-07-10 12:17:52 -07:00
574e808680 Add a bitwise NOT operator for integer and Boolean types (CUDA).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22320

Test Plan: Imported from OSS

Differential Revision: D16183578

Pulled By: colesbury

fbshipit-source-id: 2f72cce5e10fd637be1ac87e1bbfe0937a661034
2019-07-10 12:17:48 -07:00
e2dc1fc715 Add a bitwise NOT operator for integer and Boolean types (CPU).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22283

Test Plan: Imported from OSS

Differential Revision: D16183576

Pulled By: colesbury

fbshipit-source-id: 2e539fab8ff885dddb9bff334d1d784b28d65b8f
2019-07-10 12:17:44 -07:00
mal
58e20638f7 Refactoring _wrap_outputs to remove python dependence.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631

Test Plan:
test suite

Imported from OSS

Differential Revision: D16185040

fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b
2019-07-10 12:12:16 -07:00
ec1b669d23 fix dce over loops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22632

Test Plan: Imported from OSS

Differential Revision: D16184469

Pulled By: suo

fbshipit-source-id: b7cc2d20a7dd8b287e1b6128ddb70d3936032a7e
2019-07-10 12:03:19 -07:00
9b8d771733 skip import nccl and gloo_gpu in cpu machine (#22522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22522

Skip importing the nccl and gloo_gpu modules on CPU machines

Reviewed By: bddppq

Differential Revision: D16115827

fbshipit-source-id: 329b7a0bb5eccb78c9e772bdab5db7c79b546d55
2019-07-10 11:56:56 -07:00
b984b0ab4b fix print (#22689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22689

att

Reviewed By: Lucaskabela

Differential Revision: D16184260

fbshipit-source-id: 1a6ad51a37918d0c81d6e3baa0ca0baa32cb9673
2019-07-10 11:26:34 -07:00
f81395b3e3 Enable more passes in ProfilingGraphExecutor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22079

Differential Revision: D16119322

Pulled By: Krovatkin

fbshipit-source-id: 301fcc42d0e1f031d9de5bcd9679fb8c2d742fef
2019-07-10 10:44:18 -07:00
10c60b601a Added Bfloat16 tensor for cpu with very limited support (#21860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21860
ghimport-source-id: 5290755b63033cdfdeb911a4ecf4aa282b3db02d

Test Plan: Imported from OSS

Differential Revision: D15856091

Pulled By: izdeby

fbshipit-source-id: 54e7e17be1b5c5a2e80a41feaeaeba75dbb8108f
2019-07-10 09:08:52 -07:00
6eb3969ac7 keep requires_grad unchanged after converting bn to syncbn (#22569)
Summary:
After converting BN layers to SyncBN layers, the function sets `requires_grad = True` on all parameters regardless of their original requires_grad states. I think this is a bug and have fixed it in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22569
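A minimal sketch of the preserved behavior, assuming the conversion goes through `torch.nn.SyncBatchNorm.convert_sync_batchnorm`:
```python
import torch

bn = torch.nn.BatchNorm2d(8)
bn.weight.requires_grad_(False)  # freeze the affine weight
sync_bn = torch.nn.SyncBatchNorm.convert_sync_batchnorm(bn)
assert sync_bn.weight.requires_grad is False  # flag preserved after this fix
```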

Differential Revision: D16151647

Pulled By: zou3519

fbshipit-source-id: e2ad1886c94d8882485e7fb8be51ad76469ecc67
2019-07-10 08:38:04 -07:00
cbb0b8166d Revert D16161144: [pytorch][PR] Add traces to LowerGradOf and SpecializeAutoGrad
Differential Revision:
D16161144

Original commit changeset: 9e206fcfb179

fbshipit-source-id: 8f9eecb5cd6ca715bd0c647c32cf77cd9d88e6ac
2019-07-10 06:55:01 -07:00
3a8d7463bd Enabled BFloat16 storage (#21523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21523
ghimport-source-id: 698b3cbd6b21c09b9ff8bf8011980df8e35c33b0

Test Plan: Imported from OSS

Differential Revision: D15819368

Pulled By: izdeby

fbshipit-source-id: f6b3bba7b3ca8ee677bd80a231dbb3920c07d61c
2019-07-09 21:51:06 -07:00
932ec8aa9f Updating submodules
Reviewed By: zpao

fbshipit-source-id: f5636ab0457c1b2e15df95a5677a7194978d9cd0
2019-07-09 21:39:57 -07:00
e72b617eb5 Introducing bfloat16 type (#21522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21522
ghimport-source-id: 4803f197ec04938501fdb10c1741280331c349d2

Test Plan: Imported from OSS

Differential Revision: D15819369

Pulled By: izdeby

fbshipit-source-id: 46408dc316a5c4dc644a736dc42da2422b34bcb9
2019-07-09 21:14:10 -07:00
de5a481c6e add forward declaration in stateful dataset (#22562)
Summary:
Addressing a potential dependency issue by adding forward declarations for OutputArchive/InputArchive.

This change follows the same pattern as 'torch/csrc/api/include/torch/data/samplers/base.h'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22562

Differential Revision: D16161524

Pulled By: soumith

fbshipit-source-id: d03f8a2ece5629762f9fa8a27b15b0d037e8f07b
2019-07-09 16:41:56 -07:00
3cf5f22f02 Enable C2 operators running with {cpu, gpu} * {forward, backward} (#22664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22664

This diff enables c2 operators to run the combination of {cpu, gpu} * {forward, backward}.

Reviewed By: hl475

Differential Revision: D15781789

fbshipit-source-id: e9843e3c46ea144042829860638d406f6a33792b
2019-07-09 16:41:53 -07:00
95a5da175d change c2 bench to use new tensor creation interface (#22663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22663

as title

Reviewed By: hl475

Differential Revision: D15744502

fbshipit-source-id: 441ab9fb7580ca87c3f2027d0a63ba18b8d35016
2019-07-09 16:41:49 -07:00
e1fdf8a46f Add comments about adding new build options. (#22641)
Summary:
Also revert the change to cmake.py in
c97829d7011bd59d662f6af9c3a0ec302e7e75fc. The comments are added to
prevent similar incidents in the future (which have occurred a couple of times in the past).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22641

Differential Revision: D16171763

Pulled By: ezyang

fbshipit-source-id: 5a65f9fbb3c1c798ebd25521932bfde0ad3d16fc
2019-07-09 16:41:46 -07:00
e2216ada65 Properly formats errors rising up from C++ extension compilation (#22445)
Summary:
Here's a C++ extension with a missing semicolon:
```python
torch.utils.cpp_extension.load_inline('test', 'int main() { return 0 }')
```
which currently generates this error
```
RuntimeError: Error building extension 'test_v6': b'[1/2] c++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test_v6 -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
-isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o\nFAILED: main.o \nc++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test_v6 -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o\n/tmp/torch_extensions/test/main.cpp: In
function \xe2\x80\x98int main()\xe2\x80\x99:\n/tmp/torch_extensions/test/main.cpp:2:23:
error: expected \xe2\x80\x98;\xe2\x80\x99 before \xe2\x80\x98}\xe2\x80\x99 token\n int
main() { return 0 }\n                       ^\nninja: build stopped: subcommand failed.\n'
```

After this PR, the error is
```
RuntimeError: Error building extension 'test': [1/2] c++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o
FAILED: main.o
c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=test -
DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-
packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o
/tmp/torch_extensions/test/main.cpp: In function ‘int main()’:
/tmp/torch_extensions/test/main.cpp:2:23: error: expected ‘;’ before ‘}’ token
 int main() { return 0 }
                       ^
ninja: build stopped: subcommand failed.
```
which is a lot easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22445

Differential Revision: D16094205

Pulled By: ezyang

fbshipit-source-id: 21043344aac260dc3e4e04d6a42898507bb840e4
2019-07-09 16:41:42 -07:00
50901be9fb Add traces to LowerGradOf and SpecializeAutoGrad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22599

Differential Revision: D16161144

Pulled By: Krovatkin

fbshipit-source-id: 9e206fcfb1796e9448e80f178b75d0c277bd348f
2019-07-09 16:41:39 -07:00
0c2cd93e43 Avoid potential extra copy in _lu_with_info_cuda (#22634)
Summary:
No need to `clone` if the expanded size matches original size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22634

Differential Revision: D16171091

Pulled By: ezyang

fbshipit-source-id: 3d8f116398f02952488e321c0ee0ff2868768a0c
2019-07-09 16:41:36 -07:00
45aad2e680 change unary, pool, max ops to use new interface (#22661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22661

as title

Reviewed By: hl475

Differential Revision: D16170825

fbshipit-source-id: d80944224b8717e7aa35980907ff48e587b85217
2019-07-09 16:41:32 -07:00
2b2fe525b9 introduce a new interface to add a list of operators (#21209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209

This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface:

- create op_list:
```
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_function"],
    attrs=[
        ["abs", torch.abs],
        ["abs_", torch.abs_],
    ],
)
```
- create a bench class:
```
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, op_function):
        self.input_one = torch.rand(M, N)
        self.op_func = op_function

    def forward(self):
        return self.op_func(self.input_one)
```
- register those ops:
```
op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```

Reviewed By: zheng-xq

Differential Revision: D15514188

fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f
2019-07-09 16:41:29 -07:00
164388150a fix lint (#22654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22654

att

Reviewed By: bddppq

Differential Revision: D16168219

fbshipit-source-id: db1a5e2161e7be70b2f6e6b4beaa27ea91f853f2
2019-07-09 16:41:26 -07:00
8a233b99cb Report errors through call stack (#22280)
Summary:
The error for `test_error_stack_module`:

```
Traceback (most recent call last):
  File "../test.py", line 35, in <module>
    scripted = torch.jit.script(M())
  File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1119, in script
    return _convert_to_script_module(obj)
  File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1825, in _convert_to_script_module
    raise e
RuntimeError:

d(int x) -> int:
Expected a value of type 'int' for argument 'x' but instead found type 'str'.
:
at ../test.py:11:12
def c(x):
    return d("hello") + d(x)
           ~ <--- HERE

'c' is being compiled since it was called from 'b'
at ../test.py:14:12
def b(x):
    return c(x)
           ~~~ <--- HERE

'b' is being compiled since it was called from 'forward'
at ../test.py:22:16
    def forward(self, x):
        return b(x)
               ~~~ <--- HERE

'forward' is being compiled since it was called from 'forward'
at ../test.py:31:20
    def forward(self, x):
        return x + self.submodule(x)
                   ~~~~~~~~~~~~~~~~ <--- HERE
```

This also unifies our error reporting in the front end with `ErrorReport`

TODO
* Include module names in message, #22207 should make this easy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22280

Pulled By: driazati

Differential Revision: D16060781

fbshipit-source-id: c42968b53aaddb774ac69d5abbf7e60c23df8eed
2019-07-09 16:41:22 -07:00
13d58fd9f5 README for the quantized op creation (#22165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22165

Workflow description for the quantized ops design.

Reviewed By: jerryzh168

Differential Revision: D15975977

fbshipit-source-id: ef73b172f609adef149c157c404bb452b5457a9f
2019-07-09 16:41:19 -07:00
dd4982e287 Cleanup integer sign warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22560

Test Plan: Imported from OSS

Differential Revision: D16151479

Pulled By: bwasti

fbshipit-source-id: a54139d8f95ed964530862f96723e4365724b2da
2019-07-09 16:41:16 -07:00
046c4589df lu: When not using pivoting, return the identity permutation instead of zeros (#22242)
Summary:
Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :)

These updates broke unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used, but now returns all zeros as the pivots. This PR reverts it to returning the identity permutation. I've not yet tested this code, as I'm having some trouble compiling PyTorch with this change and am hitting https://github.com/pytorch/pytorch/issues/21700, and I'm not sure how to disable that option.

Here's a MWE to reproduce the broken behavior, and my fix.

```python
import torch

torch.manual_seed(0)

n = 4
L = torch.randn(n,n)
A = L.mm(L.t()).unsqueeze(0)
b = torch.randn(1, n)

A_lu_cpu = torch.lu(A)
A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False)
A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True)
print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot)
print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_pivot)  # fixed: originally printed the nopivot result twice

x_cpu = b.lu_solve(*A_lu_cpu)
x_cuda_nopivot = b.cuda().lu_solve(*A_lu_cuda_nopivot)
x_cuda_nopivot_fixed = b.cuda().lu_solve(
    A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int())
x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot)

print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot)
```

Output:

```
A_lu_cuda_nopivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

-----

A_lu_cuda_pivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

(tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]),
 tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242

Differential Revision: D16049334

Pulled By: ezyang

fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d
2019-07-09 11:16:50 -07:00
7fcfed19e7 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:43 -07:00
5040d52a5a torch.quantization conversion utilities, observers for eager mode quantization (#22010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22010

torch.quantization module with observers and conversion routines

Reviewed By: zafartahirov

Differential Revision: D15554183

fbshipit-source-id: 05a3fabe28dd701978b8ecebf5bfc3a4c044ba5c
2019-07-09 10:51:38 -07:00
073fa6f411 add GRAPH_UPDATE logging to guard_elimination.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22497

Differential Revision: D16165106

Pulled By: Krovatkin

fbshipit-source-id: aeb48d81d92c71f7038903b1656d760b6b95c562
2019-07-09 10:09:35 -07:00
a3346e100e Performance improvements for depthwise convolutions in FP16 (#22302)
Summary:
This PR activates faster depthwise convolution kernels for Volta and Turing GPUs using cudnn >= 7600.
The script to benchmark the current PyTorch master branch and this PR branch can be found [here](https://gist.github.com/ptrblck/4590cf20721d8f43296c9903abd4a774).
(50 warmup iterations, 1000 iterations for timing)

I've used https://github.com/pytorch/pytorch/issues/3265 to create a similar benchmark and added a few additional setups.
Since the results are quite long, I've uploaded them in a spreadsheet [here](https://docs.google.com/spreadsheets/d/13ByXcqg7LQUr3DVG3XpLwnJ-CXg3GUZJ3puyTMw9n2I/edit?usp=sharing).
Times are given in ms per iteration.
We've benchmarked this PR on a DGX1 using V100 GPUs.

The current workload check in `check_cudnn_depthwise_workload` is quite long and can be moved to another file, if wanted.

CC ngimel (Thanks for the support while benchmarking it ;) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22302

Differential Revision: D16115057

Pulled By: ezyang

fbshipit-source-id: bad184658518e73b4d6b849d77e408f5a7a757de
2019-07-09 07:28:31 -07:00
31d821e267 Move thnvrtc and DynamicLibrary to ATen (#22362)
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.

`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362

Differential Revision: D16131787

Pulled By: ezyang

fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
2019-07-09 07:28:27 -07:00
74883d4865 Fix typos in gradcheck error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22357

Differential Revision: D16065935

Pulled By: ezyang

fbshipit-source-id: f131655eaca27f9df4cd6c511faabf0b8f2bf0de
2019-07-09 07:12:56 -07:00
92e8d04098 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 488
LARGE: 29
XXLARGE: 2

Updated actions:
From MEDIUM to LARGE: 227
From XLARGE to MEDIUM: 1
From XLARGE to LARGE: 1
From XLARGE to XXLARGE: 1
From LARGE to MEDIUM: 2
From LARGE to XLARGE: 2

Differential Revision: D16161669

fbshipit-source-id: 67a4e0d883ca3f1ca3185a8285903c0961537757
2019-07-09 05:24:19 -07:00
9fb4386c14 Add a higher-level log traces to DCE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22511

Differential Revision: D16153694

Pulled By: Krovatkin

fbshipit-source-id: 5edbc04bdccfa89f5ad0bf37f51e1bd2cb28962a
2019-07-08 21:55:02 -07:00
5395db22a4 Typo fixed in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22600

Differential Revision: D16156989

Pulled By: mrshenli

fbshipit-source-id: e491b083d872eaceb829028dadbab2e28ecfc785
2019-07-08 19:29:07 -07:00
b5860b5572 torchvision version changed to the latest one (#22598)
Summary:
Version changed to 487c9bf4b7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22598

Differential Revision: D16155471

Pulled By: ifedan

fbshipit-source-id: 2d54883c91add28c0f076858f292363eb95340a9
2019-07-08 17:13:59 -07:00
738aba171b use caffe2_dnnlowp_force_slow_path in FC (#22143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22143

Like Conv DNNLOWP operator, allow FC to run the slow path to debug numerical issues caused by Intel's int8 instruction that does horizontal addition of 2 int8 multiplication results in 16 bit

Reviewed By: hx89

Differential Revision: D15966885

fbshipit-source-id: c6726376a3e39d341fd8aeb0e54e0450d2af8920
2019-07-08 17:01:04 -07:00
905b9a89b2 fix uninitialized variable in BailOutInserter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22596

Differential Revision: D16156883

Pulled By: Krovatkin

fbshipit-source-id: 8926262a2d3115f34400c9ebb0c98c540e1cc623
2019-07-08 16:45:51 -07:00
c97829d701 Adding FC and Relu QNNPACK ops to C10 registry (#22174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22174

This is a preliminary change outlining the approach we plan to follow to integrate QNNPACK operators into the PyTorch backend. The operators will not be made visible to the user in the Python world; ultimately we will have a function that calls the QNNPACK backend based on the environment it runs on.

The goal of the project is to integrate QNNPACK library with PyTorch to achieve good performance for quantized mobile models.

Reviewed By: ljk53

Differential Revision: D15806325

fbshipit-source-id: c14e1d864ac94570333a7b14031ea231d095c2ae
2019-07-08 14:21:42 -07:00
0e7b65e48a Convert VariableVersion counter to intrusive_ptr, saving a memory allocation on every Tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22514

Differential Revision: D16114714

fbshipit-source-id: 441043d18938710869b64cb67884f49cd6060727
2019-07-08 13:40:58 -07:00
0c1ecf19e1 Simplify the flow control in div_kernel_cuda. (#22555)
Summary:
Some duplicated code is removed. It also becomes clear that there is only one special case `div_kernel_cuda` is handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22555

Differential Revision: D16152091

Pulled By: zou3519

fbshipit-source-id: bb875370077c1f84efe4b766b3e1acc461e73e6c
2019-07-08 12:13:10 -07:00
478d480d37 Add Module.requires_grad_ (#22576)
Summary:
addresses https://github.com/pytorch/pytorch/issues/20241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22576
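A usage sketch of the new method (example module assumed):
```python
import torch

net = torch.nn.Linear(4, 2)
net.requires_grad_(False)  # recursively sets requires_grad on all parameters
assert not any(p.requires_grad for p in net.parameters())
```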

Differential Revision: D16149314

Pulled By: zou3519

fbshipit-source-id: 1cc4c1ec084df30e00e9ae73ce1a53494a034d5c
2019-07-08 12:13:07 -07:00
456d27dff0 Update module.h (Fix a grammatical error of the comment in line 233) (#22548)
Summary:
Fix a grammatical error of the comment in line 233.
change from " Returns an `OrderedDict` of he submodules of this `Module`"
to " Returns an `OrderedDict` of the submodules of this `Module`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22548

Differential Revision: D16134534

Pulled By: zou3519

fbshipit-source-id: 33b1dd0fbc3a24bef99b6e0192566e2839292842
2019-07-08 11:51:49 -07:00
3a12520844 Pass Variable into Caffe2 ops, by requiring that the Variable doesn't require grad (#22473)
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.

Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.

This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473

Differential Revision: D16099042

Pulled By: yf225

fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
2019-07-08 11:31:10 -07:00
304552b61a Enabled masked fill and scatter for bool tensors (#22491)
Summary:
Enabled masked_fill, masked_fill_, masked_scatter_, masked_scatter on bool tensors.
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22491
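A minimal sketch of the newly enabled bool-tensor path (example values assumed):
```python
import torch

t = torch.zeros(3, dtype=torch.bool)
mask = torch.tensor([True, False, True])
t.masked_fill_(mask, True)  # t is now tensor([ True, False,  True])
```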

Differential Revision: D16108817

Pulled By: izdeby

fbshipit-source-id: 8b1808f41d7a4f65fe6d3797a3c83b2dac3446c7
2019-07-08 10:49:40 -07:00
a48cf8f52d Fixed RNNImplBase reset and flat_weights methods to handle bidirectional flag correctly (#22493)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/19545:
Changed torch/csrc/api/src/nn/modules/rnn.cpp to be consistent with torch/nn/modules/rnn.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22493

Differential Revision: D16111433

Pulled By: pbelevich

fbshipit-source-id: edfa41e8a9889d64918998dc7c46b8763fdf5765
2019-07-08 10:34:04 -07:00
595e344769 Add type stubs to import 'nn' modules (#22411)
Summary:
Forgot to mirror the `nn/ __init__.py` semantics in the new `nn` type stub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22411

Differential Revision: D16149798

Pulled By: ezyang

fbshipit-source-id: 0ffa256fbdc5e5383a7b9c9c3ae61acd11de1dba
2019-07-08 09:22:37 -07:00
9bafe5d4da Fix torch.normal with CUDA tensors (#22533)
Summary:
`addcmul_out` overwrote the samples, which led to constant values being output by `torch.normal`.

Changelog:
- Replace the `addcmul_out` calls with a combination of in-place `mul` and `add`, with justification for this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22533
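A minimal sketch of the algebra behind the replacement described in the changelog (the actual kernel-side code differs):
```python
import torch

mean, std = 2.0, 3.0
out = torch.randn(5)      # standard-normal samples
out.mul_(std).add_(mean)  # scale and shift in place instead of addcmul_out
```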

Test Plan:
- Enable tests for test_normal on all devices

Fixes https://github.com/pytorch/pytorch/issues/22529

Differential Revision: D16141337

Pulled By: ezyang

fbshipit-source-id: 567a399042e0adcd154582f362318ce95a244c62
2019-07-08 08:27:38 -07:00
80e2fab952 Deprecate and set a date for removing NO_* and WITH_* (user) build options (#22474)
Summary:
Currently, specifying build options with respect to the "USE_"
series is in quite a state of disarray. Many build options
accept three variants: USE_OPTION, WITH_OPTION, and NO_OPTION. Some
build options accept only the USE_ and NO_ variants, and some only USE_.
This inconsistency is confusing and hard to maintain.

To resolve this inconsistency, we can either let all these build options
support all three variants, or we only support the USE_ variant.

This commit takes a step toward the latter choice, i.e., it deprecates and sets
a date for removing the NO_ and WITH_ variants, keeping only the
USE_ variant. This is likely better than the former solution because:

- NO_ and WITH_ variants are not documented.
- CMakeLists.txt defines only the USE_ variants of the relevant build
  options. It would be surprising for users to pass the other variants to
  CMake during a rebuild and find them ineffective.
- Multiple variants are difficult to maintain.
- The behavior is confusing if more than one variant is passed. For
  example, what should be expected if one sets "NO_CUDA=1 USE_CUDA=1"?

The downside is that this will break backward compatibility for existing
build scripts in the future (if they used the undocumented build
options).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22474

Differential Revision: D16149396

Pulled By: ezyang

fbshipit-source-id: 7145b88ad195db2051772b9665dd708dfcf50b7d
2019-07-08 08:22:08 -07:00
43d36415b9 torch.utils.data.Dataloader: documentation about RNG state consumption (#22540)
Summary:
the outcome from the pytorch forum issue: https://discuss.pytorch.org/t/dataloader-problem-problem-arises-when-shuffle-true/45631

The discussion is here: https://github.com/pytorch/pytorch/pull/20749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22540

Differential Revision: D16131777

Pulled By: ezyang

fbshipit-source-id: 566deda1b44dc7fae54250e9b508d120851a2848
2019-07-08 08:22:04 -07:00
ce8c9d9bd5 Fix cuda detection script (#22527)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22507
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22527

Differential Revision: D16126220

Pulled By: ezyang

fbshipit-source-id: eb05141282b0f058324da1b3d3cb34566f222a67
2019-07-08 07:06:59 -07:00
d4464d3418 Use system locale in collect_env.py (#22579)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22570.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22579

Differential Revision: D16147304

Pulled By: soumith

fbshipit-source-id: db73447bffbfdf54f7b830447d4b9584f363f05f
2019-07-07 20:55:31 -07:00
d48cbd62cd Fix spectral_norm load_state_dict with strict=False (#22545)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21251

also fixes some missing hook removals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22545
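A minimal sketch of the path this fixes (modules assumed for illustration):
```python
import torch

src = torch.nn.utils.spectral_norm(torch.nn.Linear(4, 4))
dst = torch.nn.utils.spectral_norm(torch.nn.Linear(4, 4))
dst.load_state_dict(src.state_dict(), strict=False)  # previously broken path
```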

Differential Revision: D16139506

Pulled By: soumith

fbshipit-source-id: 552a9f9f91be328a47ee8f1e1d29c1f59b0ebca3
2019-07-07 19:08:48 -07:00
94bd5ddf7f Add some essentials for building c++ extensions on Windows (#22563)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22489.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22563

Differential Revision: D16142615

Pulled By: ezyang

fbshipit-source-id: d7c27a874f788dd27065fad6699485e4a6372ec4
2019-07-06 19:29:25 -07:00
9db7bc8bc7 fix uninitialized variable warning (#22477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477

There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.

Reviewed By: hx89

Differential Revision: D16100211

fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
2019-07-06 00:36:45 -07:00
42c6ea5faa ONNX Export Topk with Dynamic k (+ add test cases)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21104

Differential Revision: D16061592

Pulled By: houseroad

fbshipit-source-id: 855b310a138fdde9c25869ffe9f127189dc2eaf5
2019-07-05 23:46:36 -07:00
221af09ca7 Move GradMode / AutoGradMode / NoGradGuard to ATen core (#18573)
Summary:
After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence the PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.

Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, with the following reasoning:
Semantically, `at::GradMode` and `at::NonVariableTypeMode` actually mean different things: `at::GradMode` controls whether a tensor should accumulate gradients, and `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places where we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:
```python
#  torch/tensor.py
with torch.no_grad():
   ...
   new_tensor = self.new()    # `at::GradMode` is false at this point
   ...
```
```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  // if we merge `at::GradMode` and `at::NonVariableTypeMode`, since `at::GradMode` is false and `self_.type()` checks `at::GradMode` to decide whether to return non-Variable type, it will return a non-Variable type here, which is not what we want (and throws a "Tensor that was converted to Variable was not actually a Variable" error)
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```
For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they have different purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18573

Differential Revision: D16134413

Pulled By: yf225

fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
2019-07-05 23:41:37 -07:00
39a4ec4141 Make device_of take tensor by const ref
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22556

Test Plan: Imported from OSS

Differential Revision: D16137540

Pulled By: jamesr66a

fbshipit-source-id: 8a6be6edd602c53edc954508ea27d8a6071bd964
2019-07-05 23:34:43 -07:00
7730346853 Make shuffling optional in DistributedSampler (#22479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22479

In some cases, for example when training on CTR data, we would like to start training from old samples and finish on the most recent ones.

This diff adds an option to disable shuffling in DistributedSampler to accommodate this use case.
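A usage sketch of the new option (dataset and world-size values assumed):
```python
import torch

dataset = torch.utils.data.TensorDataset(torch.arange(100))
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=4, rank=0, shuffle=False)  # keep temporal order
```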

Reviewed By: soumith

Differential Revision: D16100388

fbshipit-source-id: 35566581f5250040b2db5ec408a63037b47a9f5d
2019-07-05 18:56:28 -07:00
9e1ae4b264 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 8932f509ab9b14988428a1b9a42d3388ff5a4ad5
2019-07-05 18:36:03 -07:00
577042a3cc Better Constant Propagation through Tuples (#22561)
Summary:
Replaces https://github.com/pytorch/pytorch/pull/21501 because ghimport had errors I couldn't figure out when I tried to import the stack :'(

This PR has the two commits that were previously accepted, plus the merge commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22561

Differential Revision: D16135743

Pulled By: eellison

fbshipit-source-id: f0a98842ccb334c7ceab04d1437e09dc76be0eb1
2019-07-05 18:06:46 -07:00
a09150adc0 Deprecate untyped Dicts (#22516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22516

Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.

Differential Revision: D16115215

fbshipit-source-id: 2ef4cb443da1cdf4ebf5b99851f69de0be730b97
2019-07-05 18:00:13 -07:00
595d2dbb4d Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 1a6c249837151564f48f675ced6a221ec739aae9
2019-07-05 15:39:56 -07:00
91706d1044 Primitive Jit Logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22278

Differential Revision: D16134598

Pulled By: Krovatkin

fbshipit-source-id: e64b14d0d68801189fc78c059a4e8b322acce3fa
2019-07-05 15:27:38 -07:00
ed60d9fcf9 List/Dict remember and check their element type (#22005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005

When a Dict or List is created with type information, it will remember it.
If the list is later instantiated as a List<T> with a concrete type, it will assert that T is the correct type.

Differential Revision: D15914462

fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
2019-07-05 15:17:51 -07:00
mal
042da2171e Skip test_slogdet_sign if LAPACK library is not installed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22551

Test Plan:
ran test locally

Imported from OSS

Differential Revision: D16132182

fbshipit-source-id: 5b9efbf883efa66c4d8b7c400bdb804ac668a631
2019-07-05 11:57:24 -07:00
3507eaf3ea Add clone() implementation for QTensor (#22510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22510

Added a new function implementing the clone operation on quantized tensors, along with a test case (see the test plan).

This change is required to be able to call torch.jit.trace on quantized models.
Clone implementation calls copy_ on QTensor internally.

Differential Revision: D16059576

fbshipit-source-id: 226918cd475521b664ed72ee336a3da8212ddcdc
2019-07-05 11:24:53 -07:00
mal
0140a756d8 Prioritize reentrant tasks and execute them recursively until close to limit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22397

Test Plan:
Added test for reentrant backwards with checkpoint and a test for a recursive backwards function (which should fail if we run all the reentrant tasks recursively in the same thread) and for testing priority of reentrant tasks.
~~Will add a test for priority of reentrant tasks in future pr.~~

Imported from OSS

Differential Revision: D16131955

fbshipit-source-id: 18301d45c1ec9fbeb566b1016dbaf7a84a09c7ac
2019-07-05 08:51:06 -07:00
e5d640341f Set stream for softmax kernel launch (#22470)
Summary:
Currently, the **stream** parameter is not set when launching these two kernels: softmax_warp_forward() and softmax_warp_backward(), i.e. the kernels are always put on the default stream, which may fail to respect the stream that was set previously. Add **at::cuda::getCurrentCUDAStream()** as a launch argument to fix this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22470

Differential Revision: D16115051

Pulled By: izdeby

fbshipit-source-id: 38b27e768bb5fcecc1a06143ab5d63b0e68a279e
2019-07-05 07:33:55 -07:00
d2ceab2766 update video input (#22471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22471

update C2 video input with latest augmentation

Reviewed By: HengCV

Differential Revision: D16096127

fbshipit-source-id: bb07394e211cd52b50005d801b6d03250248ea9e
2019-07-05 00:56:33 -07:00
4ba1c4f798 Add the support of handle Bias being nullptr for torch.ops.quantized.fbgemm_conv (#22472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22472

As Title says.

Reviewed By: dskhudia, bddppq

Differential Revision: D16097594

fbshipit-source-id: 7f56b7906dd9c2792e21a8aa553c0b8d05b19012
2019-07-04 19:37:37 -07:00
57dbc79674 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22543

Test Plan: Imported from OSS

Differential Revision: D16127755

Pulled By: suo

fbshipit-source-id: 5f6ba507bf5b671987e2cabf03c2118271800595
2019-07-04 18:26:04 -07:00
b952011bae use save/load for emitFunctionHook
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22314

Test Plan: Imported from OSS

Differential Revision: D16121781

Pulled By: suo

fbshipit-source-id: b376afb082726d78f082a0ff6902c4b501435d4b
2019-07-04 17:12:16 -07:00
bc24593a80 remove unused argument in import.cpp (#22205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22205
ghimport-source-id: afdf13f6515a1352cde4cbb7b45bf01e717bcf4d

Test Plan: Imported from OSS

Differential Revision: D15998763

Pulled By: suo

fbshipit-source-id: 6e5c1c668b9de5e352d2aa7ca27197deb42ca94b
2019-07-04 17:12:12 -07:00
4b9b7d6f03 improvements to QualifiedName (#22204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22204
ghimport-source-id: 319afc622f7137ca9075efefca1a05acedc19a4a

Test Plan: Imported from OSS

Differential Revision: D15998759

Pulled By: suo

fbshipit-source-id: 4534443aef61255af0fa3d2ed1be5e87266e2f2c
2019-07-04 17:12:08 -07:00
f5919dba45 refactoring of module/object (#22203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22203
ghimport-source-id: 6b3807ac8aa53df2fdd770b43d8e54b8f0d69c20

Test Plan: Imported from OSS

Differential Revision: D15998760

Pulled By: suo

fbshipit-source-id: dd51edbcb66561189ae9d94a129434092bcad01b
2019-07-04 17:12:04 -07:00
3b2844eeea Make CompilationUnit own Functions (#22202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22202
ghimport-source-id: de6c963af1df76d2d6357155e64a5913ab879f76

Test Plan: Imported from OSS

Differential Revision: D15998761

Pulled By: suo

fbshipit-source-id: 5414a6424953738d823b265d20dc67dde6e5b2d8
2019-07-04 17:12:00 -07:00
66378c7025 make test context managers exception safe
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22502

Test Plan: Imported from OSS

Differential Revision: D16121783

Pulled By: suo

fbshipit-source-id: e5f991b189261f596355541cc331ef92667bd1a5
2019-07-04 17:11:56 -07:00
2b06e7cd50 add #pragma once to jit headers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22505

Differential Revision: D16119310

Pulled By: Krovatkin

fbshipit-source-id: 8b742411f40d66690ce28726c213741e0c2de618
2019-07-04 11:10:59 -07:00
6f6a680481 remove erase_fork_wait.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22509

Differential Revision: D16119307

Pulled By: Krovatkin

fbshipit-source-id: 3251f594be6d365652b847bdc35dd4f4b62c35e6
2019-07-03 22:47:57 -07:00
799633e4cd move casting ops from prim to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22275

Test Plan: Imported from OSS

Differential Revision: D16060597

Pulled By: wanchaol

fbshipit-source-id: a11d8ad3b037e15bd670cc7cd3fefd4f0abd0bba
2019-07-03 22:22:28 -07:00
97a604ef57 Rereapply optional ScalarType interface changes that were reverted in D16079809 (#22456)
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412

Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but it currently seems to be incompatible with JIT serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456

Differential Revision: D16097159

Pulled By: nairbv

fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
2019-07-03 20:03:25 -07:00
10c4b98ade Remove weak script (#22212)
Summary:
* Deletes all weak script decorators / associated data structures / methods
   * In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
   * Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`; only `rnn.py` needed manual editing, using `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand

This should also fix https://github.com/pytorch/pytorch/issues/22212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212

Differential Revision: D15988346

Pulled By: driazati

fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
2019-07-03 17:28:25 -07:00
b93f29ded3 add JIT path to the benchmark (#22309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309

This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.

In this diff, we put the operator in a loop and pass it to JIT. One extra step, wrapping the operator with the `_consume` op, is introduced to avoid JIT's dead-code-elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside).
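A minimal sketch of the looped pattern described above; `_consume` is internal to the benchmark harness, so keeping the result live stands in for it here:
```python
import torch

@torch.jit.script
def loop_op(x: torch.Tensor, iters: int) -> torch.Tensor:
    y = x
    for _ in range(iters):
        y = torch.add(x, 1)  # the operator under test
    return y  # returning the result prevents dead code elimination
```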

Reviewed By: zheng-xq

Differential Revision: D16033082

fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
2019-07-03 17:18:03 -07:00
29ec4769bb Fix SyncBatchNorm running var update issue (#22248)
Summary:
## Fix https://github.com/pytorch/pytorch/issues/22192

+ change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)`
+ change the CUDA source and header:
```cuda
// before:
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(
    const Tensor& self, const Tensor& mean, const Tensor& invstd,
    const Tensor& running_mean, const Tensor& running_var,
    double momentum, double epsilon, int64_t count);
// after:
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(
    const Tensor& self, const Tensor& mean, const Tensor& invstd,
    const Tensor& running_mean, const Tensor& running_var,
    double momentum, double epsilon, const Tensor& counts);
```
+ change python interface
```python
class SyncBatchNorm(Function):
    def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248

Differential Revision: D16002146

Pulled By: mrshenli

fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956
2019-07-03 17:17:59 -07:00
325ec2327f create tensor based on provided datatype (#22468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22468

as title

Reviewed By: ajauhri

Differential Revision: D15744503

fbshipit-source-id: 050b32dd7f135512385fc04f098c376c664211a9
2019-07-03 17:08:23 -07:00
319ef3bcbb Fix onnx custom op export & add initial test case (#21321)
Summary:
- Fix typo in ```torch/onnx/utils.py``` when looking up registered custom ops.
- Add a simple test case
    1. Register custom op with ```TorchScript``` using ```cpp_extension.load_inline```.
    2. Register custom op with ```torch.onnx.symbolic``` using ```register_custom_op_symbolic```.
    3. Export model with custom op, and verify with Caffe2 backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21321

Differential Revision: D16101097

Pulled By: houseroad

fbshipit-source-id: 084f8b55e230e1cb6e9bd7bd52d7946cefda8e33
2019-07-03 16:59:12 -07:00
9c44f6c723 generate tests based on op metadata (#21432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432

This diff introduce a new interface to generate tests based on the metadata of operators.

Reviewed By: ajauhri

Differential Revision: D15675542

fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c
2019-07-03 16:48:41 -07:00
2732a5e534 Another dce fix (#22499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22499

Another place where onnx export is running dead code elimination after making the jit graph invalid. Fixing it.

Reviewed By: houseroad

Differential Revision: D16111969

fbshipit-source-id: 5ba80340c06d091988858077f142ea4e3da0638c
2019-07-03 16:37:53 -07:00
d9e15bccb0 Perform weight re-init for embedding table in sparse_lookup.py (#22348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22348

This is the last step of LRU hash eviction weight re-init. This diff checks whether there are evicted values in sparse_lookup and, if so, calls the op created in D15709866 to re-initialize the values at the indices in evicted_values. Also created a gradient op for the operator; the gradient op just passes the output gradient through as the input gradient.

Reviewed By: itomatik

Differential Revision: D16044736

fbshipit-source-id: 9afb85209b0de1038c5153bcb7dfc5f52e0b2abb
2019-07-03 10:33:40 -07:00
c9f41e9bc0 Add device guard around MPI operations (#22446)
Summary:
If the current CUDA device is not the same as the device that hosts
the tensor the operation works on, then OpenMPI will segfault, as
reported in https://github.com/pytorch/pytorch/issues/21922. This change adds a device guard for every
operation to ensure the correct device is set.

Fixes https://github.com/pytorch/pytorch/issues/21922.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22446

Differential Revision: D16106823

Pulled By: pietern

fbshipit-source-id: 99d762eb3851c0a0e0b4fe81cf27c1c8d35596cc
2019-07-03 02:01:53 -07:00
abb2e68989 Don't construct a single element array for unary ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22469

Test Plan: Imported from OSS

Differential Revision: D16105794

Pulled By: jamesr66a

fbshipit-source-id: 6bb5a6703c8dba3cda20a6db192d2a15711751a1
2019-07-02 23:29:56 -07:00
7fef0b7b72 Take const refs in TensorIterator::mark_outputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22465

Test Plan: Imported from OSS

Differential Revision: D16105795

Pulled By: jamesr66a

fbshipit-source-id: 22945fc3f02f8450ae6b92492a0f7baad80c5cb5
2019-07-02 23:29:52 -07:00
0d63619414 Deprecate vector/unordered_map again (#22478)
Summary:
The deprecation was temporarily removed in https://github.com/pytorch/pytorch/pull/21999 because it threw warnings on our codebase.

https://github.com/pytorch/pytorch/pull/22162 fixed these internal call sites so we can now re-enable the deprecation.

I checked locally that this PR doesn't add any new warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22478

Differential Revision: D16101342

Pulled By: smessmer

fbshipit-source-id: 378eb208f6a3dd3a28d2efe2e001fd71a9569297
2019-07-02 21:59:05 -07:00
17cc79865d Fix dead code elimination in onnx export (#22476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22476

Dead code elimination assumes a valid jit graph because it checks if operators have side effects.
The onnx export path destroys the jit graph right before calling dead code elimination, but it actually doesn't care about side effects.
We can just call dead code elimination and disable side effect lookup and things should work.

Reviewed By: houseroad

Differential Revision: D16100172

fbshipit-source-id: 8c790055e0d76c4227394cafa93b07d1310f2cea
2019-07-02 21:28:57 -07:00
76e14c1e51 remove caffe2/core dependency from ATen/core/jit_type.h (#22441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22441

This include doesn't seem to be needed. Remove it to simplify mobile build dependency.

Reviewed By: dreiss

Differential Revision: D16088224

fbshipit-source-id: f6aec21655e259726412e26a006d785912436c2a
2019-07-02 21:07:59 -07:00
e210c65097 Add torch.where overload with only condition argument (#21986)
Summary:
Requested in https://github.com/pytorch/pytorch/issues/21798
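
A minimal sketch of the new single-argument form, which returns per-dimension index tensors for the true elements (mirroring NumPy's `np.where(cond)`):
```python
import torch

x = torch.tensor([[1, 0], [0, 2]])
# With only a condition, torch.where returns the indices of the True
# elements, one 1-D index tensor per dimension.
rows, cols = torch.where(x > 0)
print(rows)  # tensor([0, 1])
print(cols)  # tensor([0, 1])
```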
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21986

Differential Revision: D16081577

Pulled By: zhangguanheng66

fbshipit-source-id: 658c0f451b833aceb1a41ee424c7990eec00bc02
2019-07-02 18:18:15 -07:00
dcd902bdde provide "size" parameter in torch.normal when called with two floats (#20545)
Summary:
This has been requested in https://github.com/pytorch/pytorch/issues/20323

(It is still not exactly the same as NumPy, which allows you to pass tensors as mean/std and broadcast them with size, but the present PR is extremely simple and does the main thing people are asking for)
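
A minimal usage sketch of the new overload:
```python
import torch

# With two floats for mean/std, the new `size` argument sets the shape.
samples = torch.normal(0.0, 1.0, size=(3, 4))
print(samples.shape)  # torch.Size([3, 4])
```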
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20545

Differential Revision: D15358736

Pulled By: zhangguanheng66

fbshipit-source-id: 762ea5eab5b8667afbac2df0137df017ba6e413c
2019-07-02 18:18:11 -07:00
bb0f299f27 Update MultiheadAttention module support key/value with different number of features and allow static key/value (#21288)
Summary:
The changes include:

1. Allow key/value to have a different number of features from query. This supports the case where key and value have different feature dimensions.
2. Support three separate proj_weights, in addition to a single in_proj_weight. The proj_weights of key and value may have different dimensions from that of query, so three separate proj_weights are necessary. In case key and value have the same dimension as query, it is preferred to use a single large proj_weight for performance reasons. However, it should be noted that using a single large weight or three separate weights is a size-dependent decision.
3. Give an option to use static k and v in the multihead_attn operator (see saved_k and saved_v). Those static key/value tensors can now be re-used when training the model.
4. Add more test cases to cover the arguments.

Note: current users should not be affected by the changes.
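
A minimal sketch of point 1, assuming the `kdim`/`vdim` keyword arguments exposed by the module:
```python
import torch
import torch.nn as nn

# Query has 16 features; key and value come from sources with 24 and 32.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, kdim=24, vdim=32)
q = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)
k = torch.randn(7, 2, 24)
v = torch.randn(7, 2, 32)
out, weights = attn(q, k, v)
print(out.shape)  # torch.Size([5, 2, 16])
```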
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21288

Differential Revision: D15738808

Pulled By: zhangguanheng66

fbshipit-source-id: 288b995787ad55fba374184b3d15b5c6fe9abb5c
2019-07-02 18:06:25 -07:00
d684112ec9 Output sequence probability with CTC beam search, optional multiple output sequences (#21927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21927

Add `OUTPUT_PROB` output to CTCBeamSearchDecoderOp to return a probability for each sequence.

Add argument to output top-k instead of top-1 decoded sequences.

Reviewed By: SuperIRabbit

Differential Revision: D15797371

fbshipit-source-id: 737ca5cc4f90a0bcc3660ac9f58519a175977b69
2019-07-02 17:29:13 -07:00
830c6590ef EraseNumberTypes cleans itself up (#22461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22461

We shouldn't call dead code elimination after EraseNumberTypes because dead code elimination assumes a valid jit graph which EraseNumberTypes just broke.
Let's have it clean up after itself instead.

Reviewed By: houseroad

Differential Revision: D16094656

fbshipit-source-id: f2752277d764e78ab276c57d56b2724b872b136f
2019-07-02 17:06:53 -07:00
a6441c00d6 Remove build variable NCCL_EXTERNAL (#22467)
Summary:
It's always set to equal USE_NCCL since we made Gloo depend on the Caffe2 NCCL
build. See 30da84fbe1614138d6d9968c1475cb7dc459cd4b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22467

Differential Revision: D16098581

Pulled By: ezyang

fbshipit-source-id: f706ec7cebc2e6315bafca013b669f5a72e04815
2019-07-02 15:36:44 -07:00
34f950c800 Create C2 operator to replace values in embedding table (#22279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22279

This new operator is used for embedding table weight re-init. After we get the evicted indices, they are the rows that need resetting in the embedding table. Then we can create a 1D tensor with default values and apply this operator to copy that tensor to all evicted rows in the embedding table

Will add gradient op in next diff

Reviewed By: itomatik

Differential Revision: D15709866

fbshipit-source-id: 2297b70a7326591524d0be09c73a588da245cc08
2019-07-02 15:26:22 -07:00
040a4bd914 include conv_op_impl.h from conv_dnnlowp_op.cc (#22458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22458

To make sure the templates get instantiated.

Reviewed By: jianyuh

Differential Revision: D16094183

fbshipit-source-id: 7861df0b303bec42ab80a53477c4b608edebb61d
2019-07-02 15:09:34 -07:00
474dec4b00 Warn on conditions that can trigger cuBLAS sgemm bug (#22034)
Summary:
The sgemm in cuBLAS 9.0 has some issues with sizes above 2M on Maxwell and Pascal architectures. Warn in this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22034

Differential Revision: D15949930

Pulled By: zhangguanheng66

fbshipit-source-id: 0af977ec7900c76328d23898071de9c23778ff8b
2019-07-02 15:09:31 -07:00
f5b3f9ecd9 Remove unnecessary ROCm detection code in Python scripts. (#22464)
Summary:
ROCm is already detected in cmake/public/LoadHIP.cmake. No need to
detect twice. Plus, the Python script reads the environment variable
ROCM_HOME, but what is really used in the CMake scripts is ROCM_PATH -- a
user must get both environment variables right. Since ROCM_HOME is
undocumented, this commit completely eradicates it.

 ---

ezyang A remake of https://github.com/pytorch/pytorch/issues/22228 because its dependency has been dismissed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22464

Differential Revision: D16096833

Pulled By: bddppq

fbshipit-source-id: fea461e80ee61ec77fa3a7b476f7aec4fc453d5d
2019-07-02 14:46:03 -07:00
e68dc899d1 Fix compiler warnings (#22162)
Summary:
Fix various compiler warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22162

Differential Revision: D16085339

Pulled By: smessmer

fbshipit-source-id: d36a4b334315f1a5942cac46443a7d166ca36d0d
2019-07-02 14:12:55 -07:00
53a52f574f infer shape until no more change (#22425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22425

Currently, in bound_shape_inference.cc: InferBoundShapeAndType, we first infer ops in order and then infer the inputs of concat in reverse order. In the tiny version of ctr_instagram_model, concat is right before FC, so we can infer the inputs for concat. But in the production version, we found there are some ops between concat and FC (or other ops whose shapes we know), so the shapes of these ops cannot be inferred.
This diff is a temporary solution to this problem: infer shapes in order and in reverse order repeatedly until there is no more change, as sketched below.
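
A language-agnostic sketch of the fixed-point strategy (the helpers are hypothetical, not the actual C2 functions):
```python
def infer_until_stable(ops, shapes):
    """Alternate forward and reverse inference passes until no shape changes."""
    changed = True
    while changed:
        before = dict(shapes)
        for op in ops:                 # forward pass, in graph order
            infer_forward(op, shapes)  # hypothetical per-op shape inference
        for op in reversed(ops):       # reverse pass, e.g. for concat inputs
            infer_backward(op, shapes)
        changed = shapes != before
    return shapes
```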

Reviewed By: yinghai, ipiszy

Differential Revision: D16082521

fbshipit-source-id: d5066509368029c6736dce156030adf5c38653d7
2019-07-02 13:13:55 -07:00
07ef85e326 Add USE_MKLDNN_CBLAS build option. (#19014)
Summary:
MKL-DNN is the main library for computation when we use the ideep device. It can use kernels implemented with different algorithms (including JIT, CBLAS, etc.). We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether or not to use the CBLAS computation methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014

Differential Revision: D16094090

Pulled By: ezyang

fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
2019-07-02 12:29:54 -07:00
6d5871300b Use concrete types on call sites for Dict/List (#22004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22004

In future, we want all dicts/lists to store information about the types they contain.
This is only possible if the creation API doesn't allow creating lists/dicts without type information.
This diff updates call sites that don't specify type information so that they do specify it.

Reviewed By: dzhulgakov

Differential Revision: D15906387

fbshipit-source-id: 64766a2534b52c221e8a5501a85eaad13812e7bd
2019-07-02 11:52:35 -07:00
693871ded3 Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360)
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_").  This commit eradicate this issue before it
is made into a stable release.

The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.

 ---

Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360

Differential Revision: D16074509

Pulled By: zou3519

fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
2019-07-02 11:46:13 -07:00
bb07f2d063 Pass LRU hash output evicted_values to SparseLookup (#21389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21389

As titled. To do weight re-init on evicted rows in embedding table, we need to pass the info of the evicted hashed values to SparseLookup, which is the layer model responsible for constructing the embedding table and do pooling.

To pass evicted values, we need to adjust the output record of lru_sparse_hash to include the evicted values, and add an optional input to all processors that need to take in sparse segments. For SparseLookup to get the evicted values, its input record needs to be adjusted. Now the input record can have type IdList/IdScoreList, or be a struct of feature + evicted values

Reviewed By: itomatik

Differential Revision: D15590307

fbshipit-source-id: e493881909830d5ca5806a743a2a713198c100c2
2019-07-02 11:27:37 -07:00
869ce89474 use feenableexcept when glibc is available (#22241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387

glibc has a non-standard function, feenableexcept, that triggers the floating-point exception handler. Compared to feclearexcept + fetestexcept, this approach allows us to see precisely, from the stack trace, where the exception is raised.

Reviewed By: jspark1105

Differential Revision: D15301095

fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
2019-07-02 10:49:55 -07:00
e74b0fc87c Fix empty_like for quantized tensors. (#21978)
Summary:
empty_like uses the tensor options of `self`, rather than the passed-in tensor options. This means it messes up variable/tensor types and ignores specifications like different dtypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21978

Differential Revision: D15903948

Pulled By: gchanan

fbshipit-source-id: f29946be01c543f888daef2e99fe928e7b7d9d74
2019-07-02 10:28:59 -07:00
7235532df3 Revert D16088193: Refactored math tests to iterate over all math ops
Differential Revision:
D16088193

Original commit changeset: 81b6e536b450

fbshipit-source-id: 81ee8857c3d5353955d75e05203cfbf2d3b8dacd
2019-07-02 10:18:46 -07:00
a845d02cd5 Revert D16088191: Added math.log2 and hypot
Differential Revision:
D16088191

Original commit changeset: 5d80c480243d

fbshipit-source-id: 12ea2617e3af5bf81b1f2a57f8633ca06a99db5b
2019-07-02 10:18:42 -07:00
2dd71b18c4 Fix PoolWindow crash from thread_local (#22405)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19394

See https://developercommunity.visualstudio.com/content/problem/124121/thread-local-variables-fail-to-be-initialized-when.html for context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22405

Differential Revision: D16090822

Pulled By: ezyang

fbshipit-source-id: 9fdd2c272fa7723fb62b906336d2e2620411b12b
2019-07-02 09:48:14 -07:00
7ca7edc307 ONNX Export LayerNorm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22265

Reviewed By: zrphercule

Differential Revision: D16076268

Pulled By: houseroad

fbshipit-source-id: 29b4ecab2fa0dc7250c9d1ad6924903181a66ab2
2019-07-02 09:37:07 -07:00
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer as found in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.

There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
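
A minimal usage sketch of the decoupled-decay optimizer, assuming the `torch.optim.AdamW` entry point:
```python
import torch

model = torch.nn.Linear(10, 2)
# weight_decay here is true decoupled weight decay, not L2 regularization.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

for _ in range(100):
    opt.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()
    loss.backward()
    opt.step()
```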
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
c9a8413306 Numerical stability of embedding kernels (#22401)
Summary:
Address the issue raised in https://github.com/pytorch/pytorch/issues/22377.

The PR https://github.com/pytorch/pytorch/issues/22016 introduces a temporary tensor of weights `grad_weight_per_segment` of the same dtype as the end result, which can be a problem when using `float16`.
In this PR, it now uses a `float32` temporary tensor when the input is `float16`.
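
The numerical idea, as a minimal Python sketch (illustrative only; the real change is inside the CUDA kernel, and `accumulate` is a hypothetical helper):
```python
import torch

g = torch.full((4096,), 1e-3, dtype=torch.float16)

def accumulate(x, acc_dtype):
    total = torch.zeros((), dtype=acc_dtype)
    for v in x:                      # running sum in the accumulator dtype
        total = total + v.to(acc_dtype)
    return total

print(accumulate(g, torch.float16))  # drifts away from the true 4.096
print(accumulate(g, torch.float32))  # ~4.096; cast back to half at the end
```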

ngimel, can I get you to review? I think I have fixed the issues you have pointed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22401

Differential Revision: D16077319

Pulled By: mrshenli

fbshipit-source-id: 7cfad7f40b4d41a244052baa2982ab51bbbd7309
2019-07-02 08:53:22 -07:00
b76877728a Added math.log2 and hypot
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21512

Test Plan: Imported from OSS

Differential Revision: D16088191

Pulled By: Chillee

fbshipit-source-id: 5d80c480243d2644c96df26337cf65918d79443e
2019-07-02 06:28:34 -07:00
3d3d07b7dd Refactored math tests to iterate over all math ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21511

Test Plan: Imported from OSS

Differential Revision: D16088193

Pulled By: Chillee

fbshipit-source-id: 81b6e536b4505178c829a9d925c30cd185b7a706
2019-07-02 06:28:30 -07:00
0ffda97aa4 Make Gloo an optional c10d dependency (#22257)
Summary:
The CMake modifications include removal of some unnecessary paths
(e.g. find_package(CUDA) and friends) that are no longer used since
c10d is always part of the larger torch build. The macro
`C10D_USE_...` was ambiguous and is now removed in favor of only
having top-level `USE_...`. The c10d test suite is changed to include
skip annotations for the tests that depend on Gloo as well.

Now, if you compile with `USE_DISTRIBUTED=1` and `USE_GLOO=0` you get
a functioning build for which the tests actually pass.

Closes https://github.com/pytorch/pytorch/issues/18851.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22257

Differential Revision: D16087993

Pulled By: pietern

fbshipit-source-id: 0cea66bd5cbd9736b06fa1d45ee13a18cab88adb
2019-07-02 02:39:48 -07:00
b9ede6600e Remove the USE_MIOPEN build option as MIOpen is always used when built with ROCm. (#22420)
Summary:
Close https://github.com/pytorch/pytorch/issues/22200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22420

Differential Revision: D16087538

Pulled By: bddppq

fbshipit-source-id: ecf3e7eb8213bb093e1c5290d096c233284a2ff9
2019-07-02 00:05:59 -07:00
6721e67c10 Remove hacky stub for quantized ops (#22388)
Summary:
Effectively reverts https://github.com/pytorch/pytorch/pull/18267 - this was a temporary measure and is not used any more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22388

Differential Revision: D16070725

Pulled By: dzhulgakov

fbshipit-source-id: ee5db11a608f248b0da981169d4cc90470fd482f
2019-07-01 23:21:42 -07:00
2dd1323379 Fix the GPU trainer for NoneCalibration and RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22385

Reviewed By: Wakeupbuddy

Differential Revision: D16053190

fbshipit-source-id: 6304c5c51f33691c201c78d4c921a9c250d9b4f5
2019-07-01 22:55:18 -07:00
5bd97be309 Fix lint error in format_time() in throughput_benchmark.py and clean it up a bit. (#22424)
Summary:
The `assert False` lint error has been causing CI to fail:

    ./torch/utils/throughput_benchmark.py:14:13: B011 Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22424

Differential Revision: D16083464

Pulled By: bddppq

fbshipit-source-id: 6d96e36c8fcbb391d071b75fe79c22d526c1ba3c
2019-07-01 22:15:37 -07:00
edd5b770be Remove API-level guard on NeuralNetworks.h (#22429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22429

Android NDK r20 removes the guard `(__ANDROID_API__ <= __ANDROID_API_O_MR1__)`, so we do it here also. There is insufficient reason to keep these decls undefined for earlier API levels. NDK r15 and earlier don't even define `__ANDROID_API_O_MR1__`, so the preprocessor defaults it to 0 and the guard evaluates as TRUE.

Reviewed By: smeenai, hlu1

Differential Revision: D16084105

fbshipit-source-id: f0857b3eb0573fe219f0d6c5e6583f89e2b5518f
2019-07-01 22:09:11 -07:00
de84104059 Lint ONNX Related Code (#22423)
Summary:
Lint the code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22423

Differential Revision: D16086518

Pulled By: houseroad

fbshipit-source-id: c6e5143f42c73a70beeaa2e089df4164f6265c32
2019-07-01 21:44:16 -07:00
ffa15d2285 Load original SourceRanges on import (#22180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22180
ghimport-source-id: efa46dcb845c099f0a746f523901ab2c2cd3b004

Test Plan: Imported from OSS

Differential Revision: D15981425

Pulled By: jamesr66a

fbshipit-source-id: bef682bd13c1a5be95bdb97e025690c6f2d523d3
2019-07-01 21:14:39 -07:00
2c2a913a4f Preserve SourceRanges across serialization (#22179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22179
ghimport-source-id: 9879551127da09d78ca348b9e436db5a09a92a38

Test Plan: Imported from OSS

Differential Revision: D15981423

Pulled By: jamesr66a

fbshipit-source-id: a2506f5a2f05916b6e8226841b0229110e758671
2019-07-01 21:14:35 -07:00
e05942c09b Serialization methods for SourceRange and Source (#22178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22178
ghimport-source-id: 85ca4d4454c6d4b57a82f211004c4bb712d1c980

Test Plan: Imported from OSS

Differential Revision: D15981426

Pulled By: jamesr66a

fbshipit-source-id: f81f5ee3b66fc4a0d4a708b8109712b5df9f241a
2019-07-01 21:14:31 -07:00
671782d88a Refactor file:line:col to be less ugly (#22177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22177
ghimport-source-id: e35f068c2d39bd8fa2058a9bfc0b1a3856f9383d

Test Plan: Imported from OSS

Differential Revision: D15981424

Pulled By: jamesr66a

fbshipit-source-id: b7748c5cfd4f8ea594314cb601a2b8045173700a
2019-07-01 21:14:28 -07:00
dff2c07183 Manual revert of D16012838
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22412

Reviewed By: nairbv, houseroad

Differential Revision: D16079809

fbshipit-source-id: ee0d805ff7a2bc5f98bcc65f90b8199751c840f6
2019-07-01 19:58:21 -07:00
2c18bf21be Fix ScriptModule.__dir__() (#22426)
Summary:
`_method_names` is on `_c`, so `dir(script_module)` calls previously
didn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22426

Differential Revision: D16085330

Pulled By: driazati

fbshipit-source-id: 6f9f1bef5da4306c0f26aa0be1bcec6dd3a6f0fb
2019-07-01 19:33:14 -07:00
f0f2331a1c Add support for cross-chunk shuffling in ChunkDataset (#22347)
Summary:
This change adds one advanced support for cross-chunk shuffling.

For training with a static dataset, the default configuration is at the user's disposal. However, in some use cases, new data is added to the dataset over each epoch, so the dataset's size is dynamically changing/increasing. In order to mix the new data and the old data for better random sampling, one approach is to shuffle examples from more than one chunk. This feature is supported with this change. By specifying `cross_chunk_shuffle_count_` on construction, advanced users can specify how many chunks to shuffle examples from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22347

Differential Revision: D16081378

Pulled By: zhangguanheng66

fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
2019-07-01 19:13:34 -07:00
1f9c4fdb5e split onnx passes (#22413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22413

_jit_pass_erase_number_types invalidates the jit graph but parts of _jit_pass_onnx rely on having a valid jit graph.

This splits _jit_pass_onnx into _jit_pass_onnx_remove_print and _jit_pass_onnx_preprocess_caffe2 (which rely on the valid jit graph), runs these before _jit_pass_erase_number_types,
and then runs the rest of _jit_pass_onnx after _jit_pass_erase_number_types

Reviewed By: houseroad

Differential Revision: D16079890

fbshipit-source-id: ae68b87dced077f76cbf1335ef3bf89984413224
2019-07-01 18:16:53 -07:00
a54acd3755 Update the way boolean tensor are being printed (#22238)
Summary:
When a boolean tensor is printed, there is no need to specify the dtype.

Example:
```
>> x = torch.tensor([[True, True, True], [True, True, True]])
>> print(x)
tensor([[True, True, True],
        [True, True, True]])

>> x = torch.tensor([True])
>> print(x)
tensor([True])

>> x = torch.tensor(True)
>> print(x)
tensor(True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22238

Differential Revision: D15996304

Pulled By: izdeby

fbshipit-source-id: 5699acf3e00abca8a2bbb5384f8271eeb063dce7
2019-07-01 18:04:42 -07:00
cbf572671d update mkldnn-bridge to avoid mem leak (#22392)
Summary:
fix the memory leak issue exposed by https://github.com/pytorch/pytorch/issues/21537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22392

Test Plan: {F164886124}

Reviewed By: yinghai

Differential Revision: D16074150

Pulled By: bddppq

fbshipit-source-id: b70192aad3d531f349fea5d2d477b827715a2363
2019-07-01 17:12:48 -07:00
402b9f9a6d add PT chunk op to the benchmark (#22409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22409

as title

Reviewed By: hl475

Differential Revision: D16079031

fbshipit-source-id: 109060ffc953f2357b2783b13f9b9dc87bd3f98a
2019-07-01 16:37:05 -07:00
8a726f5815 add PT split op to the benchmark (#22410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22410

as title

Reviewed By: hl475

Differential Revision: D16078705

fbshipit-source-id: 29e1cc19d0e93a561d07c47e5678a311e6de3e3b
2019-07-01 16:37:01 -07:00
8281909e73 add PT cat operator to the benchmark (#22404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22404

as title

Reviewed By: hl475

Differential Revision: D16078395

fbshipit-source-id: 4ff5c558036af1dce6ac0001a1a1fc3a373a981f
2019-07-01 16:36:57 -07:00
007fd01e9b Enable PT operators running with {cpu, gpu} * {forward, backward} (#22416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22416

This diff tests the combinations of cpu/gpu and forward/backward paths for the PT add operator.

Reviewed By: hl475

Differential Revision: D15770792

fbshipit-source-id: 38cc648361d2501d774db407f988c3cb5115b2ae
2019-07-01 16:30:58 -07:00
dfa6fca1c6 Supporting Manifold DB in Predictor Exporter (#22334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22334

Improve the function signatures of save_to_db and load_from_db in predictor_exporter.

Reviewed By: akyrola

Differential Revision: D16047208

fbshipit-source-id: a4e947f86e00ef3b3dd32c57efe58f76a38fcec7
2019-07-01 16:17:02 -07:00
30fedeae4a Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 26112fb218995b292bb28e65332f6259b3c289f6
2019-07-01 15:51:30 -07:00
10e4137396 Optimize InstanceNormGradientOp (#22288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22288

Optimize InstanceNormGradientOp

Benchmarks:

CPU with [N, C, H, W] = [128, 256, 56, 56],
NCHW order: 616ms -> 128ms
NHWC order: 1612ms -> 174ms

GPU with [N, C, H, W] = [128, 256, 112, 112],
NCHW order: 6450ms -> 37ms
NHWC order: 1419ms -> 82ms

Reviewed By: houseroad

Differential Revision: D16023630

fbshipit-source-id: 5af9bf1103cde2fc2bcb5cd5a057d039732f052e
2019-07-01 15:10:17 -07:00
d0348c0ef9 ThroughputBenchmark: improve formatting for ExecutionStats (#22293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22293

Just wrapping the C++ class with a nicer Python interface; one can now
just print it directly to get all the data. Later we can add various
visualizations there

Differential Revision: D16023999

fbshipit-source-id: 8436e37e36965821a690035617784dcdc352dcd1
2019-07-01 14:24:34 -07:00
d0db2a76a0 PyTorch ThroughputBenchmark: fix inaccuracy in number of iterations reporting (#22292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22292

Since we do an atomic fetch_add to check whether a thread should
finish, we should not take the last iteration into account. As a
result, the total number of iterations should be exactly what the user
sets via config.num_iters

Now, when running a unit test, I see the exact number of iterations reported

Differential Revision: D16023963

fbshipit-source-id: 3b12ee17276628ecd7b0979f28cd6deb777a1543
2019-07-01 14:24:29 -07:00
813b01e4a8 Use at::AutoNonVariableTypeMode before calling ATen tensor factory functions (#22364)
Summary:
As part of the Variable/Tensor merge, one invariant for tensor libraries such as ATen / Caffe2 / XLA is that they should only deal with Tensors, not Variables. However, currently in `variable_factories.h` we are potentially passing Variables into those tensor libraries without the `at::AutoNonVariableTypeMode` guard, which will cause those libraries to treat those Variables as Variables (i.e. their `is_variable()` is true), not Tensors.

Consider the following example for `full_like`:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
  //
  // When `self` is a Variable, since we are not using `at::AutoNonVariableTypeMode`,
  // `at::full_like` will also use `self` as a Variable (and it will see that `self.is_variable()` is true),
  // which breaks the invariant that ATen / XLA should never deal with Variables.
  at::Tensor tensor = at::full_like(self, fill_value, self.options().is_variable(false));
  at::Tensor result =
    autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```

Instead, the invariant-preserving implementation would be:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  at::Tensor tensor = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
    //
    // When `self` is a Variable, since we have `at::AutoNonVariableTypeMode` in the scope,
    // `at::full_like` will use `self` as a Tensor (and it will see that `self.is_variable()` is false),
    // which preserves the invariant that ATen / XLA should only deal with Tensors.
    return at::full_like(self, fill_value, self.options().is_variable(false));
  })();
  at::Tensor result =
    autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```
This PR makes the suggested change for all variable factory functions.

cc. ailzhang This should allow us to remove all `tensor_data()` calls in the XLA codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22364

Differential Revision: D16074862

Pulled By: yf225

fbshipit-source-id: 3deba94b90bec92a757041ec05d604401a30c353
2019-07-01 14:08:28 -07:00
d632b1ff3c Expose is_mkldnn to python and register it as torchscript prim op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22386

Differential Revision: D16074722

Pulled By: bddppq

fbshipit-source-id: b9b2a05a894847640084f063fba68d9db4e6aec1
2019-07-01 12:31:59 -07:00
2ab6ff42d1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: c9d3be641389f3c45a9e5a65280d8bfd20e38ea0
2019-07-01 12:25:21 -07:00
577c04c490 add mutation support for forward_pre_hook and forward_hook (#22285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22285

Previously, forward hooks were expected to return None. This PR adds support for overwriting the input and output in `forward_pre_hook` and `forward_hook`; this is used to implement inserting quant/dequant function calls around forward functions.
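
A minimal sketch of the new behavior:
```python
import torch
import torch.nn as nn

m = nn.Linear(4, 4)

def pre_hook(module, input):
    # A non-None return value now overwrites the input to forward().
    return (input[0] * 2,)

def post_hook(module, input, output):
    # A non-None return value now overwrites the output of forward().
    return torch.relu(output)

m.register_forward_pre_hook(pre_hook)
m.register_forward_hook(post_hook)
out = m(torch.randn(1, 4))  # sees doubled input, rectified output
```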

Differential Revision: D16022491

fbshipit-source-id: 02340080745f22c8ea8a2f80c2c08e3a88e37253
2019-07-01 11:06:42 -07:00
f7421b82ad Remove versions constraints from external_deps (#22113)
Summary:
As per attached tasks, these are noops and are being deprecated/removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22113

Reviewed By: philipjameson

Differential Revision: D15901131

fbshipit-source-id: 3acf12208f692548afe4844be13717a49d74af32
2019-07-01 10:55:30 -07:00
bfeff1eb8f Stubs for torch.nn (#19089)
Summary:
Closes https://github.com/pytorch/pytorch/issues/18724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19089

Differential Revision: D16073654

Pulled By: ezyang

fbshipit-source-id: 5642179651ce45ab7c5a46cc1fcc4fd6b37fa71c
2019-07-01 09:50:17 -07:00
a43d9af52c Comment on why Windows build_pytorch.bat builds twice (#22363)
Summary:
I've noticed that Windows CI seems to build twice, e.g., https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-build/60304/console

This adds a comment explaining why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22363

Differential Revision: D16073609

Pulled By: zou3519

fbshipit-source-id: ddb422b7c7e18cc436caff2c5838373a82f69429
2019-07-01 09:45:01 -07:00
451c907a47 Adding qconv unpack operator for serialization (#22354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22354

qconv weight unpack operator

Reviewed By: zafartahirov, jianyuh

Differential Revision: D16059668

fbshipit-source-id: b068b1a13bcf6a9148d864db384db780d474bfbf
2019-07-01 09:39:14 -07:00
f894ef7263 Add smoke test for information fn/method/attrs to test_namedtensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22341

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 22341 gh/zou3519/66/head

Imported from OSS

Differential Revision: D16053440

Pulled By: zou3519

fbshipit-source-id: 400f2e1c136cd7db4346a42b58813e42595ca755
2019-07-01 07:24:54 -07:00
496e35f76b More named inference rules for pointwise unary ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22308

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 22308 gh/zou3519/65/head

Imported from OSS

Differential Revision: D16053441

Pulled By: zou3519

fbshipit-source-id: 2e8d4cc11d7a711d2b789752a316a11fffc0996e
2019-07-01 07:24:51 -07:00
2a698682e4 Remove Type dispatch (#21964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21964
ghimport-source-id: fdfb555ac4efbf31ae7d2c700a5aa44ad0cc4d7f

Test Plan: Imported from OSS

Differential Revision: D15897424

Pulled By: li-roy

fbshipit-source-id: 3cd6744254e34d70e6875ffde749b5cf959b663c
2019-06-30 04:11:35 -07:00
6c454ff14c Stop using Type in Python bindings (#21963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21963
ghimport-source-id: 4d9d66ba2c8587503d892b67f535cc2a62e2d19e

Test Plan: Imported from OSS

Differential Revision: D15897423

Pulled By: li-roy

fbshipit-source-id: 2dd55ceb80971df7c86545b7bfff733387f13572
2019-06-30 04:11:32 -07:00
9c8f9f0ecb Remove many usages of Type (#21941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21941
ghimport-source-id: f20cca6229daba9eb8652adb3d959266ae081ef1

Test Plan: Imported from OSS

Differential Revision: D15893331

Pulled By: li-roy

fbshipit-source-id: c988b16008ff0e2725a88c6025afd4aabdaca45a
2019-06-30 04:11:28 -07:00
3cba9e8aaa Error Message Paraphrasing (#22369)
Summary:
Saying `I` in an err msg is too subjective to be used in a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369

Differential Revision: D16067712

Pulled By: soumith

fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
2019-06-30 00:13:02 -07:00
41e51ce142 Fix QNNPACK and NNPACK settings (#22367)
Summary:
`setup.py` recommends setting `USE_QNNPACK=0` and `USE_NNPACK=0` to disable building QNNPACK and NNPACK respectively. However this wasn't reflected correctly because we were looking for `NO_QNNPACK` and `NO_NNPACK`. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22367

Differential Revision: D16067393

Pulled By: soumith

fbshipit-source-id: 6491865ade9a6d41b7a79d68fd586a7854051f28
2019-06-29 21:15:59 -07:00
d8de69d621 Adds symbolic op for logsumexp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22306

Differential Revision: D16046027

Pulled By: houseroad

fbshipit-source-id: 7319fd58321220941250c5b8eff024914798e392
2019-06-29 00:09:06 -07:00
b52621c870 Revise error message for invalid Reduction (#22160)
Summary:
Say the user inputs reduction=False. Of course, we can't add a bool and a string, so the ValueError itself will error, which is more confusing to the user. Instead, we should use string formatting. I would use `f"{reduction} is not..."` but am unsure whether we are OK with using f"" strings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22160

Differential Revision: D15981826

Pulled By: soumith

fbshipit-source-id: 279f34bb64a72578c36bdbabe2da83d2fa4b93d8
2019-06-28 22:37:04 -07:00
9e18234109 Automatic update of fbcode/onnx to 806aa863020fa180e57f576cb032ec44ce8ddcca (#22359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22359

Previous import was 355a4954ea4e5836a5e943589509951c44feb6b4

Included changes:
- **[806aa863](https://github.com/onnx/onnx/commit/806aa863)**: Expose ONNX_ML build option to python (#2138) <bddppq>
- **[8f6e60db](https://github.com/onnx/onnx/commit/8f6e60db)**: Missing newline fix   (#2128) <Chris Seymour>
- **[d94f99d2](https://github.com/onnx/onnx/commit/d94f99d2)**: Avoid unnecessary copies of names by checker (#2098) <Scott McKay>
- **[01f77251](https://github.com/onnx/onnx/commit/01f77251)**: update qlinear conv test (#2120) <Ashwini Khade>
- **[1f0c13d3](https://github.com/onnx/onnx/commit/1f0c13d3)**: Add shape inference for LinearClassifier (#2077) <Hariharan Seshadri>
- **[eb798fcf](https://github.com/onnx/onnx/commit/eb798fcf)**: Fix inconsistency in describing graph's initializer. The initializer (#2115) <xykong58>

Reviewed By: bddppq, zrphercule

Differential Revision: D16061494

fbshipit-source-id: 6ccb63c135c27b307048aa42c11313675027ffb7
2019-06-28 22:22:24 -07:00
7cc8f37f56 Reduce needless copying when returning lists of tensors in the JIT interpreter. (#21690)
Summary:
This fixes the JIT performance gap reported in https://twitter.com/VahidK/status/1138677898439561216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21690

Differential Revision: D15783709

fbshipit-source-id: 23bb4acda6b60c27e95667e1d53c7d261a87167d
2019-06-28 19:00:05 -07:00
737f8a7638 Fix onnx passes (#22319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22319

The onnx pass replacing ints with Tensors produces an invalid JIT graph. It should only be called right before the onnx pass.
Also, it should only be called if we actually export to onnx.

Reviewed By: houseroad

Differential Revision: D16040374

fbshipit-source-id: e78849ee07850acd897fd9eba60b6401fdc4965b
2019-06-28 17:08:55 -07:00
e76c9751c4 Use lazy initialization in autograd record_function to avoid static (#22317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22317

About to add an observer that is also statically initialized in a different
file, so we need to enforce initialization order.

Reviewed By: ilia-cher

Differential Revision: D16012275

fbshipit-source-id: f26e57149a5e326fd34cb51bde93ee99e65403c4
2019-06-28 14:52:56 -07:00
3a198400f8 modify pool benchmarks
Summary: as title

Reviewed By: hl475

Differential Revision: D16058193

fbshipit-source-id: 8f4e04a0356960f6483d6ef58e64876740434849
2019-06-28 14:35:23 -07:00
89c709d217 modify unary operators benchmark
Summary: as title

Reviewed By: hl475

Differential Revision: D16057665

fbshipit-source-id: 07e31a17450fbfd88b5bd330c31c729de5300eaa
2019-06-28 14:03:41 -07:00
6cf4df5d06 add PT softmax ops to the benchmark suite (#21208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21208

The diff adds softmax, softmax2d, and logsoftmax to the benchmark suite.

Reviewed By: zheng-xq

Differential Revision: D15526265

fbshipit-source-id: b7ba63032dba7146765513c8cb1ac5a6a7bd1a68
2019-06-28 13:58:20 -07:00
2132ea1d8d Fix "python: can't open file '.jenkins/pytorch/print_sccache_log.py': [Errno 2] No such file or directory"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22315

Test Plan: Imported from OSS

Differential Revision: D16049862

Pulled By: ezyang

fbshipit-source-id: d9c83208e6b5ee7eb009ddb585dbfa0ea1cbb9e6
2019-06-28 07:15:33 -07:00
042a2fd810 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 445
LARGE: 354

Updated actions:
From MEDIUM to LARGE: 21
From LARGE to XLARGE: 34
From LARGE to MEDIUM: 9
From XLARGE to MEDIUM: 1

Differential Revision: D16047893

fbshipit-source-id: 7afab2ef879277f114d67fd1da9f5102ec04ed7f
2019-06-28 04:13:06 -07:00
e259894e83 Test raising TypeError in torch.from_numpy() (#21607)
Summary:
With some additional cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21607

Differential Revision: D16046063

Pulled By: li-roy

fbshipit-source-id: 15256a0e94afea39db3cb581c546c2a18a8a7fda
2019-06-27 23:54:47 -07:00
0804452709 fix lint in torch/nn/quantized/modules/linear.py (#22325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22325

att

Reviewed By: bddppq

Differential Revision: D16042464

fbshipit-source-id: 0610896c08667fdaa95983f49140193ecb9ede16
2019-06-27 23:18:42 -07:00
1bea27be9d Remove three cpu sigmoid functions that are identical to IMPLEMENT_UNARY_OP_VEC (#22271)
Summary:
This does not occur in CUDA code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22271

Differential Revision: D16024605

Pulled By: bddppq

fbshipit-source-id: bb4f16bacbdc040faa59751fba97958f4c2d33cd
2019-06-27 23:05:05 -07:00
5e77111486 nn.quantized.Relu and nn.quantize.Quantize/DeQuantize modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21930

Differential Revision: D15554224

fbshipit-source-id: 1de9ac7412468106be60e53852c23318ead37bc6
2019-06-27 16:15:17 -07:00
6f0f7e316d Support building caffe2 with clang-cl on Windows (#22307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22307

The MSVC-specific pragma doesn't silence the warning about a throwing constructor, and therefore `clang-cl` fails to compile this file. This diff fixes the problem by adding an additional check for the `clang` compiler.

Reviewed By: smessmer

Differential Revision: D16032324

fbshipit-source-id: 6dbce0ebf0a533d3e42b476294720590b43a8448
2019-06-27 15:43:38 -07:00
83768f0756 Add ONNX export support for multidim torch.sum. (#22240)
Summary:
This change fixes the issue reported in https://github.com/pytorch/pytorch/issues/22066.
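
A minimal sketch of the now-exportable pattern:
```python
import torch

class SumModel(torch.nn.Module):
    def forward(self, x):
        # Reducing over multiple dims at once can now be exported.
        return torch.sum(x, dim=(1, 2))

torch.onnx.export(SumModel(), torch.randn(2, 3, 4), "sum.onnx")
```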
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22240

Reviewed By: zrphercule

Differential Revision: D15996934

Pulled By: houseroad

fbshipit-source-id: 3a842ba26f54aa710233fbe87d727fc1f2568d9c
2019-06-27 15:02:33 -07:00
2832e33a94 Add serialization for nn.quantized.Linear module (#21925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21925

att

Differential Revision: D15483071

fbshipit-source-id: 3a218dad5b653b38a0885339889ff70c75a13bef
2019-06-27 14:57:22 -07:00
5c46e701fc Implementation of nn.quantized.linear module (#21921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21921

Call FBGEMM kernels to implement the quantized linear operator. This operator is used only for inference.

Differential Revision: D15375695

fbshipit-source-id: b9ca6c156fd60481fea83e55603b2897f7bfc3eb
2019-06-27 14:09:48 -07:00
7a40412158 Delay reduction of unused parameters until first autograd hook is called (#22219)
Summary:
Reduction of gradients for unused parameters should happen as soon as
possible, because they potentially block reduction of gradients for
used parameters. This used to happen instantly when
`prepare_for_backward` was called and it found parameters that didn't
contribute. This meant that if you have a model with unused
parameters, and you want to discard the model output (i.e. not call
backward on some loss), reduction of the gradients of those unused
parameters would have been kicked off, and you'd see an error the next
time you called `forward`.

In this commit, this original approach is slightly changed to delay
reduction of the gradients of those unused parameters until the first
autograd hook is called. This means that you can now discard the model
output regardless of the model having unused parameters or not.
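
A minimal sketch of the now-allowed pattern (assumes a process group has already been initialized):
```python
import torch
import torch.nn as nn

model = nn.parallel.DistributedDataParallel(
    nn.Linear(10, 10), find_unused_parameters=True)

out = model(torch.randn(2, 10))
# Discarding `out` without calling backward() is now safe: reduction of
# unused-parameter gradients waits for the first autograd hook instead.
```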

This is a prerequisite for making the `find_unused_parameters`
argument to DDP default to `True`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22219

Differential Revision: D16028698

Pulled By: pietern

fbshipit-source-id: c6aec2cd39c4a77746495d9cb1c9fb9c5ac61983
2019-06-27 14:09:44 -07:00
ac39869370 Fixed list() not making a copy (#22093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22087
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22093

Differential Revision: D16036814

Pulled By: Chillee

fbshipit-source-id: 3c7106f907415ed0f600acaf45d2c61e1c60867a
2019-06-27 13:55:43 -07:00
b1096995d5 Update ThroughputBenchmark to reflect new script::Module API (no (#22291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22291

There was a race between landing the benchmark diff and
https://github.com/pytorch/pytorch/pull/21934 from zdevito. This PR
should fix the issue.

Reviewed By: zdevito

Differential Revision: D16023640

fbshipit-source-id: 931714352e656f045f9ef3cd17422db51b168384
2019-06-27 12:57:27 -07:00
177b8bf6e7 Named inference rule for more pointwise ops. (#22268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22268
ghimport-source-id: c722f9fbb3fc529c872dcccbf58ba1a8c5fcda8e

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

Imported from OSS

Differential Revision: D16030549

Pulled By: zou3519

fbshipit-source-id: 5cbb2c8626335a32a22ed8079245a5faa7cf553f
2019-06-27 12:49:36 -07:00
7732b1a604 Enable named inference for some unary pointwise ops (#22267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22267
ghimport-source-id: 1566df9a20712cada6ea1209e000c5ff757daa14

Test Plan: Imported from OSS

Differential Revision: D16030550

Pulled By: zou3519

fbshipit-source-id: 183ca1d14dc0fb6f1ee6e114b48c2703c61e11ce
2019-06-27 12:49:32 -07:00
69b702a6eb Implement unify_from_right (#22223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22223
ghimport-source-id: b88bd2a13c1c9c699945a69ec05300c6e598e95a

Test Plan:
- `build/bin/NamedTensor_test` [namedtensor ci]

gh-metadata: pytorch pytorch 22223 gh/zou3519/62/head

Imported from OSS

Differential Revision: D16030551

Pulled By: zou3519

fbshipit-source-id: f3d53e3f9b2428a4926c61a02631e6cd29f89e4b
2019-06-27 12:49:29 -07:00
6386e4d244 Named inference rule for abs. (#22151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22151
ghimport-source-id: 54c1726b578ac162af817f78df6f540b764e46e3

Test Plan:
- `python test/test_namedtensor.py` [namedtensor ci]

Imported from OSS

Differential Revision: D15970326

Pulled By: zou3519

fbshipit-source-id: 4ea25f0a73bbc24b604d3ded2027eeb4ce800de0
2019-06-27 12:49:25 -07:00
2913f6a26d Adding modules for Python 3 compatibility (#22295)
Summary:
To improve Python 3 compatibility and make the linter happy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22295

Reviewed By: zrphercule

Differential Revision: D16024957

Pulled By: houseroad

fbshipit-source-id: c0eddf731891b2f547ba619b3c2f6b2d7a32f034
2019-06-27 12:06:40 -07:00
6947e192f7 Remove unused param in Caffe2 LayerNormGradientOp (#22282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22282

Remove unused param in Caffe2 LayerNormGradientOp

Reviewed By: bddppq, houseroad

Differential Revision: D16017117

fbshipit-source-id: bdd0bd2aca009e549dfd2bf622494dfc791589e3
2019-06-27 11:22:44 -07:00
be0631b6ee Add the rest of the dict API (#21979)
Summary:
This adds the rest of the `dict.???` methods that were missing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979

Pulled By: driazati

Differential Revision: D16023573

fbshipit-source-id: 3ea9bd905090e2a176af654a8ca98c7d965ea679
2019-06-27 11:08:18 -07:00
c9626a11cc Made a += b for lists do an in place add (#21896)
Summary:
In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`

Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:
```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```
Previously, `batch_gather jit` took 3x as long as `batch_gather`.

Some logic taken from https://github.com/pytorch/pytorch/pull/21690. Note that these two PRs are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=` (see the sketch below).
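
A minimal sketch of the pattern this specialization targets:
```python
from typing import List
import torch

@torch.jit.script
def extend(xs: List[int], ys: List[int]) -> List[int]:
    xs += ys  # now compiled as an in-place extend, avoiding a list copy
    return xs

print(extend([1, 2], [3, 4]))  # [1, 2, 3, 4]
```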

Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21896

Differential Revision: D15998628

Pulled By: Chillee

fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
2019-06-27 10:59:24 -07:00
bf677b8849 Set MKLDNN (default) build variables in CMakeLists.txt, not in Python build scripts (#22215)
Summary:
This is yet another step to disentangle the Python build scripts and CMake
and improve their integration (let CMake handle more of the build environment
detection, with less done by our handcrafted Python scripts).

The processor detection logic also changed a bit: instead of detecting
whether the system processor is PPC or ARM, this PR detects Intel CPUs,
because this is more precise, as MKL only supports Intel CPUs. The build
option `USE_MKLDNN` will also not be presented to users on non-Intel
processors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22215

Differential Revision: D16005953

Pulled By: ezyang

fbshipit-source-id: bf3f74d53609b3f835e280f63a872ff3c9352763
2019-06-27 10:21:55 -07:00
d2bad941f4 Fix lint issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22303

Differential Revision: D16030302

Pulled By: ifedan

fbshipit-source-id: 5564f6f810382f31f9416e5881978b03f51e53a9
2019-06-27 09:27:16 -07:00
e9d1b852c4 Functional conv2d (#21225)
Summary:
Stack:
  https://github.com/pytorch/pytorch/issues/21323 Quantized Conv2d Module [💛](https://our.intern.facebook.com/intern/diff/D15551835/)
  **https://github.com/pytorch/pytorch/issues/21225 Functional conv2d** [💛](https://our.intern.facebook.com/intern/diff/D15544061/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21225

Test Plan:
`buck test mode/dev caffe2/test:quantized -- test_conv_api`: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929

```
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.1 sec
Building: finished in 5.1 sec (100%) 6958/6958 jobs, 2 updated
  Total time: 6.3 sec
Trace available for this run at /tmp/testpilot.20190603-163323.4026295.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 17661db57af88ec71497f5c21efa86531c07662b fbpkg ce57c6c1c73f45c4aa890e9df65820c3 at Sat Jun  1 17:06:32 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/625/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 6.962 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929
Summary (total time 10.65s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: dskhudia

Differential Revision: D15544061

Pulled By: zafartahirov

fbshipit-source-id: 700c0c78b5915bf7e54bda7c44f44b7b1e247f4d
2019-06-27 09:19:54 -07:00
59c42595e0 Enabled gather and scatter for bool tensor (#21924)
Summary:
- Moved code around in order to enable bool (see the sketch below).
- Added an implementation of atomicAdd(bool, bool).
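
A minimal sketch of the newly enabled dtype:
```python
import torch

mask = torch.tensor([True, False, True])
idx = torch.tensor([2, 0])
print(mask.gather(0, idx))  # tensor([True, True])
# Scatter False into positions 2 and 0:
print(mask.scatter(0, idx, torch.tensor([False, False])))
# tensor([False, False, True])
```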
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21924

Differential Revision: D15883711

Pulled By: izdeby

fbshipit-source-id: 733f35c2bc3d87cec9f9687d72b62d2d2cd7c03e
2019-06-27 09:07:50 -07:00
f13fadd510 fix python2 corner-case in torch.distributed.launch (#20996)
Summary:
Small fix for the comment raised in 4cf76574b9 (r33134850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20996

Differential Revision: D15991510

Pulled By: pietern

fbshipit-source-id: 4e5a35864b5a4ec9402aa83a19c4a3ba0df2f01f
2019-06-27 05:19:37 -07:00
f39b6624ba ChunkDataset checkpoint support (#21889)
Summary:
When dealing with a large-scale dataset, it is handy if we can save the dataset status and resume later. Especially in cases where an unexpected crash happens, the user doesn't need to start the whole dataset over from the beginning. Instead, they can reload it from the last checkpoint.

This change adds support for checkpoint save/load logic in ChunkDataset.

On ChunkDataset construction, the user can specify a file name from which to load the checkpoint. If it is empty, the dataset defaults to starting fresh; otherwise the ChunkDataset will 'fast forward' the chunk sampler to the corresponding checkpoint.

The user can also call ChunkDataset::save() to serialize current status to a file, which can be used later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21889

Differential Revision: D16024582

Pulled By: ailzhang

fbshipit-source-id: 1862ab5116f94c9d29da174ce04a91041d06cad5
2019-06-26 22:54:14 -07:00
30d890c672 Removed an outdated comment above IMPLEMENT_UNARY_OP_VEC(abs) (#22272)
Summary:
due to 82b570528db0a43fc04bb90f5d4538c01e4a5582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22272

Differential Revision: D16024544

Pulled By: bddppq

fbshipit-source-id: 37955bff3301975c0dd6abde8a3ba79af0555111
2019-06-26 22:24:13 -07:00
f144b9ebef Fix two overindent lint errors in test/common_nn.py. (#22287)
Summary:
This keeps causing lint tests to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22287

Differential Revision: D16024524

Pulled By: bddppq

fbshipit-source-id: a3e3780a55943283e9c854e94ac06ea4715e5319
2019-06-26 21:41:41 -07:00
e6d4a2d289 Remove unused file cmake/Modules/FindMIOpen.cmake (#22244)
Summary:
`cmake/public/LoadHIP.cmake` calls `find_package(miopen)`, which uses the CMake module in MIOpen installation (It includes the line `set(miopen_DIR ${MIOPEN_PATH}/lib/cmake/miopen)`). `cmake/Modules/FindMIOpen.cmake` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22244

Differential Revision: D16000771

Pulled By: bddppq

fbshipit-source-id: 07bb40fdf033521e8427fc351715d47e6e30ed34
2019-06-26 21:21:46 -07:00
5e0a74dd70 Rename copy_tensor_data to copy_tensor_metadata (#22266)
Summary:
The original name `copy_tensor_data` could be confusing because users are not sure whether it deep-copies the data in the tensor's storage or just copies the tensor's metadata. The renaming makes this clearer.

cc. ailzhang This might break XLA build, but I think the renaming makes it more clear why we use `copy_tensor_data` in XLATensorImpl's shallow-copy functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22266

Differential Revision: D16014724

Pulled By: yf225

fbshipit-source-id: f6ee966927d4d65d828b68264b3253b2f8fd768d
2019-06-26 21:16:57 -07:00
45c6fa0007 Refactor Tests for Multiple ONNX Opsets (#20036)
Summary:
Refactor tests for https://github.com/pytorch/pytorch/pull/19294.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20036

Reviewed By: zrphercule

Differential Revision: D16016593

Pulled By: houseroad

fbshipit-source-id: eaae324e347679acf3d0ac1c14be03919f54496e
2019-06-26 17:06:57 -07:00
f51de8b61a Back out "Revert D15435461: [pytorch][PR] PyTorch ThroughputBenchmark" (#22185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22185

Original commit changeset: 72a0eac1658b

Differential Revision: D15981928

fbshipit-source-id: d2455d79e81c26ee90d41414cde8ac0f9b703bc3
2019-06-26 16:05:51 -07:00
3f2a839dda Add comments to bailoug_graph.*
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22161

Differential Revision: D15975355

Pulled By: Krovatkin

fbshipit-source-id: dca0095b4f05cff8277663ad38b65eeb44417f40
2019-06-26 15:39:38 -07:00
04fe2453c4 conv2d/conv3d for LongTensor (#20730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20730

Generates forward conv2d function for LongTensor

Differential Revision: D15423753

fbshipit-source-id: 0e770b61257cc4c6559581796bf104ef68155c84
2019-06-26 15:29:56 -07:00
3ba72a11db Revert D15999938: [jit] Add the rest of the dict API
Differential Revision:
D15999938

Original commit changeset: 7bc2a55e3f79

fbshipit-source-id: e377c00e990d6f058960936e69712b77851c06fa
2019-06-26 14:16:37 -07:00
7707dee761 Re apply optional ScalarType changes (#22237)
Summary:
This is (mostly) the re-application of:
https://github.com/pytorch/pytorch/pull/21088

which was reverted due to an issue conflicting with changes in:
https://github.com/pytorch/pytorch/pull/22104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22237

Differential Revision: D16012838

Pulled By: nairbv

fbshipit-source-id: 35f4a73c97ab68b4e2648aca96b2176f07b5a883
2019-06-26 13:36:25 -07:00
8b02522b93 Avoid copy in ArrayRef<->vector comparison (#22218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22218

-

Differential Revision: D15990763

fbshipit-source-id: 53c98f915fadc8a65aea896c80292d5804d967a4
2019-06-26 13:36:21 -07:00
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing a BC-breaking change, empty_like returns a contiguous tensor by default.

```python
import torch

N, C, H, W = 2, 3, 8, 8  # example sizes
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
assert not new_nCwh.is_contiguous(memory_format=torch.channels_last)
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
assert new_nhwC.is_contiguous(memory_format=torch.channels_last)

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
assert not like_nCwh.is_contiguous(memory_format=torch.channels_last)
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
assert new_nhwC.is_contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
assert not new_nCwh.is_contiguous(memory_format=torch.channels_last)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
5bdc4db26e Refactor named tensor helper code (#22150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22150
ghimport-source-id: 460022febc24f49b86f0d5dbf8dc227564bde6cb

Test Plan: Imported from OSS

Differential Revision: D15970325

Pulled By: zou3519

fbshipit-source-id: 86a3e3ca82bbf4ff815431e25c5f9a35fcd23be0
2019-06-26 11:33:29 -07:00
29b53b0259 Fix bug in caffe2 transpose on GPU (#22233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22233

Fix bug in caffe2 transpose on GPU

Reviewed By: hl475

Differential Revision: D15994973

fbshipit-source-id: 542dc8757b51a6322fffa55826c1d4e32927398d
2019-06-26 11:33:25 -07:00
2dc9643080 Better error message for mismatched dict key type (#22231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22231

Pulled By: driazati

Differential Revision: D15993936

fbshipit-source-id: 6822ef01477a3b32beb8c037a621fa71abd022c8
2019-06-26 10:46:45 -07:00
af9e0085f2 Add the rest of the dict API (#21979)
Summary:
This adds the rest of the `dict.???` methods that were missing
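For illustration, a sketch of dict methods used inside a scripted function (the exact set of methods added is per the PR; this snippet assumes present-day TorchScript coverage):

```python
from typing import Dict

import torch

@torch.jit.script
def dict_methods(d: Dict[str, int]) -> int:
    d.setdefault("a", 1)   # insert a key only if missing
    d.update({"b": 2})     # merge another dict
    total = 0
    for k in d.keys():
        total += d.get(k, 0)
    return total

print(dict_methods({"c": 3}))  # 6
```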
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979

Pulled By: driazati

Differential Revision: D15999938

fbshipit-source-id: 7bc2a55e3f791015a0ff2e3731703075cf0770ee
2019-06-26 10:40:29 -07:00
25eae3ed08 Disable test_proper_exit flaky worker_kill (#22208)
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208

Differential Revision: D15990307

Pulled By: soumith

fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
2019-06-26 09:47:40 -07:00
a4f281446b introduce flags to set omp and mkl threads (#21472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21472

as title

Reviewed By: hl475

Differential Revision: D15695846

fbshipit-source-id: 44437f6b94a9c583275fcc711bb6ccf2b04f90fc
2019-06-26 09:33:05 -07:00
5f84f372a6 Use variable_data() in tensor_to_numpy (#22214)
Summary:
As part of the Variable/Tensor merge, we want to gradually remove call sites of `tensor_data()` and the API itself, and instead use `variable_data()`. This PR removes the `tensor_data()` call in the tensor-to-numpy conversion path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22214

Differential Revision: D15997397

Pulled By: yf225

fbshipit-source-id: 6fcab7b14e138824fc2adb5434512bcf868ca375
2019-06-26 08:57:47 -07:00
f176950a67 Use lower case for strong wolfe option. (#22092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22092
ghimport-source-id: ccc53ed2f1e16865237334a4dde4d162e21762e5

Test Plan: Imported from OSS

Differential Revision: D15955996

Pulled By: vincentqb

fbshipit-source-id: 8ffbea3b9ef8ff7021d42524fa46112da8a3438e
2019-06-26 08:20:25 -07:00
9f22805cc6 Refactor function_wrapper.create_generic (#22077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22077
ghimport-source-id: 39cf0a2e66e7fa2b6866af72782a22a4bd025e4c

Test Plan:
- Compared the build/aten/src folder before and after this change
locally and verified they are identical (`diff -r`).
- Wait for CI + Also, [namedtensor ci]

Imported from OSS

Differential Revision: D15941967

Pulled By: zou3519

fbshipit-source-id: d8607df78f48325fba37e0d00fce0ecfbb78cb36
2019-06-26 08:20:21 -07:00
b297552887 Make nn functions configurable for different scalar types (#20729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20729

Currently there is no way to specify which scalar types each nn function supports.

This change allows specifying the supported scalar types for each function/backward function and device. By default each function will support Float, Double, and Half.

If you want to specify any extra supported scalar types beyond the default, you will need to change nn.yaml:

```
- name: _some_func(Tensor self)
  cname: SomeFunction
  CPU:
    forward_scalar_types: ['Float', 'Double', 'Long']
    backward_scalar_types: ['Float', 'Double']
```

Differential Revision: D15423752

fbshipit-source-id: b3c157316d6e629bc39c1b377a3b23c71b1656cf
2019-06-26 07:53:38 -07:00
95b5718007 Prevent VS from emitting errors when using swap in Optional.h (#22182)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21706
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22182

Differential Revision: D15981740

Pulled By: ezyang

fbshipit-source-id: d58b3ca3aea8d3d383150208b87fa4bbd4f6fe33
2019-06-26 07:29:35 -07:00
fde75a33e1 update IterableDataset doc to be consistent with current behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22230

Differential Revision: D15994680

Pulled By: ezyang

fbshipit-source-id: 9e47e8369aa08a550987c4468ce75aa7650ee1d4
2019-06-26 06:49:22 -07:00
655a370859 restoring HEADs for ideep and onnx to more recent versions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22250

Differential Revision: D16003227

Pulled By: Krovatkin

fbshipit-source-id: bf906a8e9e5e0f79391e5984c6cdfb9638d84981
2019-06-26 02:19:17 -07:00
17b37eb353 Bump gloo (#22225)
Summary:
This includes:
* Removal of builder classes
* Add allgatherv
* Add bcube allreduce algorithm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22225

Differential Revision: D16003629

Pulled By: pietern

fbshipit-source-id: fd062b82bfeeddb8190206d9931a781c7daff6f9
2019-06-26 00:55:36 -07:00
c1fc2f25c2 export deleteFunction in torch/csrc/autograd/function.h (#22236)
Summary:
In `torch/csrc/autograd/function.h` we define `torch::autograd::Function`, a (the?) central autograd record-holding class. `Function` is declared public API (`TORCH_API`).

We also define a custom deleter `deleteFunction` which we use throughout PyTorch's own use of `Function`. This trivial PR declares the deleter public API as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22236

Differential Revision: D16001335

Pulled By: yf225

fbshipit-source-id: 6ef0a3630e8f82f277a0e6e26cc64455ef7ee43e
2019-06-25 20:46:09 -07:00
e8bc992b03 print device when it's not on default device (#22094)
Summary:
We used to not print the device when a tensor is on XLA. That is sometimes confusing, as it looks the same as a CPU tensor...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22094

Differential Revision: D15975405

Pulled By: ailzhang

fbshipit-source-id: f19ceb9e26f5f2f6e7d659de12716f0dfe065f42
2019-06-25 20:28:50 -07:00
1a164bf30b remove unused mkldnn include (#22217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22217

This include seems to have been introduced in https://github.com/pytorch/pytorch/pull/19209/files and is
no longer used. Remove it to simplify the mobile build.

Reviewed By: bddppq

Differential Revision: D15983344

fbshipit-source-id: 37ee0bfbd022da09af6bc44c6e3fec1c99a8e732
2019-06-25 17:45:39 -07:00
de85abf226 Allow default construction of Dict/List (#22084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084

For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it was supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.

Differential Revision: D15948098

fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
2019-06-25 17:40:48 -07:00
e425789286 Fix "missing return statement" warning (#22216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22216

-

Differential Revision: D15989670

fbshipit-source-id: d0534a3bf1eef29657738e271d35503a2f75a043
2019-06-25 16:57:42 -07:00
f7a126f941 fix optional type subtype relation (#22186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22186
ghimport-source-id: 05ef8c3a176fe2a67d4835888e6db52b57a6d199

Test Plan: Imported from OSS

Differential Revision: D15994644

Pulled By: wanchaol

fbshipit-source-id: 7c5c4eebd421f6c9470661c2c2eb38bafdff8bbd
2019-06-25 16:57:38 -07:00
defd23b8b9 Clean up old uses of checkScript (#22002)
Summary:
This cleans up the `checkScript` API and some old tests that were hardcoding outputs. It also now runs the Python function when a string is passed in to verify the outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22002

Differential Revision: D15924485

Pulled By: driazati

fbshipit-source-id: ee870c942d804596913601cb411adc31bd988558
2019-06-25 16:24:19 -07:00
7b1ffba3bf ArgumentStash for Scalar arguments (#21931)
Summary:
Scalars were being traced as constants.
This PR fixes that issue.

The ONNX graph for `Test_Full_op()` before and after this change:

```python
import torch
import torch.nn as nn

def Test_Full_op():
    class Test_Full(nn.Module):
        def forward(self, x):
            return torch.full((3, 4), x, dtype=torch.long)
    model = Test_Full()
    x = torch.tensor(12)
    output = model(x)
```

Before this change:

```
graph(%input1 : Long()):
  %output1 : Float(3, 4) = onnx::Constant[value=<Tensor>]
  return (%output1)
```

After this change:

```
graph(%input1 : Long()):
  %1 : int[] = onnx::Constant[value= 3 4 [ Variable[CPULongType]{2} ]]
  %2 : Tensor = onnx::ConstantOfShape[value={0}]
  %output1 : Float(3, 4) = onnx::Add(%2, %input1)
  return (%output1)
```

Similar PR : https://github.com/pytorch/pytorch/pull/12939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21931

Reviewed By: zrphercule

Differential Revision: D15950066

Pulled By: houseroad

fbshipit-source-id: 3470665d88fa34faa600940ef16b069a06002cd5
2019-06-25 15:22:08 -07:00
7ee82d48a8 Removed work around for convolution transpose op since the bug has be… (#22184)
Summary:
The workaround for the convolution transpose op is removed since the bug has been fixed in v0.18.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22184

Differential Revision: D15982627

Pulled By: bddppq

fbshipit-source-id: 8725d5b5e5b68e029ffb08af12b416bd310c9638
2019-06-25 14:34:34 -07:00
5b87049c66 remove uses of std::shared_ptr<Module> (#21934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21934
ghimport-source-id: e64ab9096f43749ead3ac5567675b815da295664

Test Plan: Imported from OSS

Differential Revision: D15892401

Pulled By: zdevito

fbshipit-source-id: 6424139206593ff944556c69d8a54723884eacaf
2019-06-25 13:24:38 -07:00
1d705b4b07 Run clang-format on c10d bits (#22194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22194

TSIA

Differential Revision: D15983780

fbshipit-source-id: 1365bcf9bbc262a3657f646e81d2fc9c32f24c97
2019-06-25 12:34:52 -07:00
f5a1ea170b SIMD version average pooling added (#22148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22148

Average pooling is added into dnnlowp optimization code.

Reviewed By: jspark1105

Differential Revision: D15936556

fbshipit-source-id: 6177ee62529801898f230c6fb89e9c4b598593a5
2019-06-25 12:19:21 -07:00
a7cb07eb0f Add missing algorithm header to Array utility (#22157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22157

This header uses `std::swap_ranges` function which is defined in `<algorithm>` header (https://en.cppreference.com/w/cpp/algorithm/swap_ranges). Therefore this file isn't guaranteed to compile on all platforms.

This diff fixes the problem by adding the missing header.

Reviewed By: smessmer

Differential Revision: D15971425

fbshipit-source-id: e3edcec131f72d729161f5644ee152f66489201a
2019-06-25 12:19:17 -07:00
6ff0c6ca3f Remove THD (#22065)
Summary:
It's been ~9 months since moving THD to the `torch.distributed.deprecated` namespace (see https://github.com/pytorch/pytorch/issues/11405) and we haven't seen issues related to it, so it's time to remove it.

Closes https://github.com/pytorch/pytorch/issues/18967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22065

Reviewed By: mrshenli

Differential Revision: D15983669

Pulled By: pietern

fbshipit-source-id: 2a2f5866f9a63040bc7cef3956d5fd215aba7165
2019-06-25 12:19:13 -07:00
bcb5fd8f06 Port symeig to ATen and enable batching of inputs (#21858)
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
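
A quick sketch of the batched call (using the `torch.symeig` API as it existed at the time of this change; shapes are illustrative):

```python
import torch

# a batch of 4 symmetric 5x5 matrices
a = torch.randn(4, 5, 5)
a = a + a.transpose(-2, -1)  # symmetrize

# batched eigendecomposition now works in a single call
e, v = torch.symeig(a, eigenvectors=True)
print(e.shape, v.shape)  # torch.Size([4, 5]) torch.Size([4, 5, 5])
```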
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858

Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.

Differential Revision: D15981789

Pulled By: soumith

fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
2019-06-25 12:13:27 -07:00
4ec6fbefa6 Show deprecation warning when stateful lambdas are used as kernels (#21885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21885

If a kernel is defined as a stateful lambda

    static auto registry = torch::RegisterOperators().op("my::op", [some_closure] (Tensor a) {...});

this can have very unexpected behavior when kernels are instantiated. There is no guarantee that the state is kept.

In the options based API, state is already disallowed:

    // this is a compiler error
    static auto registry = torch::RegisterOperators().op("my::op", torch::RegisterOperators::options().kernel([some_closure] (Tensor a) {...}));

but we can't disallow it in the non-options-based API for backwards compatibility reasons.

We can, however, show a deprecation warning. This is what this diff introduces.

Differential Revision: D15867089

fbshipit-source-id: 300fa4772fad8e7d177eb7cb910063d360537a4a
2019-06-25 11:53:18 -07:00
c68119387d serialize torch.Size object (#20952)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20952

Differential Revision: D15514274

Pulled By: ailzhang

fbshipit-source-id: 8340a40fadfd06063f7f33b0d99d693e74d5defb
2019-06-25 10:44:35 -07:00
7daa96a3ce porting convtranspose3d to ATen (#22019)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/18353

CPU and GPU porting for convolution transpose 3d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22019

Differential Revision: D15985353

Pulled By: ezyang

fbshipit-source-id: 1c579577a32db24a1ce38f5ab9b3f1cb9c8f2a6e
2019-06-25 10:22:34 -07:00
9af8ea1ce5 Not expose mkldnn reshape and transpose (#22193)
Summary:
This PR makes mkldnn reshape and transpose not exposed as Tensor APIs; please see the comments in https://github.com/pytorch/pytorch/pull/21943.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22193

Differential Revision: D15983434

Pulled By: bddppq

fbshipit-source-id: ad3514dfd8a3b0d89442eef752864e5d3f3d04f0
2019-06-25 09:52:47 -07:00
c8b5f1d2f8 Switch autograd to use a pool of workers for each device (#21911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21911
ghimport-source-id: 3b7d37481201aa4b4ca8f7767603d0dfd13f871f

Test Plan:
Tested on https://github.com/pytorch/pytorch/issues/6959 and ensured no Recursion Error.
Performance testing:
[word_language_model](https://gist.github.com/malvika2147/34c214871d549f9275812f2d20506990) (no significant change)
[mnist](https://gist.github.com/malvika2147/77890eef102099490a1029122fb20dd0) (no significant change)
[Comparison of performance](https://gist.github.com/malvika2147/c0a8790910b8513bd2e20b224bdd6300) on https://github.com/pytorch/pytorch/issues/6959 with smaller inputs. (slower by about ~25%, expected)

Imported from OSS

Differential Revision: D15985852

fbshipit-source-id: ca172690857fd1718462b80f3a244af9d8825d6c
2019-06-25 09:08:26 -07:00
94e83da55c Optimization of the Embedding and Embedding-Bag CUDA Kernel (#22016)
Summary:
Re-implementation of the `embedding_dense_backward_cuda()` and the `embedding_bag_backward_cuda_sum_avg()` functions.

#### Performance
Running a [Mortgage Workflow](https://github.com/EvenOldridge/MortgageWorkflowA) with a block size of 100K on a DGX-2 (single GPU), we see a ~2.8× speedup:
```
Original version:    370,168 example/s
Optimized version: 1,034,228 example/s
```
The original version is bounded by the `EmbeddingBag_accGradParametersKernel_sum_avg`, which takes 70% of the CUDA execution time. In the optimized version, the optimized kernel now takes only 17% of the time.

#### Greater Numerical Stability
An added benefit is greater numerical stability. Instead of doing a flat sum where a single variable is used to accumulate the weights, this code uses two steps: each GPU thread computes a sub-result defined by `NROWS_PER_THREAD` before the final result is accumulated.
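
A plain-PyTorch sketch of the two-step idea (not the CUDA kernel itself; the chunk count is arbitrary):

```python
import torch

x = torch.rand(10**6, dtype=torch.float32)

flat = x.sum()  # single-accumulator style
# two-step: per-chunk partial sums, then a final reduction over the partials
partials = torch.stack([c.sum() for c in x.chunk(1024)])
two_step = partials.sum()

# compare both against a float64 reference; the two-step result
# typically tracks it more closely
ref = x.double().sum()
print((flat.double() - ref).abs().item(), (two_step.double() - ref).abs().item())
```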
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22016

Differential Revision: D15944339

Pulled By: mrshenli

fbshipit-source-id: 398d5f48826a017fc4b31c24c3f8b56d01830bf0
2019-06-25 08:14:15 -07:00
b0bd8758fc Further remove redundant CMake option passing code for those CMake variables that are directly controlled by environment variables but with a different name. (#22154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22154
ghimport-source-id: 714b98566e70063925c4c9e10940a4fe46fb5a3d

Test Plan: Imported from OSS

Differential Revision: D15985376

Pulled By: ezyang

fbshipit-source-id: 60710125009cd8bf60b5600a3f05854d931d9844
2019-06-25 07:23:06 -07:00
ce1a9653a8 Remove more build options not needed to be explicitly set in Python build scripts. (#22153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22153
ghimport-source-id: 129d90626a8e64079477a744fbbaba58e139a852

Test Plan: Imported from OSS

Differential Revision: D15985375

Pulled By: ezyang

fbshipit-source-id: 925bb1c886633b002beb1da0754bb055aa971e21
2019-06-25 07:23:03 -07:00
839b496fbd Fixes bugs in torch.multinomial without replacement (#22183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22183
ghimport-source-id: f03c17178de115adbe983953a8f9f205e3df7721

Test Plan: Imported from OSS

Differential Revision: D15985324

Pulled By: ezyang

fbshipit-source-id: 6e9dc3b54d448f4bb374b004d7f1dd1ac5c014f6
2019-06-25 07:15:18 -07:00
b61693c0ed Optimize InstanceNormOp forward (#22130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22130

Optimize InstanceNormOp forward

For InstanceNormOp on CPU with order = NHWC, N = 128, C = 256, H = 56, W = 56: 183ms -> 115ms.

For InstanceNormOp on GPU with N = 256, C = 256, H = 112, W = 112:
NCHW: 1475ms -> 45ms
NHWC: 1597ms -> 79ms

Reviewed By: houseroad

Differential Revision: D15963711

fbshipit-source-id: 3fa03109326456b9f301514fecbefa7809438d3e
2019-06-25 01:04:53 -07:00
ac4913ee62 support both regularizable and sofmax re-weighting on sparse features in dot product (#22176)
Summary:
In order to select the more important features in the dot product among a list of candidate sparse features, we can assign one learnable weight to each feature and reweight each feature by multiplying the weight onto its embedding before the dot product. We finally select features based on the weight magnitude after training.

We can perform L1 and/or L2 regularization on the weights. To summarize, the weights tend to shrink (avoiding overfitting) due to L2 regularization, and some weights will vanish to zero with L1. To avoid sparse feature embeddings being ignored due to early collapse of the weights, a piecewise LR warm-up policy is used for optimizing the regularization term, such that regularization is weak in the first stage and gets stronger afterwards (a small LR constant for iterations below threshold 1, a medium LR constant in stage 2, and a final, reasonably large LR constant for all iterations after threshold 2). The features with nonzero and relatively large weights (in absolute value) will be selected for the module.

We can also apply softmax on the original weights to make them sum to 1. We can even boost the softmaxed weights by multiplying by the number of softmax components, which essentially makes them sum to the number of components and average to 1. With this approach, all the weights are positive and sum to a constant. Regularization is not a must, since we can count on the competition between the softmax weights themselves to achieve reasonable re-weighting. We expect these weights to be denser than the sparse ones produced by L1 regularization, and we can select features based on the top-K weights.

Overall, we aim to demonstrate that the selected feature set outperforms the current v0 feature set in experiments. Special acknowledgement goes to Shouyuan Chen, who initiated the work on regularizable weighting.

 ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22176

The diff will export the following updates to the GitHub repository, summarized per file below:

- adding logger messages
`caffe2/python/layer_model_helper.py`
- add ElasticNet regularizer, which combines both L1 and L2 regularization
`caffe2/python/regularizer.py`
- implement piecewarmup, specifically warm up with three constant pieces
`caffe2/sgd/learning_rate_functors.h, caffe2/sgd/learning_rate_op.cc, caffe2/sgd/learning_rate_op.h`

Differential Revision: D15923430

fbshipit-source-id: ee18902cb88c23b1b7b367cc727d690a21e4cda9
2019-06-24 21:27:33 -07:00
299ea84a70 Use latest stable flake8-bugbear in CI and fix B011 flake8 error. (#21944)
Summary:
- PyCQA/flake8-bugbear#53 has been fixed (but not yet closed on their side) and a new version of flake8-bugbear has been released on Mar 28, 2019. Switch CI to use the latest stable version.
- Fix the new B011 errors that flake8-bugbear catches in the current codebase.

 ---

B011: Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
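
For example:

```python
# Before (flagged by B011): this assert is stripped entirely under `python -O`
def handle(kind):
    if kind == "unknown":
        assert False, "unreachable kind"

# After: raising is never optimized away
def handle_fixed(kind):
    if kind == "unknown":
        raise AssertionError("unreachable kind")
```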
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21944

Differential Revision: D15974842

Pulled By: soumith

fbshipit-source-id: de5c2c07015f7f1c50cb3904c651914b8c83bf5c
2019-06-24 20:48:15 -07:00
f5df0c9104 Don't end on inplace operators in einsum (#22111)
Summary:
Returning the result of an inplace `squeeze_` in `einsum` (which itself is traced) interacts badly with `autograd.Function`.

I must admit that I'm not 100% certain whether it should be necessary to change this, but I consider this a good change overall.
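
A sketch of the idea behind the change (a hypothetical snippet, not the actual einsum internals):

```python
import torch

t = torch.ones(3, 1, requires_grad=True)

# before: the traced code could end on an in-place op
# result = t.sum(-1, keepdim=True).squeeze_(-1)

# after: the out-of-place equivalent avoids the bad interaction
result = t.sum(-1, keepdim=True).squeeze(-1)
```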

Fixes: https://github.com/pytorch/pytorch/issues/22072
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22111

Differential Revision: D15974990

Pulled By: soumith

fbshipit-source-id: 477e7f23833f02999085f665c175d062e7d32acd
2019-06-24 20:39:20 -07:00
ede08492e1 Enabled mul for bool tensors on CUDA (#21771)
Summary:
Enable mul_cuda for bool tensors.
This is a helper PR to fix a [test failure](https://circleci.com/gh/pytorch/pytorch/1992191?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link) in the other [PR](https://github.com/pytorch/pytorch/pull/21113).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21771

Differential Revision: D15883737

Pulled By: izdeby

fbshipit-source-id: 4c39644bbe8e80da4d14570862589944285d4bfe
2019-06-24 18:37:29 -07:00
3b700a43d5 Add missing whitespace in error message (#21904)
Summary:
The current error message displays as:
     `RuntimeError: index koccurs twice in output`
A space is missing between the index name and 'occurs'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21904

Differential Revision: D15878941

Pulled By: colesbury

fbshipit-source-id: 163dda1829bf4956978cd01fd0e751673580722d
2019-06-24 15:32:46 -07:00
88cdc16835 AveragePool: expand incomplete kernel_size for the C++ API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22075

Differential Revision: D15945260

Pulled By: mrshenli

fbshipit-source-id: 827660c19ebbdb5f0aae2f4eadb6025ae2f93674
2019-06-24 15:32:41 -07:00
2372e7ed2e DilatedMaxPool: expand incomplete kernel_size for the C++ API (#22073)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22032.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22073

Differential Revision: D15944471

Pulled By: mrshenli

fbshipit-source-id: 84b265be00d67aa7f13508ede0646763d2339f1d
2019-06-24 15:32:36 -07:00
b2a39314e7 Make Dropout.__repr__ consistent with other modules (#22110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22106.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22110

Differential Revision: D15958821

Pulled By: ezyang

fbshipit-source-id: 89381dc3bfa79544580e20fea906cef4f5101b61
2019-06-24 15:27:06 -07:00
273b6c5bae Cast return value of vector.at() to void to avoid nodiscard warning in MSVC. (#22061)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22061

Differential Revision: D15957983

Pulled By: ezyang

fbshipit-source-id: e4416c5f0db2bc6b8bfaa27be52b942148ec7b3d
2019-06-24 15:27:02 -07:00
0ac28c8966 Quick fix for #18215, the CPU case (#21910)
Summary:
The bug is that when target_length == 0, there is no preceding BLANK state, and the original implementation leads to an out-of-bounds pointer access.
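
A minimal repro sketch of the edge case (an empty target for a batch element):

```python
import torch
import torch.nn.functional as F

log_probs = torch.randn(50, 1, 20).log_softmax(2)  # (T, N, C)
targets = torch.zeros(1, 0, dtype=torch.long)      # empty target sequence
input_lengths = torch.tensor([50])
target_lengths = torch.tensor([0])                 # target_length == 0

# previously this could read out of bounds on CPU
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss)
```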
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21910

Differential Revision: D15960239

Pulled By: ezyang

fbshipit-source-id: 7bbbecb7bf91842735c14265612c7e5049c4d9b3
2019-06-24 15:26:58 -07:00
41d0525de3 Improve repr for IncompatibleKeys (#22119)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20128.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22119

Differential Revision: D15961965

Pulled By: ezyang

fbshipit-source-id: 9cc397726e6bea5580e79d291cfc1ee75337fa0c
2019-06-24 15:26:54 -07:00
f1775796dd Fix minor issues with #21736 (#22074)
Summary:
cc mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22074

Differential Revision: D15965376

Pulled By: mrshenli

fbshipit-source-id: 50ff96de6390817d8ea52c04322c6bee3d649b32
2019-06-24 15:18:26 -07:00
a45898931c Document the Boolean tensor type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21601

Differential Revision: D15971573

Pulled By: gchanan

fbshipit-source-id: c07c57f989980149cb1307dcca6ba64dce52d0ef
2019-06-24 14:16:36 -07:00
7c4206499e Fix in ivalue::Future (#22114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22114
ghimport-source-id: 249b76c078e7af8ebb6cab113dd48dbd3e31e8dc

Test Plan:
ran intra_inter_benchmark with PARALLEL_BACKEND=NATIVE build

Imported from OSS

Differential Revision: D15958901

Pulled By: ilia-cher

fbshipit-source-id: 1c3dedc4cf1ff8166aeb26899a06c7287a499562
2019-06-24 12:56:46 -07:00
6350dbddd1 Fix sequential MKL case (#22062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22062
ghimport-source-id: a30255d7453c4ffecf40215a785c1e06b7296368

Test Plan:
USE_CUDA=0 PARALLEL_BACKEND=OPENMP BLAS=MKL USE_MKLDNN=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ BUILD_BINARY=1 python setup.py develop --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D15938079

Pulled By: ilia-cher

fbshipit-source-id: e7ef0c5bc75ebb845ebe66bf76a4070d45305b35
2019-06-24 12:56:43 -07:00
21da33f0f9 Better trace comments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22090

Differential Revision: D15968440

Pulled By: Krovatkin

fbshipit-source-id: e55e03a4303adbaa576c4384e7a42410bd99da6e
2019-06-24 12:51:27 -07:00
f1c7fa0503 De-deprecate some warnings that hurt usability (#21999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21999
ghimport-source-id: a77b3aea3d3ed33f328e143203730f2655371837

Test Plan: Imported from OSS

Differential Revision: D15925892

Pulled By: bwasti

fbshipit-source-id: 2b4e0af40bc1c6d12c617ba8701d3a5f7a6d833d
2019-06-24 12:35:00 -07:00
2347a4032b Fix tracing docs and add more comprehensive examples (#22082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21857
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22082

Differential Revision: D15968306

Pulled By: Krovatkin

fbshipit-source-id: a76e500b0b7192bd814931ec48bbe9c37b8b92e0
2019-06-24 12:10:19 -07:00
85cbe0d825 Fix Concat Dimension Bug (#22088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22088

This diff is similar to D14163001. We need to handle the edge case when add_axis=1.

Reviewed By: jspark1105

Differential Revision: D15949003

fbshipit-source-id: 328d1e07b78b69bde81eee78c9ff5a8fb81f629b
2019-06-24 10:32:48 -07:00
322261a4de Fix dispatching of backwards kernel for ROCm. (#22125)
Summary:
Use WARP_SIZE consistently also for the dispatch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22125

Differential Revision: D15966661

Pulled By: bddppq

fbshipit-source-id: 93eb663e01aff3b49474504a2f96f060919edf0c
2019-06-24 10:32:44 -07:00
e016a424ef Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter
Differential Revision:
D15944971

Original commit changeset: 53473c370813

fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10
2019-06-24 09:41:33 -07:00
6edaa11e5a fix broken link
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22064

Differential Revision: D15951107

Pulled By: mrshenli

fbshipit-source-id: 0b8f97bd2bbac26855cd2889e1fc619770974ee2
2019-06-24 07:34:16 -07:00
77eda8de8e Support sparse gradients in DistributedDataParallel (#22037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22037

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
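
A minimal usage sketch (it assumes a process group has already been initialized with `torch.distributed.init_process_group`; `sparse=True` is what makes the embedding emit sparse gradients):

```python
import torch
import torch.nn as nn

# assumes torch.distributed.init_process_group(...) has already been called
model = nn.Sequential(
    nn.Embedding(1000, 64, sparse=True),  # will receive sparse gradients
    nn.Linear(64, 10),
)
ddp = nn.parallel.DistributedDataParallel(model)

out = ddp(torch.randint(0, 1000, (8,)))
out.sum().backward()  # sparse and dense grads are each reduced appropriately
```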

Reviewed By: mrshenli

Differential Revision: D15926383

fbshipit-source-id: 39c0d5dbd95bf0534314fdf4d44b2385d5321aaf
2019-06-24 07:34:12 -07:00
a7ec889de4 Add sparse tensor allreduce (#22036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22036

Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.

This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.

It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.

This is a resubmission of #19146.
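
Conceptually, the local reduction step looks like this plain-Python sketch (not the c10d implementation):

```python
import torch

def reduce_gathered_sparse(gathered):
    # gathered: one sparse tensor per rank, allgathered beforehand
    out = gathered[0]
    for t in gathered[1:]:
        out = out + t
    return out.coalesce()  # merge duplicate indices after summation

a = torch.sparse_coo_tensor([[0, 1]], [1.0, 2.0], (3,))
b = torch.sparse_coo_tensor([[1, 2]], [3.0, 4.0], (3,))
print(reduce_gathered_sparse([a, b]).to_dense())  # tensor([1., 5., 4.])
```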

Reviewed By: mrshenli

Differential Revision: D15926384

fbshipit-source-id: b6ee5d81606bfa8ed63c3d63a9e307613491e0ae
2019-06-24 07:34:09 -07:00
313960d52e Use at::detail::* instead of detail::* to avoid ambiguity in windows (#22029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22029
ghimport-source-id: d1a26a07faf101c644775267a141ba56cbd3f1c9

Test Plan: Imported from OSS

Differential Revision: D15965039

Pulled By: ezyang

fbshipit-source-id: 31baf405da6f7c6d9e31f5954ec827889dadf769
2019-06-24 07:18:02 -07:00
142361a7e4 merge interfaces that have an optional scalartype parameter (#21088)
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```

[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088

Reviewed By: ailzhang

Differential Revision: D15944971

Pulled By: nairbv

fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
2019-06-24 07:17:58 -07:00
cd0d8480d3 Remove many build options redundantly specified in Python build scripts. (#21877)
Summary:
Currently many build options are explicitly passed from Python build scripts to CMake, but this is unnecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and in environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are lost.

For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877

Differential Revision: D15964996

Pulled By: ezyang

fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007
2019-06-24 07:17:54 -07:00
1b34ccfc78 Porting SpatialDilatedConvolution and VolumetricDilatedConvolution to ATen (#20983)
Summary:
This PR tackles issue https://github.com/pytorch/pytorch/issues/18352 .

Progress:
- [x] conv_dilated2d CPU
- [x] conv_dilated3d CPU
- [x] conv_dilated2d CUDA
- [x] conv_dilated3d CUDA
- [x] RocM port
- [x] Port of CUDA gemm and gemv
- [x] Refactored 2d and 3d functions as well as output and gradient computations into a single C++ template function
- [x] Cleanup
  + [x] eliminate forward functions
  + [x] eliminate buffers `columns` and `ones` from functions API
  + [x] eliminate out functions
  + [x] eliminate using `ones`

Note that col2im, im2col, col2vol, vol2col implementations are exposed in `ATen/native/im2col.h` and `ATen/native/vol2col.h`. The corresponding operators (not ported in this PR) should use these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20983

Differential Revision: D15958088

Pulled By: ezyang

fbshipit-source-id: 1897f6e15abbf5710e9413cd1e443c2e1dc7d705
2019-06-24 07:12:54 -07:00
3ba654e6d5 Add finding thnvrtc_library into torchconfig.cmake (#22126)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/21861#issuecomment-504805368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22126

Differential Revision: D15964930

Pulled By: ezyang

fbshipit-source-id: 0fb749784bec9af5a8ccbcf775fa7d9d4d34a4c6
2019-06-24 07:04:44 -07:00
08060e898b Revert D15435461: [pytorch][PR] PyTorch ThroughputBenchmark
Differential Revision:
D15435461

Original commit changeset: db08829dc3f4

fbshipit-source-id: 72a0eac1658b2d3f885bc9a21c49fcc23030ae3e
2019-06-23 22:55:05 -07:00
d96ce9b9fe add for in dict support (#22006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22006
ghimport-source-id: d9686c0b61b0eea3787f48adce567249e4e8faf0

Test Plan: Imported from OSS

Differential Revision: D15948548

Pulled By: wanchaol

fbshipit-source-id: 4227502ca050099085ad481aef725ac2cab06d74
2019-06-23 20:49:35 -07:00
c9344fc9c4 add for in string support (#21990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21990
ghimport-source-id: 69b4882f8602c4088e7a833c43fd3cd37501a3c0

Test Plan: Imported from OSS

Differential Revision: D15948547

Pulled By: wanchaol

fbshipit-source-id: 057e7f4fb67c6dca98458ceb14414368e1a86260
2019-06-23 20:49:30 -07:00
eab35756d8 support iteration tuple unpacking (#21985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21985
ghimport-source-id: 1f20a8db7b6bad23b18ac1caefcb46b3fa141697

Test Plan: Imported from OSS

Differential Revision: D15948549

Pulled By: wanchaol

fbshipit-source-id: 758c9c3dfad40c4158aee21ddebcd25b711111d7
2019-06-23 20:49:26 -07:00
9b45237618 PyTorch ThroughputBenchmark (#20766)
Summary:
This is useful for measuring the inference performance of your
models. This is a very basic benchmark for now: we don't support
batching on the benchmark side, and no inter- and intra-op parallelism is
supported yet, just caller-based parallelism.

The main philosophy here is that the user should be able to provide inputs
from Python and the benchmark just stacks them. The API should be
exactly the same as passing inputs to module.forward.
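
A usage sketch; note that the import path and the `benchmark()` keyword names below are assumptions based on this description, not confirmed API:

```python
import torch
from torch.utils import ThroughputBenchmark  # import path assumed

module = torch.jit.script(torch.nn.Linear(16, 4))

bench = ThroughputBenchmark(module)
bench.add_input(torch.randn(1, 16))  # same signature as module.forward

# NOTE: the keyword names below are assumptions
stats = bench.benchmark(
    num_calls_per_thread=100,
    num_main_threads=2,
    num_warmup_iters=10,
)
print(stats)
```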
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20766

Test Plan: Added a new unit test

Differential Revision: D15435461

Pulled By: salexspb

fbshipit-source-id: db08829dc3f4398bb1d8aa16cc4a58b6c72f16c6
2019-06-23 13:03:18 -07:00
c0f96aaf01 Restore default values on premature test exit (#22115)
Summary:
Previously any assert failures would leave the updated setting, making
the test suite semantics dependent on the order in which the tests are run.

The diff is large only due to the indentation change (might be good to review without whitespace changes).

cc yf225
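
The pattern being enforced, as a sketch (the particular flag is only an example):

```python
import torch

def run_test_with_deterministic_cudnn(test_fn):
    # save, set, and always restore -- even if the test body raises
    prev = torch.backends.cudnn.deterministic
    torch.backends.cudnn.deterministic = True
    try:
        test_fn()
    finally:
        torch.backends.cudnn.deterministic = prev
```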
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22115

Differential Revision: D15960875

Pulled By: soumith

fbshipit-source-id: 9313695277fc2d968786f13371719e03fff18519
2019-06-23 12:55:00 -07:00
887ecf797c Fix DictType isSubtypeOf (#22104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22104
ghimport-source-id: 9db14020f424cf2e021d63e9c0fe4017ac7cd6c8

Test Plan: Imported from OSS

Differential Revision: D15956726

Pulled By: jamesr66a

fbshipit-source-id: 85448deab70c5e5b7ab1132652836ed575581868
2019-06-22 16:36:34 -07:00
45b91bd326 refactor all for in range/tensor tests to be together with other for loop tests (#21950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21950
ghimport-source-id: b2491313bc2e0fcc10f77167c261cbae4d884ebb

Test Plan: Imported from OSS

Differential Revision: D15948546

Pulled By: wanchaol

fbshipit-source-id: 34dde28902ae5b8affbf6e4deaaffdb1d8ddd6ec
2019-06-22 01:38:14 -07:00
e0f5ab2c2e Tree based Iterator infrastructure: for in range/list/tensor/zip/enumerate (#21801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21801
ghimport-source-id: b019d3e9a6f9bf152991a01b40e424dff176ffaa

Test Plan: Imported from OSS

Differential Revision: D15948545

Pulled By: wanchaol

fbshipit-source-id: 6110a0f3ab08cbbb398441e8330f56083ecd2d99
2019-06-22 01:00:42 -07:00
a256b09ce9 Backout Liveness Tests again :-(
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22100

Differential Revision: D15956214

Pulled By: Krovatkin

fbshipit-source-id: 9b0c8ecf5b479bf878ffc31acc416bd8dbfe4b50
2019-06-22 00:18:21 -07:00
7b1d6c8912 Update intra_inter_benchmark (#22051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22051
ghimport-source-id: 70710b3866b1a5e21656b77d2695ada74d00254e

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15933951

Pulled By: ilia-cher

fbshipit-source-id: 88ad8f7a1634c1612ffaa68f22721ffc73d9b2ba
2019-06-21 23:06:27 -07:00
91bf0a9f9d Move quantized tensor tests in test_torch.py to test_quantized_tensor.py (#22089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22089

att

Reviewed By: jianyuh

Differential Revision: D15950101

fbshipit-source-id: 70acdeeef3a05201d72f986d5a0005832efd75ff
2019-06-21 22:48:34 -07:00
b19b20efef fix minor comment (#21576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21576

Fix comment regarding original_tensor

Reviewed By: jianyuh

Differential Revision: D15733294

fbshipit-source-id: e2957f32dcf90859b77e61c931b64abdd066aabb
2019-06-21 22:23:53 -07:00
f7b2778cb1 s/uniqueName/debugName/ (#22096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22096
ghimport-source-id: 8f1d994b98432942b5beeb10bf6d30e447d51997

Test Plan: Imported from OSS

Differential Revision: D15956004

Pulled By: jamesr66a

fbshipit-source-id: 319d2d20ef0863249a8a2bdd228b4f792d37bfab
2019-06-21 20:54:53 -07:00
7d637de771 Reduce excessive CI printing in TestHub (#22043)
Summary:
https://github.com/pytorch/pytorch/pull/21132 reverted https://github.com/pytorch/pytorch/pull/19606.

Now these tests again produce roughly 40% of the lines of CI output (e.g., https://circleci.com/gh/pytorch/pytorch/2041825?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link)

This PR now uses the functionality introduced in https://github.com/pytorch/vision/issues/862.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22043

Differential Revision: D15947268

Pulled By: ailzhang

fbshipit-source-id: f84f4d6b86203dbe8687e04ae3ed8c99df0bdff8
2019-06-21 20:08:44 -07:00
63ca908026 Updating submodules
Reviewed By: yns88

fbshipit-source-id: d3374d2ee514cc0526559ffbac6dc11918ea71cf
2019-06-21 18:51:07 -07:00
856268c716 Revert D15947873: [JIT] s/uniqueName/debugName
Differential Revision:
D15947873

Original commit changeset: 31a2b30d0ce9

fbshipit-source-id: ef1c0f120c1835184d8106d176cea58ec6ad40b7
2019-06-21 18:51:03 -07:00
36e4b54420 s/uniqueName/debugName (#22048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22048
ghimport-source-id: a82d80ceec1d8055ce4cf62df10ade4a224109f8

Test Plan: Imported from OSS

Differential Revision: D15947873

Pulled By: jamesr66a

fbshipit-source-id: 31a2b30d0ce911edf5791ca10040a1e968750b06
2019-06-21 17:59:38 -07:00
4bc89bd5a6 Implement tensor.select(Dimname,int) (#21795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21795
ghimport-source-id: d13af6078a47de1d6045cfbb7d278c378fe734fe

Test Plan: Imported from OSS

Differential Revision: D15833457

Pulled By: zou3519

fbshipit-source-id: fa52aff25ce0e12f31da3eef83ea948b4f7a5d9f
2019-06-21 16:16:45 -07:00
18a904c12e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 494a8fe00cbdb782bbdb05eefb17e9166d117599
2019-06-21 15:14:46 -07:00
f164c01f9c Adding liveness test cases back
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21762

Differential Revision: D15943509

Pulled By: Krovatkin

fbshipit-source-id: 4b65bf63ab15a2347da5f7269cc0f2dbb226b330
2019-06-21 15:09:09 -07:00
38aa5a519e Experimental option to use single thread pool (#22047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22047
ghimport-source-id: 8731538a091997fd31d6aff59152dc9241de2ba4

Test Plan:
EXPERIMENTAL_SINGLE_THREAD_POOL=1 PARALLEL_BACKEND=NATIVE_TBB
USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1 MKLDNN_THREADING=SEQ USE_CUDA=0
BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1 python setup.py develop --cmake
./build/bin/parallel_info
./build/bin/thread_init_test
./build/bin/test_parallel
./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15931188

Pulled By: ilia-cher

fbshipit-source-id: 1ca1b190b6e16ce5398f2dad72deaf3cb083a43b
2019-06-21 14:54:16 -07:00
5ff06a7b0b more complete tuple assignments (#21949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21949
ghimport-source-id: 458793d74af3728bf0338867b081157905a7635a

Test Plan: Imported from OSS

Differential Revision: D15948550

Pulled By: wanchaol

fbshipit-source-id: 9ed69e0859e052816f06fc9c288b905551b2e48c
2019-06-21 14:49:38 -07:00
4009089d1f Sparse BLAS: Remove workaround to check zero length inputs. (#22080)
Summary:
Fix was released with ROCm 2.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22080

Differential Revision: D15947427

Pulled By: bddppq

fbshipit-source-id: b6b66f4cfc334ddc6140d1d519792d4783ba0efa
2019-06-21 14:45:06 -07:00
04e9278306 First round of optimizations for segment_reduction_op kernels. (#22081)
Summary:
Apply launch bounds annotations for ROCm as the maximum threads per
block (1024) is higher than the ROCm internal default (256).

Reduce the minBlocksPerMultiprocessor for ROCm to 8 from 16 as this
improves performance in some microbenchmarks by (statistically
significant) 4%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22081

Differential Revision: D15947426

Pulled By: bddppq

fbshipit-source-id: b4b7015417f99e14dfdedb62639e4d837c38e4fd
2019-06-21 14:33:12 -07:00
1c5fe2e8c4 Add support for Python 3.8 Constant node (#22007)
Summary:
We can't really test these until we get Python 3.8 in the CI, but these all work locally and won't be invoked at all for Python 3.7 and lower so this should be pretty safe.

Fixes #21710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22007

Pulled By: driazati

Differential Revision: D15914735

fbshipit-source-id: 83833cebe7e38b162719a4f53cbe52c3fc638edd
2019-06-21 14:22:06 -07:00
f9b3989206 handle slice with negative indices and indices exceeding tensor dimen… (#21811)
Summary:
handle slice with negative indices and indices exceeding tensor dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21811

Reviewed By: zrphercule

Differential Revision: D15944243

Pulled By: houseroad

fbshipit-source-id: f7d987e9d8d704ade9d489599df14afbf1333428
2019-06-21 13:37:54 -07:00
38c9bb8261 Remove most usages of THCHalfAutoNumerics. (#21878)
Summary:
This was originally introduced for at::Half, which overloaded a number of operators; since this isn't necessary anymore, get rid of it.

Note in many cases, these files still need THCNumerics.cuh (which was included by THCHalfAutoNumerics); I was not careful about isolating these usages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21878

Differential Revision: D15941236

Pulled By: gchanan

fbshipit-source-id: 65f30a20089fcd618e8f3e9646cf03147a15ccba
2019-06-21 12:40:38 -07:00
06c3bd0302 Improve ListPtr::extract() (#21753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21753

- it accidentally didn't move non-IValue-based lists before. This is fixed now.
- it only needs to recreate a T() for IValue-based lists

Reviewed By: resistor

Differential Revision: D15809220

fbshipit-source-id: 944badf1920ee05f0969fff0d03284a641dae4a9
2019-06-21 12:26:01 -07:00
fe580e850e Rewrite lerp operator to use TensorIterator and support compile-time vectorization. (#22038)
Summary:
Benefit from compile-time vectorization and multi-threading.

Before:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After, with multi-threading:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22038

Differential Revision: D15941468

Pulled By: VitalyFedyunin

fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
2019-06-21 11:39:27 -07:00
28630529ac Limit overall number of threads used by TBB (#22045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22045
ghimport-source-id: ea49ae04d86677f7a73a07968ce454eb1128fb84

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/parallel_info
./build/bin/thread_init_test
./build/bin/test_parallel

Imported from OSS

Differential Revision: D15930319

Pulled By: ilia-cher

fbshipit-source-id: 4c33ae395965e5708f8d7ceb67495b303fc4d22c
2019-06-21 11:39:18 -07:00
82dd69326b Split nn.Module._save_to_state_dict to make it overridable (#21933)
Summary:
# Motivation

We allow overriding JIT module serialization with `__getstate__/__setstate__` in order to cover cases where parameters are not serializable. Use cases include MKLDNN integration: a388c78350/torch/utils/mkldnn.py (L18-L26)
and also fbgemm prepacked format integration for quantized tensors.

However many Eager scripts use `torch.save(module.state_dict())` form of serialization. There are several ways to make it work:

* make packed_weight itself pickleable (e.g. by binding `__getstate__/__setstate__` on C++ UDT level)
    * change: we’d need to allow module buffers to be of arbitrary, non-Tensor types
    * pro: no change to state_dict behavior
    * cons: might not be directly inspectable by user calling .state_dict(), especially if packed weights represent several tensors fused together
* make packed_weight a proper Tensor layout
    * pro: no change to state_dict or buffers behavior
    * cons: adding new tensor layouts is pretty costly today
    * cons: doesn’t work if multiple tensors are packed in one interleaved representation
* *[this approach]* allow Modules to override state_dict and return regular tensors
    * pro: most flexible and hackable
    * pro: maintains semantic meaning of statedict as all data necessary to represent module’s state
    * cons: complicates state_dict logic
    * cons: potential code duplication between `__getstate__/__setstate__`

Based on discussions with zdevito and gchanan we decided to pick the latter approach. Rationale: this behavior is fully opt-in and will impact only modules that need it. For those modules the requirement listed above won't hold, but we do preserve the requirement that all elements of state_dict are tensors. (https://fburl.com/qgybrug4 for internal discussion)

In the future we might also implement one of the approaches above, but those are more involved.
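
A sketch of what the split enables (the packing routine below is hypothetical; `_save_to_state_dict(destination, prefix, keep_vars)` is the overridable hook):

```python
import torch
import torch.nn as nn

class PackedLinear(nn.Module):
    """Sketch: keeps a packed runtime form of the weight, but exports a
    regular tensor through state_dict()."""

    def __init__(self, weight):
        super().__init__()
        self._packed = self._pack(weight)

    @staticmethod
    def _pack(w):
        return w.t().contiguous()  # stand-in for a real packing routine

    def _unpack(self):
        return self._packed.t()

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        # export the plain-tensor view instead of the packed buffer
        destination[prefix + "weight"] = self._unpack()

m = PackedLinear(torch.randn(4, 3))
print(m.state_dict().keys())  # odict_keys(['weight'])
```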
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21933

Differential Revision: D15937678

Pulled By: dzhulgakov

fbshipit-source-id: 3cb5d1a8304d04def7aabc0969d0a2e7be182367
2019-06-21 09:55:22 -07:00
b2197ef2b0 Adding support for JIT Fusion on Windows for CUDA (#21861)
Summary:
This pull request adds the necessary Windows DLL code to be able to support JIT fusion for CUDA. CPU JIT Fusion isn't supported. This also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861

Differential Revision: D15940939

Pulled By: soumith

fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
2019-06-21 09:44:17 -07:00
edb5a1662e Remove getDeviceFromPtr and allocator from Type (#21940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21940
ghimport-source-id: 0a618878ae030f663b05662f83ac4b549a90ba29

Test Plan: Imported from OSS

Differential Revision: D15893330

Pulled By: li-roy

fbshipit-source-id: a3dfb6b4ed0c72f7f3efd00192fb63aabc9c5967
2019-06-21 01:05:33 -07:00
b36a041d6f Move UnsafeTensorFromTH and UnsafeStorageFromTH off Type (#21923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21923
ghimport-source-id: f015c8521ef9071eaa982cbf73c13aa925035956

Test Plan: Imported from OSS

Differential Revision: D15883390

Pulled By: li-roy

fbshipit-source-id: 6a7a7ffbe6000199d41cdca5efb97371f46dd8fe
2019-06-21 01:05:29 -07:00
5d7cf66862 add Int8SpatialBNRelu (#22014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22014

Add Int8SpatialBN + Relu fused operator.

Reviewed By: dskhudia

Differential Revision: D15916551

fbshipit-source-id: a938e0f0e105ab5f823a3cb6144f50aa2ab944c1
2019-06-20 23:23:04 -07:00
7d81e62562 Add mkldnn tests for running end to end resnet models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22041

Differential Revision: D15928786

Pulled By: bddppq

fbshipit-source-id: 4b12e5bda2da13aba2d63d357a0a854d59317362
2019-06-20 22:42:49 -07:00
71741ba115 rename test to be more consistent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22057

Differential Revision: D15936870

Pulled By: soumith

fbshipit-source-id: ab6194219da2582efdf324b89b2bc87dfe4e5d69
2019-06-20 22:02:36 -07:00
a3fc6ed046 Hook up liveness into profiling pipeline.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21881

Differential Revision: D15931627

Pulled By: Krovatkin

fbshipit-source-id: dc825a563c7aceb5f66a2ed2a600d550b70941b2
2019-06-20 21:23:16 -07:00
3838324539 Add max/min/argmax/argmin/sort/argsort for quantized Tensor (#21546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21546

Added following methods for QTensor:
- max, min
- argmax, argmin
- sort, argsort

Reviewed By: dzhulgakov

Differential Revision: D15718117

fbshipit-source-id: 746b978d5722cb75e216fc65585bf206d45a7969
2019-06-20 21:00:03 -07:00
95aee81dd7 more general fusion logic (#22015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22015

Previous fusion logic only works for operators back-to-back in the linear order of protobuf file.
This diff generalizes to work for any predecessor-successor operators in the graph without any "interfering" use/def of the related blobs.

Reviewed By: csummersea

Differential Revision: D15916709

fbshipit-source-id: 82fe4911a8250845a8bea3427d1b77ce2442c495
2019-06-20 20:44:26 -07:00
88921feafd change return type for q_scale and q_zero_point (#21709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709

Change the return type from Scalar to double/int64_t so we don't need to do conversions when we call other quantize-related aten functions

Differential Revision: D15793003

fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
2019-06-20 20:30:39 -07:00
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we have 2 data loader modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch below).
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
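For illustration, a minimal sketch using only the names introduced above (the worker-splitting policy is up to the user; the range split here is just one option):

```python
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

# An iterable-style dataset that shards its range across loader workers.
class RangeDataset(IterableDataset):
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process loading: yield the full range
            lo, hi = self.start, self.end
        else:  # in a worker process: yield only this worker's slice
            per_worker = (self.end - self.start + info.num_workers - 1) // info.num_workers
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

# batch_size=None enables the non-batch loading mode described above.
loader = DataLoader(RangeDataset(0, 10), num_workers=2, batch_size=None)
print(sorted(loader))  # [0, 1, 2, ..., 9]
```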
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
d4119f8fcb Automatic update of fbcode/onnx to 355a4954ea4e5836a5e943589509951c44feb6b4 (#22030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22030

Previous import was dd599b05f424eb161a31f3e059566a33310dbe5e

Included changes:
- **[355a4954](https://github.com/onnx/onnx/commit/355a4954)**: Update codeowners to have community folder changes assigned to steering committee (#2104) <Prasanth Pulavarthi>
- **[ceaa5da7](https://github.com/onnx/onnx/commit/ceaa5da7)**: Fix Resize/Upsample Shape inference function (#2085) <Raymond Yang>
- **[4de8dc0d](https://github.com/onnx/onnx/commit/4de8dc0d)**: Clarify shape inference requirements for new operators (#2088) <Hariharan Seshadri>
- **[52aa1fad](https://github.com/onnx/onnx/commit/52aa1fad)**: Fix NN defs file (#2083) <Hariharan Seshadri>

Reviewed By: bddppq

Differential Revision: D15924221

fbshipit-source-id: 91ba64ef3e1a2de4a7dd0b02ee6393508cc44a73
2019-06-20 15:52:45 -07:00
84a2d5d7aa Add hashing to bucket-weighted pooling (#20673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20673

Add option to bucket-weighted pooling to hash the bucket so that any cardinality score can be used.

Reviewed By: huginhuangfb

Differential Revision: D15003509

fbshipit-source-id: 575a149de395f18fd7759f3edb485619f8aa5363
2019-06-20 15:12:36 -07:00
1aae4b02df Fix 'error : detail is ambiguous' on Windows (#22025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22025
ghimport-source-id: 0fb408ad185a989507f7509a2a3574e1a7e60ab2

Test Plan: Imported from OSS

Differential Revision: D15926651

Pulled By: ezyang

fbshipit-source-id: 298340bfbfe44dcd81cde8f0d56f8dbde92fb7bd
2019-06-20 13:23:21 -07:00
19ef15709f Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0be0694d6adf1ae9baa408a4b372101a26a14ba4
2019-06-20 12:59:31 -07:00
4cd7d78718 correct arange docs (#21992)
Summary:
https://github.com/pytorch/pytorch/issues/21579 correctly points out an inaccuracy in the docs for `arange`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21992

Differential Revision: D15914411

Pulled By: umanwizard

fbshipit-source-id: 3eb1734b29af3f3858f0f4d54c71e28dbda5c75b
2019-06-20 12:36:00 -07:00
08facca1a1 Support accumulating DDP grads using a context manager (#21736)
Summary:
The first attempt and more discussions are available in https://github.com/pytorch/pytorch/issues/19577

#### Goal

Allow toggling DDP gradient synchronization across iterations. With this feature, users may accumulate grads in module variables, and only kick off the expensive grad synchronization every few iterations.

#### Concerns

Our first attempt in https://github.com/pytorch/pytorch/issues/19577 tries to do it using a variable or a function, but apaszke made a good point that that approach would be error-prone, and favored a context manager instead.

#### Proposed Solution

Instead of providing a `accumulate_grads` variable/function/context, we provide a `DistributedDataParallel.no_sync()` context manager. And it does exactly what the name suggests, i.e., disable DDP grad synchronization within the context. Note that `accumulate_grads` means `no_sync` + no optimizer step, where the latter is not controlled by DDP.

It is true that users need to call another `model(input).backward()` after exiting the context, and this is indeed more verbose. But I think it is OK, as one major concern in the previous discussion is to prevent users from running into errors without knowing it. This API should reaffirm the expected behavior, and does not interfere with other use cases if accumulating grads is not required.

The application would then look like:

```python
with ddp.no_sync():
  for input in inputs:
    ddp(input).backward()

ddp(one_more_input).backward()
optimizer.step()
```

chenyangyu1988 myleott
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21736

Differential Revision: D15805215

Pulled By: mrshenli

fbshipit-source-id: 73405797d1e39965c52016af5cf45b15525ce21c
2019-06-20 12:23:52 -07:00
40b9f8f0a0 Added more descriptive error message for index out of range (#21758)
Summary:
https://github.com/pytorch/pytorch/issues/21535
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21758

Differential Revision: D15922915

Pulled By: Chillee

fbshipit-source-id: dcb301a661c359f27869200ee241ec272ef50d3a
2019-06-20 12:12:03 -07:00
6bd58b7548 Move list / dict tests to TestList and TestDict (#22000)
Summary:
There aren't any substantive changes aside from some test renames (e.g. `TestScript.test_dict_membership` -> `TestDict.test_membership`) and the addition of `TestDict.dict()`.

Adding the rest of the dict ops was making the tests a mess, and `TestScript` is already > 10000 lines by itself, so breaking them up should make things cleaner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22000

Pulled By: driazati

Differential Revision: D15911383

fbshipit-source-id: 614428e03fbc14252f0e9cde74ab9a707169a860
2019-06-20 11:17:35 -07:00
0702b5f345 Partially parallelize randperm on CPU. (#21529)
Summary:
This commit parallelizes the variable initialization (from 1 to n) step on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21529

Differential Revision: D15855402

Pulled By: VitalyFedyunin

fbshipit-source-id: f1ba54587451f9cb0eb5e542c3c5b458b48e1a3d
2019-06-20 10:44:01 -07:00
e388f70499 Move cppdocs build to CircleCI (#19768)
Summary:
The cppdocs build job (originally run on Chronos as a cron job) was frequently broken because it was not run on every PR. This PR moves it to CircleCI and enables it on every PR, so that we can get the build failure signal much earlier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19768

Differential Revision: D15922289

Pulled By: yf225

fbshipit-source-id: e36ef59a2e42f78b7d759ee02f2d94dc90f88fff
2019-06-20 10:24:21 -07:00
76fe91bb2f Revert D14889547: Add sparse tensor allreduce
Differential Revision:
D14889547

Original commit changeset: 34f3de4d6a2e

fbshipit-source-id: 24d2239da0b865280af88dce3d8fb25883fc0174
2019-06-20 10:07:27 -07:00
cb4c213f55 Revert D15007365: Support sparse gradients in DistributedDataParallel
Differential Revision:
D15007365

Original commit changeset: f298e83fd3ca

fbshipit-source-id: ef5e556d2df37f0c64652bd3563956afd8d9fd7f
2019-06-20 10:07:22 -07:00
f8f583cbae Port convtranspose2d (#20994)
Summary:
this PR will partially resolve https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20994

Differential Revision: D15876052

Pulled By: ezyang

fbshipit-source-id: 5896e0cbb656d0530e39fd681808adc685841b37
2019-06-20 07:11:38 -07:00
365de7bda1 Support sparse gradients in DistributedDataParallel (#19443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19443

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
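For illustration, a minimal single-process sketch of a module that emits sparse gradients under DDP (the gloo backend, rendezvous address, and port are placeholder assumptions, not part of this commit):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder rendezvous info
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# sparse=True makes the embedding emit sparse gradients; per the description
# above, each such parameter gets its own bucket in the reducer.
emb = torch.nn.Embedding(10, 4, sparse=True)
ddp = DistributedDataParallel(emb)
ddp(torch.tensor([1, 2, 3])).sum().backward()
print(emb.weight.grad.is_sparse)  # True
```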

Reviewed By: mrshenli

Differential Revision: D15007365

fbshipit-source-id: f298e83fd3ca828fae9e80739e1db89d045c99ac
2019-06-20 07:06:28 -07:00
aee6a412e9 Add sparse tensor allreduce (#19146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19146

Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.

This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.

It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.

Reviewed By: mrshenli

Differential Revision: D14889547

fbshipit-source-id: 34f3de4d6a2e09c9eba368df47daad0dc11b333e
2019-06-20 07:06:24 -07:00
97ea44b34a Fix issue in quantization error measurement when followed by Relu (#21890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21890

As title

Reviewed By: jspark1105

Differential Revision: D15739808

fbshipit-source-id: 8fbcca04f0711fd9f994d67e1f4a604ef9fa42c6
2019-06-19 22:29:54 -07:00
b6f542f8a1 Add aten mkldnn transpose (#21943)
Summary:
This PR is about:

1.  Make mkldnn reshape share the same memory for plain-format tensors.

2.  Add mkldnn transpose operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21943

Differential Revision: D15916063

Pulled By: bddppq

fbshipit-source-id: d1971c67341f277c1e80c1fa34e213b6c27f4062
2019-06-19 22:20:46 -07:00
3d44cd6d19 Replace Type dispatch with ATenDispatch (#22008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22008
ghimport-source-id: b0a5cc3da283b195f88636e2a61939d2facd11d9

Test Plan: Imported from OSS

Differential Revision: D15914756

Pulled By: li-roy

fbshipit-source-id: 5bc300ec525a3ee9e6491dd4c55e78bbd977d691
2019-06-19 21:42:54 -07:00
5d67c606ea Added error for classes that don't have an init function (#21880)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21761
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21880

Differential Revision: D15879205

Pulled By: Chillee

fbshipit-source-id: 8b614970196b381357b6032a73eeaab0b7a4f667
2019-06-19 21:33:37 -07:00
4fee532de6 Pass loop_over optional parameter for cached reader properly. (#21929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21929

Just need to pass `loop_over` argument properly.

Reviewed By: noname01

Differential Revision: D15885401

fbshipit-source-id: f1928277262a80e5b41f4c4f3945c2f378a4e233
2019-06-19 18:15:32 -07:00
96c0bd3722 ListPtr->List DictPtr->Dict step 3 (#21938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21938

After having changed all call sites, we can now remove the old naming scheme.

Reviewed By: zdevito

Differential Revision: D15892402

fbshipit-source-id: 1f5b53a12fa657f6307811e8657c2e14f6285d2f
2019-06-19 18:02:08 -07:00
275087383b ListPtr->List DictPtr->Dict step 2 (#21937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937

This changes call sites to use the new naming scheme

Reviewed By: zdevito

Differential Revision: D15892404

fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
2019-06-19 18:02:05 -07:00
093c78f854 ListPtr->List DictPtr->Dict step 1 (#21936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21936

This introduces torch::List and torch::Dict as aliases to ListPtr/DictPtr.
After this lands, we can step by step change the call sites to the new naming
and finally remove the old spellings.

Reviewed By: zdevito

Differential Revision: D15892405

fbshipit-source-id: 67b38a6253c42364ff349a0d4049f90f03ca0d44
2019-06-19 18:02:01 -07:00
cba79f4872 Revert D15637222: [wip] Replace Type dispatch with ATenDispatch
Differential Revision:
D15637222

Original commit changeset: fcfaea0b5480

fbshipit-source-id: 9bca7ebb91d7a3609b86663089140d7c5a33f58d
2019-06-19 17:36:52 -07:00
15be5483c0 Move NamedType method definitions into cpp file (#21983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21983
ghimport-source-id: fe9c1eba5f4c737e1442b877d396b9e8e5298cfb

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D15907633

Pulled By: jamesr66a

fbshipit-source-id: bd2dfdca117cdc3ae35fdd9d29cf521d82636069
2019-06-19 16:43:11 -07:00
f6aac41391 Defining object destructor in c10 (#21984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21984
ghimport-source-id: 5767592e37ed388422eed5639f8ba0722aec66e2

Test Plan: Imported from OSS

Differential Revision: D15906530

Pulled By: zdevito

fbshipit-source-id: bec8c8b0b5b9dcc2e8fc69b5031fcfa6bb22d54e
2019-06-19 16:27:40 -07:00
24a6c32407 Replace Type dispatch with ATenDispatch (#21320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21320
ghimport-source-id: cc18f746a1c74df858cb0f6d8b7d4de4315683c7

Test Plan: Imported from OSS

Differential Revision: D15637222

Pulled By: li-roy

fbshipit-source-id: fcfaea0b5480ab966175341cce92e3aa0be7e3cb
2019-06-19 15:46:45 -07:00
00fdb2cf95 Enable XLA by default on pull requests. (#21991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21991
ghimport-source-id: 5077e62a613c36256d2b5a2427aa9c3887c4a797

Test Plan: Imported from OSS

Differential Revision: D15907913

Pulled By: ezyang

fbshipit-source-id: c67bb999f02760836d1568c1a3911add3f1538f0
2019-06-19 15:01:49 -07:00
effcc398c4 Refactor Random Number Generators in ATen (#21555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21555
ghimport-source-id: dd900a8c3e1ef9ef1e011b8bb5476626d18cc462

Test Plan: Imported from OSS

Differential Revision: D15875780

Pulled By: ezyang

fbshipit-source-id: 6e04e90af62ab9c9593d74f344a3a084aaaf6f43
2019-06-19 13:54:09 -07:00
34aee933f9 ONNX Export Interpolate (Resize) for opset version 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21434

Reviewed By: zrphercule

Differential Revision: D15777197

Pulled By: houseroad

fbshipit-source-id: 517b06a54a234ffdb762401e83f5a732023ed259
2019-06-19 13:40:27 -07:00
44128e09f0 Speed up op lookup and registration (#21806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806

Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.

This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle)
and it also speeds up op registration, since that needs to check whether an op with the same name already exists.

Differential Revision: D15834256

fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
2019-06-19 12:05:14 -07:00
d1c80300ce Better stringification of dispatch keys in error messages (#21809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21809

Many error messages show dispatch keys, for example when the dispatcher didn't find a kernel to dispatch to.
Previously, this was a string like "CPU" or "CUDA" for known backends and just an arbitrary number for other backends.

Now, tensor type id registration also registers a name for the dispatch key and shows that in the error messages.

There is no API change, just the error messages are better now.

Differential Revision: D15835809

fbshipit-source-id: 4f0c9d0925c6708b02d79c653a2fae75b6623bb9
2019-06-19 11:44:24 -07:00
dd046bef8d NamedTuple serialization (#21839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21839
ghimport-source-id: b9d82018fbf26b22d58cad3a033cbfe4e879a8fe

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860002

Pulled By: jamesr66a

fbshipit-source-id: 0fc97c4adefa9ae4937f21179c7afa817f4099e5
2019-06-19 10:43:55 -07:00
5a37f8c63f Refactor TupleType to take a NamedTupleSpec (#21836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21836
ghimport-source-id: 91cab735765ff875046b42864188e86b8487b0ae

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860003

Pulled By: jamesr66a

fbshipit-source-id: 62a99a212ae6f9af83a90305e443f2dd05588292
2019-06-19 10:43:51 -07:00
c0be6e6290 Introduce SerializableType (#21835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21835
ghimport-source-id: e674048a56b9a573ba89e484f4b41818d3f08234

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860004

Pulled By: jamesr66a

fbshipit-source-id: 2d2905296939903ed4586932bea0a504b542bbdb
2019-06-19 10:43:47 -07:00
74104f383e Some small fixes for NamedTuple (#21813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21813
ghimport-source-id: a1edca8ad0384a9e493ef2f3b0aa5005a668a8f3

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860005

Pulled By: jamesr66a

fbshipit-source-id: 4a43432d2dacebde1a676a93ac57f675db857154
2019-06-19 10:43:43 -07:00
6b972795e4 Add torch.__future__._overwrite_module_params_on_conversion global flag, and check it in nn.Module._apply() (#21613)
Summary:
https://github.com/pytorch/pytorch/pull/17072 breaks `model.to(xla_device)`, because moving `model` to XLA device involves changing its parameters' TensorImpl type, and the current implementation of `nn.Module.to()` doesn't support changing module parameters' TensorImpl type:
```python
# 6dc445e1a8/torch/nn/modules/module.py (L192-L208)
def _apply(self, fn):
    ...
    for param in self._parameters.values():
        if param is not None:
            # Tensors stored in modules are graph leaves, and we don't
            # want to create copy nodes, so we have to unpack the data.
            param.data = fn(param.data)  # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
            if param._grad is not None:
                param._grad.data = fn(param._grad.data)  # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
   ...
```

yf225 TODO: fix the description here when we finish the implementation

To fix this problem, we introduce a new API `model.to_()` that always assigns new tensors to the parameters (thus supporting changing the parameters to any TensorImpl type), and also bumps the version counter of the original parameters correctly so that they are invalidated in any autograd graph they participate in.

We also add warning to the current `model.to()` API to inform users about the upcoming behavior change of `model.to()`: in future releases, it would create and return a new model instead of in-place updating the current model.

This unblocks adding XLA to our CI test suite, which also allows XLA to catch up with other changes in our codebase, notably the c10 dispatcher.

[xla ci]

cc. resistor ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21613

Differential Revision: D15895387

Pulled By: yf225

fbshipit-source-id: b79f230fb06019122a37fdf0711bf2130a016fe6
2019-06-19 10:30:02 -07:00
Jie
056a033cdc updating upsampling bilinear2d kernel: (#21879)
Summary:
1. faster atomicAdd trick for fp16 backward kernel
2. better launch configs for backward kernel
3. removed unnecessary buffer initialization for forward kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21879

Differential Revision: D15898680

Pulled By: ezyang

fbshipit-source-id: 1fc81e6c078f1538d82e4f36921b630499eb504f
2019-06-19 07:42:21 -07:00
34536e207a Fix: convert Onnx DynamicSlice operator with 4 inputs to caffe2 fa… (#20846)
Summary:
I reported an issue (https://github.com/pytorch/pytorch/issues/20743)
and made this pull request for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20846

Reviewed By: zrphercule

Differential Revision: D15569135

Pulled By: houseroad

fbshipit-source-id: 96a2c818ef666a7d79b96decfa347d7154b34d5c
2019-06-19 00:09:15 -07:00
4b1df5c1f5 Use fn(param) instead of fn(param.data) in nn.Module._apply (#21865)
Summary:
When we pass `fn` to `nn.Module._apply()` and `fn` is an in-place operation, the correct behavior should also include bumping the parameters' and their gradients' version counters. This PR fixes the old incorrect behavior and makes sure the new behavior is right.

Note that this PR is BC-breaking in the following way:

Previously, passing an in-place operation to `nn.Module._apply()` does not bump the module's parameters' and their gradients' version counters. After this PR, the module's parameters' and their gradients' version counters will be correctly bumped by the in-place operation, which will invalidate them in any autograd graph they previously participate in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21865

Differential Revision: D15881952

Pulled By: yf225

fbshipit-source-id: 62f9244a4283a110147e9f20145ff232a5579fbd
2019-06-18 20:45:40 -07:00
abd6cffe55 Added some extra tests for std_mean and var_mean for multiple dims. (#20650)
Summary:
Added some extra tests for std_mean and var_mean for multiple dims.
Some refactoring of previously created tests based on PR comments: https://github.com/pytorch/pytorch/pull/18731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20650

Differential Revision: D15396101

Pulled By: ifedan

fbshipit-source-id: d15c3c2c7084a24d6cfea4018173552fcc9c03a9
2019-06-18 20:36:32 -07:00
fa5263af2c Add set_quantizer_ for QTensor (#21852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852

To enable change of q_scale and q_zero_point in `copy_`

Differential Revision: D15793427

fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
2019-06-18 19:50:12 -07:00
e239e31da6 Fix lint error (#21932)
Summary:
https://github.com/pytorch/pytorch/issues/21916 has broken python lint on master
https://travis-ci.org/pytorch/pytorch/jobs/547354937
```
./tools/build_variables.py:167:39: E261 at least two spaces before inline comment
./tools/build_variables.py:379:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:379:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:380:17: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:380:19: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:381:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:381:20: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:382:23: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:382:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:387:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:387:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:388:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:388:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:389:16: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:389:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:390:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:390:27: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:391:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:391:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:395:22: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:395:24: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:402:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:402:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:403:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:403:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:404:16: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:404:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:405:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:405:27: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:406:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:406:15: E251 unexpected spaces around keyword / parameter equals
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21932

Differential Revision: D15892041

Pulled By: bddppq

fbshipit-source-id: f62949a7617f8ea9c036ea9b48ab1e340a7af83e
2019-06-18 19:08:24 -07:00
3bdde56907 Fix incorrect usage of __HIP_PLATFORM_HCC__ (#21757)
Summary:
This avoids relying on `__HIP_PLATFORM_HCC__` in case it changes in the future.

Following up https://github.com/pytorch/pytorch/issues/21718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21757

Reviewed By: xw285cornell

Differential Revision: D15891867

Pulled By: bddppq

fbshipit-source-id: 5de55687ab1c86eddf6b4d8d25fee48d96ec72ad
2019-06-18 18:56:32 -07:00
a388c78350 fix bug in CompilationUnit::define (#21886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21886
ghimport-source-id: fefbd758bbe2fbcaaad84a376ac5f69c40bccb80

Test Plan: Imported from OSS

Differential Revision: D15867647

Pulled By: suo

fbshipit-source-id: 3e0f5bbc98ec93ccf26442c4c574626e45e53888
2019-06-18 15:41:55 -07:00
52e1cea057 Fix recursive method compilation (#21862)
Summary:
The code in `python_sugared_value.cpp` to recursively compile methods
was not being tested, so this adds a test for it and fixes some errors
in it.

It was necessary to disable any hooks that were set, since (at least in
our tests) they would try to export a half-finished graph when called
on recursively compiled methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21862

Differential Revision: D15860314

Pulled By: driazati

fbshipit-source-id: e8afe9d4c75c345b6e1471072d67c5e335b61337
2019-06-18 14:08:56 -07:00
eda08b0aae script::Module as a view. (#21814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21814
ghimport-source-id: 49cfea6101ad9ca438600c465762e23252e05ff3

Test Plan: Imported from OSS

Differential Revision: D15839583

Pulled By: zdevito

fbshipit-source-id: ab4ef31a523b3ac1477aa7e6d4d9513e7408c560
2019-06-18 13:58:49 -07:00
94c61d4f32 Fix infinite loop in del_post_hook (#21914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21914

https://github.com/pytorch/pytorch/pull/21591 added a needed feature to clean up grad accumulator post hooks when the DistributedDataParallel model object is cleaned up. There's a minor typo that causes it to loop infinitely over the first element.

Differential Revision: D15878884

fbshipit-source-id: b7fd0bbd51eb187579d639b1709c6f7b62b85e7a
2019-06-18 13:43:59 -07:00
c0f51142cd Added a test case for an index error for the index_copy_ (#21912)
Summary:
Follow up PR with a test for the [fixed](4b45f08f87) [bug](https://github.com/pytorch/pytorch/issues/20322).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21912

Differential Revision: D15878674

Pulled By: izdeby

fbshipit-source-id: c8fef2214606c796d174d0faaaf633531a7bea88
2019-06-18 13:43:56 -07:00
ad00c12379 Clean up //caffe2/torch-cpp to avoid duplicated symbols (#21916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21916

Hopefully fixes https://fb.workplace.com/groups/1405155842844877/permalink/2832659000094547/

Reviewed By: rutyrinott

Differential Revision: D15862128

fbshipit-source-id: 77c01a57bddc39b267e307b50942e029a381711b
2019-06-18 13:05:22 -07:00
081cd3a293 Change AT_CHECK to TORCH_CHECK in python_arg_parser.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21887

Differential Revision: D15869483

Pulled By: jerryzh168

fbshipit-source-id: f3d9d73078e7c1c08ec79694105e18084e7f9caf
2019-06-18 10:48:38 -07:00
28ecc104f4 Fix WeakIValueEq (#21891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21891
ghimport-source-id: a037850c96fe803540412db9a88548fa41f2d4f0

Test Plan: Imported from OSS

Differential Revision: D15871588

Pulled By: jamesr66a

fbshipit-source-id: ecfdece1285c0737d0b1dc2afe959c43d9413001
2019-06-18 10:35:35 -07:00
010f238d17 Retry pip install to make pytorch rocm CI more stable (#21895)
Summary:
pip install randomly core dumps, examples:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/25720//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/25723//console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21895

Differential Revision: D15873197

Pulled By: bddppq

fbshipit-source-id: 2c967bc0a47bef9d3f7af83e99514c93b54e353f
2019-06-18 10:10:56 -07:00
5eb25c3704 Support in membership checks (#21527)
Summary:
This PR adds support for `in` checks like `key in my_dict`

For now it leaves lists as a follow-up, due to the changes around `IValue` lists and the need for an `IValue` equality op.

For objects it uses the magic method `__contains__(self, key)`
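
For illustration, a minimal sketch of the dict case (assuming standard `torch.jit.script` usage):

```python
import torch
from typing import Dict

# Membership checks like `key in d` are now supported in TorchScript.
@torch.jit.script
def has_key(d: Dict[str, int], key: str) -> bool:
    return key in d

print(has_key({"a": 1}, "b"))  # False
```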
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21527

Pulled By: driazati

Differential Revision: D15811203

fbshipit-source-id: 95745060394f8a9450efaaf8ab09d9af83bea01e
2019-06-18 09:49:12 -07:00
afad3e4954 Add support for class annotations (#21379)
Summary:
This adds support for inferred attributes (everything except empty lists, dicts, and tuples) as well as using the PEP 526 style annotations on a class, so this eliminates the need for `torch.jit.Attribute`
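
For illustration, a minimal sketch of the new style (assuming a `ScriptModule` subclass, as was standard at the time; names here are hypothetical):

```python
import torch
from typing import List

class Tracker(torch.jit.ScriptModule):
    # PEP 526 class annotation replaces torch.jit.Attribute; the empty list
    # still needs the annotation since its element type cannot be inferred.
    history: List[int]

    def __init__(self):
        super(Tracker, self).__init__()
        self.history = []

    @torch.jit.script_method
    def forward(self, x: int) -> int:
        self.history.append(x)
        return x
```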
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21379

Differential Revision: D15718537

Pulled By: driazati

fbshipit-source-id: b7481ae3d7ee421613e931b7dc3427ef2a99757f
2019-06-18 09:49:09 -07:00
85528feb40 Mark test_snli as a slow test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21908

Pulled By: driazati

Differential Revision: D15875846

fbshipit-source-id: 98b79e7beee5ffd72e1f41d22e07e618547b23e9
2019-06-18 09:44:12 -07:00
0998a32588 Backward function will set a flag if it released variables (#21533)
Summary:
This is a fix for https://github.com/pytorch/pytorch/issues/21469
Currently there is no way to tell whether a backward function has released its saved variables when those variables were added to a vector. This change sets a flag if the function has saved variables and they were released, so we can raise an error if somebody calls the function again with already-released variables.
Functions that do not have saved variables can be called multiple times for BC
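
For illustration, a minimal example of the behavior this guards:

```python
import torch

# backward() releases saved variables by default; a second call must
# raise instead of silently reusing freed buffers.
x = torch.randn(3, requires_grad=True)
y = (x * x).sum()
y.backward()   # ok: saved variables are released here
y.backward()   # raises RuntimeError (buffers have already been freed)
```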
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21533

Differential Revision: D15810481

Pulled By: ifedan

fbshipit-source-id: 5663e0c14f1b65727abc0d078aef348078d6a543
2019-06-18 09:21:17 -07:00
f363a33e10 Set __file__ for torch.ops (#21888)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19351 https://github.com/pytorch/lockdown/issues/93
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21888

Differential Revision: D15871142

Pulled By: ailzhang

fbshipit-source-id: 339e9d493e2e13f09e118814bdd1d7a5942804b8
2019-06-18 08:46:23 -07:00
31e1e63bc2 Port avg_pool3d() to ATen (#21732)
Summary:
This will need a conflict resolution once avg_pool2d() has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21732

Differential Revision: D15824923

Pulled By: ezyang

fbshipit-source-id: 83341e0209b660aecf788272079d8135d78b6ff1
2019-06-18 08:33:30 -07:00
Jie
c471a63a39 UpSample-nearest cuda kernel update (#21694)
Summary:
updating upsampling kernel:
1. avoids atomicAdd for better fp16 performance.
2. better launch configs for 2D input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21694

Differential Revision: D15875791

Pulled By: ezyang

fbshipit-source-id: 426fc5d5f0c0cdf58bfa1a2b564f17a9ea286fa4
2019-06-18 08:24:25 -07:00
998efb48c3 Add at::dimname_to_position helper. (#21789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21789
ghimport-source-id: 42c0a58280f3645dd38ea11d39311a0c53f90488

Test Plan:
- `build/bin/NamedTensor_test` [namedtensor ci]

Imported from OSS

Differential Revision: D15833455

Pulled By: zou3519

fbshipit-source-id: 8dd51a7b785972668984a7c161b94b92039a1cb1
2019-06-18 07:44:04 -07:00
8f9e0f77dd Turn off non-default stream testing. (#21793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21793
ghimport-source-id: 5264fa90ca77fbc79898cfa2f0ee02f47dec27d4

Test Plan: Imported from OSS

Differential Revision: D15874814

Pulled By: ezyang

fbshipit-source-id: 5c51ab9ae431faf2db549b88b07ba00783acab25
2019-06-18 07:00:08 -07:00
08a0ac84d7 Removed unused variable from closure in range (#21897)
Summary:
This was some code I added :^)

Time for me to remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21897

Differential Revision: D15873213

Pulled By: Chillee

fbshipit-source-id: 769c3bd71c542be4afddc02dc2f65aa5c751b10d
2019-06-18 02:21:50 -07:00
6042012a93 Fixed "tried to access to nonexistent attribute" -> "tried to access nonexistent attribute"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21863

Differential Revision: D15873204

Pulled By: Chillee

fbshipit-source-id: c5d85487b287ee9dd8318161ef9399ffd1ee0b68
2019-06-18 02:13:09 -07:00
df787cf079 Fixed a warning in test_jit.py (#21898)
Summary:
What's the point of having warnings if we never fix them :^)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21898

Differential Revision: D15873280

Pulled By: Chillee

fbshipit-source-id: a8274bab2badd840d36a9d2e1354677a6114ae1d
2019-06-18 01:15:15 -07:00
f1c1d1a964 Export the cosine_similarity op as an ATenOp correctly (#21884)
Summary:
cosine_similarity has two non-tensor parameters and needs some special handling. This diff adds support for its export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21884

Reviewed By: zrphercule

Differential Revision: D15866807

Pulled By: houseroad

fbshipit-source-id: a165fbc00c65c44b276df89ae705ca8960349d48
2019-06-17 23:34:59 -07:00
3ed8acdf59 Fixes lint error in py3 (#21883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21883
ghimport-source-id: c4330d71033929178ef10f2a0fcd8b0b2b468cb5

Test Plan: Imported from OSS

Differential Revision: D15866746

Pulled By: bwasti

fbshipit-source-id: c3d23f3396a95d5b1d689a07662e82e48cb3ab7a
2019-06-17 22:20:06 -07:00
2ba164b943 Future interface for ATen/Parallel (#21764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21764
ghimport-source-id: fca083c09d814a0411020871f49429509fc0e8b5

Test Plan:
Imported from OSS
see https://github.com/pytorch/pytorch/pull/21764

Differential Revision: D15816658

Pulled By: ilia-cher

fbshipit-source-id: 0e25ca6ff66a837d4f69f37a47e59927ab10e216
2019-06-17 22:05:59 -07:00
d8314a6260 Replace nullary/unary/binary loops with generic implementation (#21475)
Summary:
```
This replaces the kernel helpers in Loops.h/cuh with the following:

  cpu_kernel
  cpu_kernel_vec

  gpu_kernel
  gpu_kernel_with_scalars

These work with functions with any number of input arguments, with the
exception of 'gpu_kernel_with_scalars', which is limited to binary
operations. Previously, we only supported functions of 0, 1, or 2 input
arguments. Adding support for functions of 3 or 4 input arguments
required a significant amount of additional code.

This makes a few other changes:

Remove 'ntensors' from the for_each/serial_for_each loop. Most loops
assume a fixed number of tensors, and the value is accessible from
TensorIterator::ntensors()

Only lift CPU scalars to parameters in 'gpu_kernel_with_scalars'.
Previously, we performed this recursively in gpu_unary_kernel and
gpu_binary_kernel, so something like `torch.add(3, 4, out=cuda_tensor)`
would specialize to a "nullary" kernel. Now, only the first
scalar input is lifted to a kernel parameter. Any additional scalar
inputs are copied to CUDA tensors. Note that operations like `x + 5`
and `5 + x` still work efficiently. This avoids generating an exponential
number of specializations in the number of input arguments.
```

**Performance measurements**
Timing numbers are unchanged for basic elementwise operations. Linked below is a script to measure torch.add perf on PR vs. master CPU+GPU (GCC 7.3):
[miniperf.py](https://gist.github.com/colesbury/4a61893a22809cb0931f08cd37127be4)

**Generated assembly**
cpu_kernel and cpu_kernel_vec still generate good vectorized code with
both GCC 7.3 and GCC 4.8.5. Below is the assembly for the "hot" inner loop of
torch.add as well as an auto-vectorized torch.mul implementation using cpu_kernel/
binary_kernel. (The real torch.mul uses cpu_kernel_vec but I wanted to check that
auto vectorization still works well):

[torch.add GCC 7.3](https://gist.github.com/colesbury/927ddbc71dc46899602589e85aef1331)
[torch.add GCC 4.8](https://gist.github.com/colesbury/f00e0aafd3d1c54e874e9718253dae16)
[torch.mul auto vectorized GCC 7.3](https://gist.github.com/colesbury/3077bfc65db9b4be4532c447bc0f8628)
[torch.mul auto vectorized GCC 4.8](https://gist.github.com/colesbury/1b38e158b3f0aaf8aad3a76963fcde86)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21475

Differential Revision: D15745116

Pulled By: colesbury

fbshipit-source-id: 914277d7930dc16e94f15bf87484a4ef82890f91
2019-06-17 19:08:33 -07:00
7f057f00cc Update mkldnn-bridge to fix MKLDNN grouped conv issue (#21854)
Summary:
1. Fix grouped conv issue in https://github.com/pytorch/pytorch/issues/21597
2. Fix build error in 731670f40a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21854

Test Plan: buck run experimental/jbai/pt_issue_21597_mkldnn_conv_2d_repro:run

Reviewed By: yinghai

Differential Revision: D15861105

Pulled By: bddppq

fbshipit-source-id: fe3e2943a15aab4294f8e6bb15db15829a94420f
2019-06-17 18:21:26 -07:00
5237835a17 Make script::Method a value type (#21675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21675
ghimport-source-id: 90ee7ba00e58b0151ca4c17e91fd17303c9d5d08

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D15777725

Pulled By: zdevito

fbshipit-source-id: 8482cd2e1dcd7dd77a9cacbb76743bd190c7c4cf
2019-06-17 18:14:50 -07:00
cc4498a54a Always enable P2P access for GPU copies (#21872)
Summary:
PR https://github.com/pytorch/pytorch/issues/20685 incorrectly only enabled P2P access for non-contiguous copies.
This can make cudaMemcpy slow for inter-gpu copies, especially on ROCm
devices.  I didn't notice a difference on CUDA 10, but ngimel says it's
important for CUDA too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21872

Differential Revision: D15863965

Pulled By: colesbury

fbshipit-source-id: 0a858f3c338fa2a5d05949d7f65fc05a70a9dfe1
2019-06-17 17:48:28 -07:00
76a250d590 Add new regression loss function type to FBLearner (#21080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21080

Add Huber loss as a new option for regression training (refer to the TensorFlow implementation: https://fburl.com/9va71wwo):

  # Huber loss (pseudocode made runnable: `^` -> `**`, elementwise
  # min/max/mean written out with numpy)
  import numpy as np

  def huber(true, pred, delta):
      error = np.abs(true - pred)
      loss = 0.5 * np.minimum(error, delta) ** 2 + delta * np.maximum(error - delta, 0)
      return np.mean(loss)

As a combination of MSE loss (`x < delta`) and MAE loss (`x >= delta`), the advantage of Huber loss is to reduce the training dependence on outliers.

One thing worth noting is that Huber loss is not twice differentiable at `x = delta`. To further address this problem, one could consider adopting the `log(cosh(x))` loss.

Reviewed By: chintak

Differential Revision: D15524377

fbshipit-source-id: 73acbe2728ce160c075f9acc65a1c21e3eb64e84
2019-06-17 17:43:00 -07:00
8aeb4ef4bf Add python string standard lib (#21807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21807
ghimport-source-id: dcb2c78b8facb90a323ab9212b7703e553354273

Test Plan: Imported from OSS

Differential Revision: D15835509

Pulled By: bwasti

fbshipit-source-id: bc8bc5ae5a4fb4a1581aa94485973ed87af4eaaf
2019-06-17 15:48:36 -07:00
d329dffd92 improve error message on recursive class defs (#21842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21842
ghimport-source-id: 33569714b18fc476c4e6b3bc976b53b1f107273d

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15857568

Pulled By: suo

fbshipit-source-id: 6307597b9741cfdccd5c55216ebdc7c4391a5e23
2019-06-17 15:23:21 -07:00
cdae8b93a7 improve recursive scripting error message (#21841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21841
ghimport-source-id: fbca813d12ca4bfad7967e12c8dafe5eaba77cab

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15857569

Pulled By: suo

fbshipit-source-id: 152eba10565cf7119508079e98512f116eb3a5a8
2019-06-17 15:23:17 -07:00
c0420d9618 Attempt to fix TRT build after library merge (#21775)
Summary:
After fixing https://github.com/pytorch/pytorch/issues/20774 the TRT build was broken

Because of missing annotations, pybind_state_gpu.so was missing symbols, but pybind_state.so was not. This caused a weird failure mode: trying to import pybind_state_gpu first left the system in a semi-initialized state and led to a segfault.

Minimal repro:
```
>>> import ctypes

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so: undefined symbol: _ZN6caffe219TensorRTTransformer9TransformEPNS_9WorkspaceEPNS_6NetDefERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_11TensorShapeESt4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SC_EEE

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state.so')
Segmentation fault (core dumped)
```

Too lazy to repro locally, let's see if CI passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21775

Differential Revision: D15829605

Pulled By: dzhulgakov

fbshipit-source-id: 1adb2bde56b0cd68f84cfca67bc050adcf787cd9
2019-06-17 14:16:45 -07:00
0408697317 Followup cleanup in cmake.py and add a comment in setup.py (#21792)
Summary:
Following up b811b6d5c03596d789a33d7891b606842e01f7d2

* Use property instead of __setattr__ in CMake.
* Add a comment clarifying when build_ext.run is called.

 ---

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21792

Differential Revision: D15860606

Pulled By: umanwizard

fbshipit-source-id: ba1fa07f58d4eac81ac27fa9dc7115d1cdd3dec0
2019-06-17 13:46:25 -07:00
7279e07c8b Don't use anonymous namespace in header. (#21790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21790
ghimport-source-id: ff648a1e9a1b8627f0742307e2e7810d6445d597

Test Plan: Imported from OSS

Differential Revision: D15827311

Pulled By: ezyang

fbshipit-source-id: 996bfd3a93fcda5934dcc523adae0648cba1c4fa
2019-06-17 13:26:02 -07:00
1aa16d356e named inference rule for tensor.select (#21752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21752
ghimport-source-id: 95e17087b8c29c9bd88003ae225cb7329d0b67e6

Test Plan:
- `python test/test_namedtensor.py` [namedtensor ci]

gh-metadata: pytorch pytorch 21752 gh/zou3519/50/head

Imported from OSS

Differential Revision: D15833453

Pulled By: zou3519

fbshipit-source-id: 7b51e4137e54712aa9c6274a9e6bb48ab7191b8d
2019-06-17 13:12:49 -07:00
b403b10ff9 Fix #11752: fix numerical issue in log_softmax (#21672)
Summary:
https://github.com/pytorch/pytorch/issues/11866 corrected this issue in the function `host_softmax` (aten/src/ATen/native/SoftMax.cpp). But I tried the example proposed in https://github.com/pytorch/pytorch/issues/11752, and `log_softmax` was still not working for big logits.

I looked into the source code and found that the example had called `vec_host_softmax_lastdim`, not `host_softmax`.

This code fixes the issue in `_vec_log_softmax_lastdim` and adds a test for `log_softmax`.
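
For illustration, a small repro in the spirit of the linked issue (the exact values there may differ):

```python
import torch

# With large logits, the vectorized last-dim path used to produce wrong
# results; log_softmax(x) should equal x - logsumexp(x).
x = torch.tensor([[10000., 20000., 30000.]])
print(torch.log_softmax(x, dim=-1))  # approximately [-20000., -10000., 0.]
```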
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21672

Differential Revision: D15856327

Pulled By: VitalyFedyunin

fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
2019-06-17 12:59:08 -07:00
0f675f9cbc Port im2col and vol2col (#21769)
Summary:
partially resolves https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21769

Differential Revision: D15854530

Pulled By: ezyang

fbshipit-source-id: 574853c068010d1b7588047d2ab7450077471447
2019-06-17 10:06:26 -07:00
2b23fac8da Disallow creation of tensors with duplicate names (#21781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21781
ghimport-source-id: d77e0c97fe0104b4b29571fd5828967399d34fb1

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 21781 gh/zou3519/51/head

Imported from OSS

Differential Revision: D15833454

Pulled By: zou3519

fbshipit-source-id: fca4de83fba4bced615ec3cbd4ce4c441ddfcaf2
2019-06-17 09:59:50 -07:00
44707dd3ca Rename Dimname::name to Dimname::full_name (#21803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21803
ghimport-source-id: e0bc5a746e745e18f19215c6551d79cb0cd5f9c5

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D15833452

Pulled By: zou3519

fbshipit-source-id: 7aa4d78ff436bd6a622a5ea235b75135d9798d33
2019-06-17 08:32:32 -07:00
7c1528bab6 Copy NamedTensorMeta in TensorImpl::copy_tensor_data() (#21735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21735
ghimport-source-id: 4a4289693e372880e3d36e579c83d9e8745e70ed

Test Plan:
- I'm not sure how to test this other than making sure it compiles.
- [namedtensor ci]

gh-metadata: pytorch pytorch 21735 gh/zou3519/49/head

Imported from OSS

Differential Revision: D15833456

Pulled By: zou3519

fbshipit-source-id: ea2fa6d5c5f1eb2d7970d47189d6e4fcd947146d
2019-06-17 08:32:28 -07:00
da4e60226c Keep Reducer hooks in a vector instead of an unordered_map (#21783)
Summary:
kuttas pointed out that the DDP Reducer only needs to remember `uintptr, Function` pairs, and hence does not need an unordered map as added by https://github.com/pytorch/pytorch/issues/21591. Using a vector should speed it up a bit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21783

Differential Revision: D15854312

Pulled By: mrshenli

fbshipit-source-id: 153ba035b8d658c7878a613f16a42de977d89c43
2019-06-17 08:24:19 -07:00
76713fb564 Fix remote build + clean up disable feature hack (#21816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21816

Clean up disable feature hack.

Reviewed By: bddppq

Differential Revision: D15833285

fbshipit-source-id: a2ae5d0f15e47b835dbd3997bbaa0add7e868f20
2019-06-17 08:08:34 -07:00
4a6aa1d806 Populate producer_info.json in any PyTorch model at FB (#21662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21662

Use hook added in https://github.com/pytorch/pytorch/pull/20863 to auto-write a file with environment information (including user, machine, Flow, etc).

Reviewed By: natalialunova

Differential Revision: D15690185

fbshipit-source-id: ccaaeda9562db32925041d18f394fb98fab8db99
2019-06-16 20:12:23 -07:00
c9ba3f699d Bag of documentation fixes (#21846)
Summary:
Thanks henon for raising the issues.

Fixes https://github.com/pytorch/pytorch/issues/21830
Fixes https://github.com/pytorch/pytorch/issues/21831
Fixes https://github.com/pytorch/pytorch/issues/21832
Fixes https://github.com/pytorch/pytorch/issues/21827
Fixes https://github.com/pytorch/pytorch/issues/21822
Fixes https://github.com/pytorch/pytorch/issues/21820
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21846

Differential Revision: D15847389

Pulled By: soumith

fbshipit-source-id: 421cc48af646a2618af731697de7d4de83d3eabe
2019-06-16 19:35:27 -07:00
972ec676b2 Remove lowered execution (#21674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21674
ghimport-source-id: b8e27f0ce9b8b362daf73556ee67457fb5355062

Reviewed By: eellison

Differential Revision: D15777726

Pulled By: zdevito

fbshipit-source-id: 718ac676c9a1bcf99b856862fd29631d825645da
2019-06-16 14:29:18 -07:00
ff1172d705 high pri Jit builtins (#21451)
Summary:
bin/hex/oct/round/chr
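
For illustration, a small sketch (assuming these builtins mirror their Python semantics inside script):

```python
import torch

# Exercising some of the newly supported builtins in TorchScript:
@torch.jit.script
def fmt(n: int) -> str:
    return bin(n) + " " + hex(n) + " " + oct(n) + " " + chr(65)

print(fmt(10))  # 0b1010 0xa 0o12 A
```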
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21451

Differential Revision: D15702863

Pulled By: ailzhang

fbshipit-source-id: 9f69896b79e7584f12353e9f2ee2969dbe1ec6d6
2019-06-16 09:48:38 -07:00
4f75da3b41 change ClassType::compilation_unit to return owning ptr (#21787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21787
ghimport-source-id: eed7b98b0f02745066164b8ef3906291931e2ecb

Test Plan: Imported from OSS

Differential Revision: D15831353

Pulled By: suo

fbshipit-source-id: 50695c35dba8ffea710cbc9aca8aba6a75512fa0
2019-06-16 02:37:07 -07:00
263b1985a8 Revert D15833924: [jit] Fix stdout capturing, remove some expect files
Differential Revision:
D15833924

Original commit changeset: 152972b4c240

fbshipit-source-id: 1d5a2258bc134fdc7bd2cb557bcc05f2289443b6
2019-06-15 20:39:11 -07:00
04f09d4235 Move unwrap logic from c10 to caffe2 (#21620)
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.

Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620

Differential Revision: D15763560

Pulled By: yf225

fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
2019-06-14 22:02:43 -07:00
794ee6d00c Switch to out-source builds for LibTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21772

Differential Revision: D15839332

Pulled By: yf225

fbshipit-source-id: 017cf61c5682c6a8ffeaf2ca952e1418c27be30e
2019-06-14 21:00:18 -07:00
4a2fc00db0 Revert D15830704: [jit] Add Python string standard lib
Differential Revision:
D15830704

Original commit changeset: e55a8c6bf910

fbshipit-source-id: 1ec953bfaabab0288e953f48cde0a32370ac3fc6
2019-06-14 20:52:58 -07:00
97b92eede1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 979417253fab9142059bdfb6e834f44bb1cc8d0d
2019-06-14 17:41:30 -07:00
220efdbdc4 Refactor pybind_utils.h (#21550)
Summary:
This refactors pybind_utils so we can have all our type-inferring stuff in
1 place (e.g. for #21379)

There is some follow up work to make the error messages better, but I think that's fine to save for another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21550

Pulled By: driazati

Differential Revision: D15727002

fbshipit-source-id: a6974f2e1e5879f0503a18efc138da31cda7afa2
2019-06-14 17:27:45 -07:00
a85305fdea Hook up profiled execution in the interpreter (#21799)
Summary:
Rebasing https://github.com/pytorch/pytorch/pull/21616 onto master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21799

Differential Revision: D15832854

Pulled By: Krovatkin

fbshipit-source-id: 88d754446df2abc25ea86e46764848d48ee3a5fc
2019-06-14 16:56:13 -07:00
4bcc72fe95 Support for NamedTuple (#21428)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/18

This implements NamedTuple by taking advantage of the existing `names` field in `TupleType`.

TODO: This currently doesn't retain the NamedTuple-ness through serialization. Discussed with suo offline, we can probably make a way to define an anonymous NamedTuple in script (e.g. `NamedTuple('Foo', [('a', int), ('b', float), ('c', List[float])])` and serialize that
TODO: implement support for calling the constructor with kwargs
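
For illustration, a minimal sketch (assuming the class-style `typing.NamedTuple` declaration is accepted; kwargs construction is noted above as a TODO):

```python
import torch
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

@torch.jit.script
def total(p: Point) -> float:
    # field access by name works because TupleType carries the names
    return p.x + p.y

print(total(Point(1.0, 2.0)))  # 3.0
```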
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21428

Differential Revision: D15741564

Pulled By: jamesr66a

fbshipit-source-id: c077cbcea1880675ca6deb340a9ec78f824a136c
2019-06-14 16:45:56 -07:00
ac8d1a1f76 fix some issues found by enabling -Wshorten-64-to-32 (#18187)
Summary:
When enabling this flag, there were a lot of warnings. This PR focuses on the warnings where the comparison could affect array indices, which are the ones most prone to fail.

The good news is that I didn't find anything obviously concerning.

One degenerate case could be when the matrices we work with are too skinny (dim1=1, dim2 needs to hold a big number).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18187

Differential Revision: D14527182

Pulled By: hyuen

fbshipit-source-id: b9f46b6f68ab912c55368961758a7a5af1805555
2019-06-14 16:29:32 -07:00
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
d0021b3ac7 Fix stdout capturing, remove some expect files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21805

Pulled By: driazati

Differential Revision: D15833924

fbshipit-source-id: 152972b4c24041b8a459d5b8ef8789543a6b8153
2019-06-14 16:05:06 -07:00
07fea3f5b6 Add new get_batch() method to ChunkDataset API (#21797)
Summary:
We plan on generating Python bindings for the C++ ChunkDataset API using the current PyTorch DataLoader class, which must call get_batch() instead of get_batch(size).

This change doesn't break the current API; it just adds one more method that will make future extensions easier (WIP).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21797

Differential Revision: D15830522

Pulled By: soumith

fbshipit-source-id: 7208f305b48bf65d2783eaff43ff57a05e62c255
2019-06-14 13:39:54 -07:00
dddc65db9e Add Python string standard lib (#21059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21059
ghimport-source-id: f813585cde1b275c134b19009a2f5c0b3d70fc6e

Reviewed By: jamesr66a

Differential Revision: D15830704

Pulled By: bwasti

fbshipit-source-id: e55a8c6bf910a163b9a5260235e315af9532b129
2019-06-14 13:34:42 -07:00
65a3dbdfb0 Remove hip device sync in miopen Conv implementation (#21791)
Summary:
xw285cornell bddppq
Note there are other optimizations coming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21791

Differential Revision: D15829238

Pulled By: bddppq

fbshipit-source-id: 66c62c646f315d65b4e432ca20890faded843db4
2019-06-14 12:32:50 -07:00
1fc240e59a add tests for add_custom_scalars and others (#20987)
Summary:
Originally, the tests for the tensorboard writer were smoke tests only. This PR lets CI compare the output with expected results at a low level. The randomness of the tensors in the test is also removed.
ps. I found that how protobuf serializes data differs between python environments. One way to solve this is to write the data and then read it back instantly (comparing the data at a higher level).

For `add_custom_scalars`, the data to be written is a dictionary, and the serialized result might differ (it is not an `ordereddict`). So there is only a smoke test for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20987

Reviewed By: NarineK, lanpa

Differential Revision: D15804871

Pulled By: orionr

fbshipit-source-id: 69324c11ff823b19960d50def73adff36eb4a2ac
2019-06-14 12:27:07 -07:00
0d6eb209e6 Expose torch.empty(sizes, *, names, ...) to Python (#21648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648
ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694

Differential Revision: D15804482

Pulled By: zou3519

fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83
2019-06-14 11:52:47 -07:00
710821875a Fix flaky nuclear_norm() test (#21638)
Summary:
Try to fix a sporadic failure on some CIs.

I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638

Differential Revision: D15827779

Pulled By: ezyang

fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02
2019-06-14 11:40:03 -07:00
ff8c3fd54e Adding the quantized namespace to torch.nn and importing it from torch (#21600)
Summary:
Stack:
    **https://github.com/pytorch/pytorch/issues/21600 Adding the quantized namespace to torch**

Add the nn.quantized namespace to torch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21600

Differential Revision: D15742149

Pulled By: zafartahirov

fbshipit-source-id: 60dede12c81861f369d208b06f5b68e9384312f6
2019-06-14 11:05:45 -07:00
9a1dc43f34 Deprecate unordered_map and vector in IValues (#21712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21712

Warn when people use unordered_map or vector with IValues. These APIs are deprecated.
The unordered_map API is slow because it requires copying the whole map.
The vector API is slow for some types (e.g. std::string) because for them it also requires copying the whole vector.
Also, the vector API would get slow for all types if we decide to switch to SmallVector.

Differential Revision: D15792428

fbshipit-source-id: 1b72406b3a8d56521c862858c9f0ed01e56f2757
2019-06-14 11:05:41 -07:00
029a968212 Define __setstate__ on _ConvNd to handle pre-padding_mode pickles. (#21687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21687
ghimport-source-id: df49530d25239ac4d62eae83c5d7b0d8f00f836a

Differential Revision: D15807402

Pulled By: ezyang

fbshipit-source-id: f51b221444afc4e017db7544642a9c0a7d2a3efb
2019-06-14 11:00:21 -07:00
7284f448ba Fix handling of kwargs from common method invocations (#21499)
Summary:
When kwargs are specified in a test defined via common_method_invocations, it doesn't work if there isn't also a positional argument (`{'foo':'foo'}` without a positional arg generates a python call like: `self.method(, foo=foo)`, erroring on the `,`). I wanted to test something in a different PR and noticed I couldn't.

Also fixed some flake8 warnings I was seeing locally.

I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but happy to revert that if others don't agree?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499

Differential Revision: D15826974

Pulled By: nairbv

fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
2019-06-14 10:47:33 -07:00
c1744a6c39 Add ONNX py3 CI cases (#21715)
Summary:
So far, we only have py2 CI for ONNX. I think py3 support is important, and we plan to add onnxruntime backend tests, which only support py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21715

Reviewed By: bddppq

Differential Revision: D15796885

Pulled By: houseroad

fbshipit-source-id: 8554dbb75d13c57b67ca054446a13a016983326c
2019-06-14 10:20:14 -07:00
c06ccbe663 Add aten mkldnn zero_ operator (#20573)
Summary:
### mkldnn backward ops list:
 - [ ] \(https://github.com/pytorch/pytorch/pull/20567) Add aten mkldnn conv2d backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_poo2d 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20572) Add aten mkldnn batchnorm backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20573) Add aten mkldnn zero_ operator💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20575) Add mkldnn mul operator 💚
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20573

Differential Revision: D15820477

Pulled By: bddppq

fbshipit-source-id: 35d95f5b4e013c8db1911f52148550a2e40a2e68
2019-06-14 09:48:49 -07:00
bc6281028c rebuild_storage_fd retry on EINTR (#21723)
Summary:
Some data loader tests are flaky on py 2 with the following error
```
Jun 12 22:17:31 Traceback (most recent call last):
Jun 12 22:17:31   File "test_dataloader.py", line 798, in test_iterable_dataset
Jun 12 22:17:31     fetched = sorted([d.item() for d in dataloader_iter])
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__
Jun 12 22:17:31     idx, data = self._get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data
Jun 12 22:17:31     success, data = self._try_get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data
Jun 12 22:17:31     data = self.data_queue.get(timeout=timeout)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Jun 12 22:17:31     res = self._recv()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Jun 12 22:17:31     return pickle.loads(buf)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Jun 12 22:17:31     return Unpickler(file).load()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Jun 12 22:17:31     dispatch[key](self)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Jun 12 22:17:31     value = func(*args)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Jun 12 22:17:31     fd = multiprocessing.reduction.rebuild_handle(df)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
Jun 12 22:17:31     new_handle = recv_handle(conn)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
Jun 12 22:17:31     return _multiprocessing.recvfd(conn.fileno())
Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call
```

Apparently, Python 2.7's `recvfd` calls `recvmsg` without an EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174
So we should wrap the call in an outer try/except retry loop.
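A sketch of the retry pattern (assuming Python 2; `_multiprocessing.recvfd` is the CPython internal from the traceback above, and the wrapper name is made up):

```python
import errno
import _multiprocessing  # CPython 2.7 internal module

def recvfd_with_retry(conn):
    # recvfd can fail with EINTR when a signal arrives mid-call;
    # retry until it either succeeds or fails for a real reason.
    while True:
        try:
            return _multiprocessing.recvfd(conn.fileno())
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
```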
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723

Differential Revision: D15806247

Pulled By: ezyang

fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7
2019-06-14 09:10:00 -07:00
deb2140c6e Throwing errors for min and max reductions in empty CUDA tensors (#19612)
Summary:
Related to https://github.com/pytorch/pytorch/issues/17750.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19612

Differential Revision: D15813649

Pulled By: gchanan

fbshipit-source-id: aa3dc34dd1e6d8bb24fa4c18891204108759bb35
2019-06-14 08:34:30 -07:00
b811b6d5c0 When building extensions, honor options set in CMake. (#21653)
Summary:
Currently when building extensions, variables such as USE_CUDA and USE_CUDNN are used to determine which libraries should be linked. But we should use what CMake has detected, because:

1. If CMake found them unavailable but the variables say some libraries should be linked, the build would fail.
2. If the first build is made using a set of non-default build options, rebuilds must have these options passed to setup.py again; otherwise the extension build process is inconsistent with CMake. For example,

```bash
# First build
USE_CUDA=0 python setup.py install
# Subsequent builds like this would fail, unless "build/" is deleted
python setup.py install
```

This commit addresses the above issues by using variables from CMakeCache.txt when building the extensions.

 ---

The changes in `setup.py` may look lengthy, but the biggest changed block is mostly moving them into a function `configure_extension_build` (along with some variable names changed to `cmake_cache_vars['variable name']` and other minor changes), because it must be called after CMake has been called (and thus the options used and system environment detected by CMake become available).
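A minimal sketch of reading build options back out of CMakeCache.txt (the helper name and path are illustrative, not the actual code in this PR; entries have the form `KEY:TYPE=VALUE`):

```python
def read_cmake_cache(path='build/CMakeCache.txt'):
    cache = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blanks and comment lines; entries look like KEY:TYPE=VALUE
            if not line or line.startswith(('#', '//')) or '=' not in line:
                continue
            key_type, _, value = line.partition('=')
            key = key_type.partition(':')[0]
            cache[key] = value
    return cache

# e.g. read_cmake_cache().get('USE_CUDA')  # -> 'ON' or 'OFF'
```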
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21653

Differential Revision: D15824506

Pulled By: ezyang

fbshipit-source-id: 1e1eb7eec7debba30738f65472ccad966ee74028
2019-06-14 08:13:40 -07:00
4001e71547 When converting to NumPy, throw TypeError when type is not supported (#21608)
Summary:
This makes the error thrown in aten_to_numpy_dtype consistent with that in numpy_dtype_to_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21608

Differential Revision: D15816035

Pulled By: gchanan

fbshipit-source-id: 392e8b9ea37003a859e7ed459911a1700fcbd695
2019-06-14 07:35:03 -07:00
2d5ce519f2 Fix with emit_nvtx, also allow shape information to appear in nvtx ranges. (#21691)
Summary:
This PR is intended as a fix for https://github.com/pytorch/pytorch/issues/21644.

It allows the `with emit_nvtx` context manager to take an additional `record_shapes` argument. `record_shapes` is False by default, but if True, the nvtx ranges generated for each autograd op will append additional information about the sizes of Tensors received by that op.

The format of shape information is equivalent to what the CPU-side profiler spits out.  For example,
```
M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

with torch.cuda.profiler.profile():
    with torch.autograd.profiler.emit_nvtx(record_shapes=True):
        torch.addmm(M, mat1, mat2)
```
produces the following nvtx range label for addmm:
![Screenshot from 2019-06-12 10-48-01](https://user-images.githubusercontent.com/7799218/59374008-b7d13100-8cff-11e9-9245-58410073d965.png)
(cf the "Input Shapes" shown in 864cfbc216 (diff-115b6d48fa8c0ff33fa94b8fce8877b6))

I also took the opportunity to do some minor docstring cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21691

Differential Revision: D15816226

Pulled By: gchanan

fbshipit-source-id: b2b01ea10fea61a6409a32b41e85b6c8b4851bed
2019-06-14 07:35:00 -07:00
b9675efb5a Fix the issue of sizes vs size for tensor creation ops (#21686)
Summary:
Related to [pytorch#20921](https://github.com/pytorch/pytorch/issues/20921)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21686

Differential Revision: D15816109

Pulled By: gchanan

fbshipit-source-id: 4428b8e77b6c8b297ddb77e58fc1cb916c9cc46e
2019-06-14 07:34:56 -07:00
1e7bd7586d Query caffe2 operator stats for detailed execution info (#20924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20924

I found a python3 bug when deserializing caffe2 code. The exception thrown is a Unicode-related error instead of just a decode error, and we need to catch that as well.

Reviewed By: ipiszy

Differential Revision: D15293221

fbshipit-source-id: 29820800d1b4cbe5bf3f5a189fe2023e655d0508
2019-06-13 23:41:04 -07:00
d9eec4ef0d backend.py: _getattr__ must raise AttributeError (#21763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21763

Custom __getattr__ functions may only raise AttributeError. This code threw NotImplementedError, which caused upstream trouble when hasattr() was called.
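A toy sketch of the contract (class and attribute names are made up):

```python
class Backend(object):
    def __init__(self):
        self.registered = {'nccl': object()}

    def __getattr__(self, name):
        # only called when normal lookup fails; hasattr() treats
        # AttributeError (and nothing else) as "attribute absent"
        try:
            return self.registered[name]
        except KeyError:
            raise AttributeError(name)

b = Backend()
print(hasattr(b, 'gloo'))  # False; a NotImplementedError here would escape hasattr()
```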

Differential Revision: D15815176

fbshipit-source-id: 0982e2382de4578d3fc05c5d2a63f624d6b4765e
2019-06-13 23:17:57 -07:00
044809f1f3 Handling numel() == 0 in convTranspose (#21652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21652

This diff fixes an issue with empty ROIs for convTranspose.

Issue StackTrace: P65374505

Reviewed By: jerryzh168

Differential Revision: D15766739

fbshipit-source-id: 39cf8feca66b6aae22ff4ec5c1b6a4e3f20f378d
2019-06-13 23:02:26 -07:00
5c0e058950 Implement at::empty(IntArrayRef, DimnameList?, TensorOptions) in aten (#21647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21647
ghimport-source-id: 1db4ec31f047f7854a39c28e2b38918dc6b44f42

Differential Revision: D15804425

Pulled By: zou3519

fbshipit-source-id: 575cc3de09287efe75e7052df129626748208d0d
2019-06-13 20:38:19 -07:00
3e79036382 Make it possible to trigger all tests by pushing to ci-all/ branch (#21750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21750
ghimport-source-id: 4792aa5ccab7e4b54c21f23d0b78802f85bbeb8d

Differential Revision: D15819367

Pulled By: ezyang

fbshipit-source-id: db91ee727c66469ac78e59b3662f29db53a916bc
2019-06-13 19:53:35 -07:00
16b4a12ed8 better example for local weights (#21685)
Summary:
fixes https://github.com/pytorch/hub/issues/29
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21685

Differential Revision: D15817774

Pulled By: ailzhang

fbshipit-source-id: d2f615e5d431186d45a21d8300fb9ba3c37b246c
2019-06-13 17:56:25 -07:00
adc99efb46 Add batch id to tracer event (#21446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446

this is used for easier tracing of iter id when looking at trace diagram

Reviewed By: ilia-cher

Differential Revision: D15628950

fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
2019-06-13 17:13:42 -07:00
fbecb4621f schema_matching.cpp: improve error messages.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21141

Differential Revision: D15808354

Pulled By: ZolotukhinM

fbshipit-source-id: 16d938fd5acafb445a0c433cabc9a55cab563165
2019-06-13 17:04:38 -07:00
cfd8c58b45 Tune elementwise ops for ROCm (#21754)
Summary:
```
The stride calculation using OffsetCalculator performs poorly with
MAX_DIMS=25. This reduces MAX_DIMS (after coalescing) to 16 on ROCm.
I think it's unlikely that anyone will exceed this limit. If they do,
we can add additional specializations for ROCm with more dimensions.
```

I'm not sure about the underlying cause. With MAX_DIM=25, the add kernel's params are ~648 bytes vs. ~424 bytes with MAX_DIM=16. The kernel instruction footprint is bigger too, but most of these instructions are never executed and most kernel parameters are never loaded, because the typical dimensionality is much smaller.

Mini benchmark here:
https://gist.github.com/colesbury/1e917ae6a0ca9d24712121b92fed4c8f

(broadcasting operations are much faster)

cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21754

Reviewed By: bddppq

Differential Revision: D15811906

Pulled By: colesbury

fbshipit-source-id: 063f92c083d26e2ef2edc98df7ff0400f9432b9d
2019-06-13 16:25:26 -07:00
f59581218f Fix spelling errors (#21665)
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665

Differential Revision: D15806155

Pulled By: soumith

fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
efd20de276 fix multihead attention for half (#21658)
Summary:
Currently, multihead attention for the half type is broken
```
  File "/home/ngimel/pytorch/torch/nn/functional.py", line 3279, in multi_head_attention_forward
    attn_output = torch.bmm(attn_output_weights, v)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2'
```
because softmax converts half inputs into fp32. This is unnecessary: all the computations in softmax are done in fp32 internally anyway, and the results need to be converted back to fp16 for the subsequent batch matrix multiply, so nothing is gained by writing them out in fp32. This PR removes the type casting in softmax so that half works.
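A hedged repro sketch of the dtype mismatch, assuming a CUDA device is available (shapes are arbitrary):

```python
import torch

scores = torch.randn(2, 4, 4, dtype=torch.half, device='cuda')
v = torch.randn(2, 4, 8, dtype=torch.half, device='cuda')

# pre-fix pattern: upcasting before softmax leaves an fp32 tensor behind,
# and the following bmm fails on the fp16/fp32 mismatch
attn = torch.softmax(scores.float(), dim=-1)
# torch.bmm(attn, v)  # RuntimeError: expected Float but got Half

# keeping the half dtype works; the softmax kernel still accumulates in fp32
out = torch.bmm(torch.softmax(scores, dim=-1), v)
```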
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21658

Differential Revision: D15807487

Pulled By: zhangguanheng66

fbshipit-source-id: 4709ec71a36383d0d35a8f01021e12e22b94992d
2019-06-13 15:17:04 -07:00
4716409f30 Use expect to fill in pytorchbot token (#20459)
Summary:
In this PR, we use `expect` to fill in the token for pytorchbot when doing `git push`, so that we don't need to save the token in the git remote URL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20459

Differential Revision: D15811676

Pulled By: yf225

fbshipit-source-id: cd3b780da05d202305f76878e55c3435590f15a8
2019-06-13 14:56:38 -07:00
b858f42e16 Document that no_grad is thread local. (#21755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21755
ghimport-source-id: dfb53759024d9ba9d104fdb2a8151ab996e55234

Differential Revision: D15811172

Pulled By: ezyang

fbshipit-source-id: c8c7c1c15277d8fe8cc513e20af449257d7ff15c
2019-06-13 13:47:09 -07:00
3e8dc565bd Bug fix: ONNX export full operator (#21669)
Summary:
Fix an obvious bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21669

Reviewed By: zrphercule

Differential Revision: D15806614

Pulled By: houseroad

fbshipit-source-id: d0f6e934252e0057f3dbcc7f160236ee6f4497ac
2019-06-13 13:20:21 -07:00
4b45f08f87 Added dim check for index_copy_ (#21617)
Summary:
Fixing reported [bug](https://github.com/pytorch/pytorch/issues/20322)
The issue was related to not checking the dimensions of source vs destination tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21617

Differential Revision: D15749963

Pulled By: izdeby

fbshipit-source-id: acff114c729fd9c0a9a51325e0ebd8b42e1f2fc1
2019-06-13 13:15:23 -07:00
aa6887e6ef add error message to missing function backend (#21742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21742

Add error message to NotImplementedError so we know which function it is about.

Reviewed By: bddppq

Differential Revision: D15806379

fbshipit-source-id: 14eab9d03aa5b44ab95c5caeadc0e01d51f22188
2019-06-13 13:10:48 -07:00
756a20de93 Add/edit docs for nn.transformer (#21746)
Summary:
Add docs for TransformerEncoder and TransformerDecoder, plus minor edits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21746

Differential Revision: D15807498

Pulled By: zhangguanheng66

fbshipit-source-id: 388efb5821c4c3d25865cecea70902e9b2bf5d15
2019-06-13 12:27:26 -07:00
7c7d5be033 Skip failing test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21751

Pulled By: driazati

Differential Revision: D15809091

fbshipit-source-id: 3cc96e632a7b89b4d86d68d2a76021d971447e12
2019-06-13 12:21:56 -07:00
51ee048709 improve torch.load & torch.save doc formatting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21747

Differential Revision: D15808189

Pulled By: ezyang

fbshipit-source-id: 5413eaaa901be098c6bad135f702ba103bc79d6c
2019-06-13 12:13:04 -07:00
63a7c7bb2a Add event and event_counter columns to caffe2_usage_tracer table (#21739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21739

Added event and event_counter columns for PyTorch/Caffe2 API usage metrics

Reviewed By: dzhulgakov

Differential Revision: D15119119

fbshipit-source-id: a71010bd659109a8e4f3a8bad84b22c1d15dc528
2019-06-13 12:06:02 -07:00
f87d5cc191 Fix first reshape in pixel_shuffle conversion (#21486)
Summary:
When converting pixel_shuffle to reshape + transpose + reshape, the first reshape should
be:
[N, C * r^2, H, W] => [N, C, r, r, H, W]
in order to match pytorch's implementation (see ATen PixelShuffle.cpp).

This previously wasn't caught by the test case, since it used C = r = 4. Updated the test case to have C = 2, r = 4.
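A sketch checking the corrected decomposition against `F.pixel_shuffle` itself (using C != r so the bug would be visible):

```python
import torch
import torch.nn.functional as F

N, C, r, H, W = 1, 2, 4, 3, 3
x = torch.randn(N, C * r * r, H, W)

# [N, C*r^2, H, W] -> [N, C, r, r, H, W] -> permute -> [N, C, H*r, W*r],
# matching ATen's PixelShuffle.cpp
y = (x.reshape(N, C, r, r, H, W)
      .permute(0, 1, 4, 2, 5, 3)
      .reshape(N, C, H * r, W * r))

assert torch.equal(y, F.pixel_shuffle(x, r))
```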
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21486

Reviewed By: houseroad

Differential Revision: D15700945

Pulled By: houseroad

fbshipit-source-id: 47019691fdc20e152e867c7f6fd57da104a12948
2019-06-13 11:44:54 -07:00
fc3f702ba8 at::launch benchmark (#21581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21581
ghimport-source-id: 6a65d73694b17611d6ad45db0b39b86c318a68c7

Differential Revision: D15736495

Pulled By: ilia-cher

fbshipit-source-id: 6b9109ad3611ff3c8b1a37796e9149bef0c2ad36
2019-06-13 10:46:35 -07:00
eca42a5122 Fix failing test for Final annotations (#21725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21725

Pulled By: driazati

Differential Revision: D15800009

fbshipit-source-id: 5409c213161e3f2031710933897b85872aad2a83
2019-06-13 10:41:44 -07:00
5485f09f18 Native TBB parallel backend (#20480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20480
ghimport-source-id: c710f897c4c9b9616fc3dd76d80b4845aea43a1f

Differential Revision: D15333692

Pulled By: ilia-cher

fbshipit-source-id: 61e476dd5c737fe144e3aec000d8ebb11fbc0547
2019-06-13 10:11:16 -07:00
ab0d5ab99d Port avg_pool2d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21635

Differential Revision: D15768487

Pulled By: ezyang

fbshipit-source-id: 85e1d883aded0f4d3ac5100719df335f5a337fc5
2019-06-13 10:03:58 -07:00
5a7e2ccc0b Add use_rocm flag to detect AMD build in the runtime (#21718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21718

adding a detection method on whether the package is built for AMD.

Reviewed By: bddppq

Differential Revision: D15795893

fbshipit-source-id: 91a21ee76b2273b1032507bdebe57e016717181d
2019-06-13 09:30:49 -07:00
556af7c19d ROCm 2.5
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21724

Differential Revision: D15799149

Pulled By: bddppq

fbshipit-source-id: c72689e73470f2ca145556a2ac8cb34e36e341ef
2019-06-13 01:32:46 -07:00
42770e1370 Improving Categorical Distribution Docs' (#16291) (#21707)
Summary:
**Closes:** Confusing documentation with distributions.Categorical about logits https://github.com/pytorch/pytorch/issues/16291

**Solution**: Changes the documentation on the Categorical distribution from `log probabilities` to `event log-odds`. This should reduce the confusion raised by this issue, and is consistent with other distributions such as `torch.Binomial`.

More than happy to make any other changes if they fit :).
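For reference, a small sketch of what `logits` means here (unnormalized event log-odds, normalized internally via softmax):

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([1.0, 2.0, 3.0])   # unnormalized event log-odds
d = Categorical(logits=logits)

# equivalent to passing probs=softmax(logits)
assert torch.allclose(d.probs, torch.softmax(logits, dim=-1))
```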
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21707

Differential Revision: D15799181

Pulled By: soumith

fbshipit-source-id: f11acca7a5c130102a3ff6674640235ee5aa69bf
2019-06-12 23:54:02 -07:00
a3db2844e1 Support tuples in ScriptModule inputs/outputs (#20784)
Summary:
- [x] Add tests after https://github.com/pytorch/pytorch/pull/20256 is merged

- Support exporting ScriptModule with inputs/outputs of arbitrarily constructed tuples.

- Moved the assigning of output shapes to after graph conversion to ONNX is completed. By then, all tuples in the IR have already been lowered by the pass ```_jit_pass_lower_all_tuples```. If assigning output shapes had to happen before that, we would need to hand-parse the tuple structures in the graph and repeat the same logic as ```_jit_pass_lower_all_tuples```. Handling inputs is easier because all tuple information is encoded within the input tensor type.

- Swap the order of ```_jit_pass_lower_all_tuples``` and ```_jit_pass_erase_number_types```. Ops like ```prim::TupleIndex``` rely on the index being a scalar; ```_jit_pass_erase_number_types``` would convert such scalars to tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20784

Reviewed By: zrphercule

Differential Revision: D15484171

Pulled By: houseroad

fbshipit-source-id: 4767a84038244c929f5662758047af6cb92228d3
2019-06-12 23:37:28 -07:00
4c03ac7ac4 Allow batch sizes > 65535 for inverse, solve, cholesky_solve and triangular_solve (#21689)
Summary:

Changelog:
- Iterate over mini batches of 65535 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21689

Differential Revision: D15800254

Pulled By: soumith

fbshipit-source-id: c743ff13f1ba25d26874429d44e41a3c0ed21d6a
2019-06-12 23:30:19 -07:00
b599bb3836 Add mkldnn mul operator (#20575)
Summary:
### mkldnn backward ops list:
 - [ ] \(https://github.com/pytorch/pytorch/pull/20567) Add aten mkldnn conv2d backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_poo2d 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20572) Add aten mkldnn batchnorm backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20573) Add aten mkldnn zero_ operator💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20575) Add mkldnn mul operator 💛
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20575

Differential Revision: D15799529

Pulled By: bddppq

fbshipit-source-id: 4887d8ef1a0e316ad9db199b657d9481fc13e486
2019-06-12 22:41:51 -07:00
d3b3cbe26e Revert D15769066: [pytorch][PR] schema_matching.cpp: improve error messages.
Differential Revision:
D15769066

Original commit changeset: 5853e0360581

fbshipit-source-id: ac6fa8429136abf4c7835919009f936eea11ea7b
2019-06-12 20:17:38 -07:00
49481d576d Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
2019-06-12 20:12:34 -07:00
e9121e27ce remove liveness tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21719

Differential Revision: D15797628

Pulled By: Krovatkin

fbshipit-source-id: 87742bdde0b05aff4341ababb1f55c51991768ec
2019-06-12 19:04:41 -07:00
f5c00345b3 Profiling Programs section in README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21695

Differential Revision: D15795716

Pulled By: Krovatkin

fbshipit-source-id: e14a44210ea4312a247157a6681fce449e40f779
2019-06-12 17:52:05 -07:00
8dd670657b Liveness for BailOut graphs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21615

Differential Revision: D15793434

Pulled By: Krovatkin

fbshipit-source-id: d89f1bf61ea57a1e3b75f8e2b200c27beb8b46cf
2019-06-12 17:22:33 -07:00
8c57ce87b0 make tests pass with enable_first_class_module() enabled. (#21565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21565
ghimport-source-id: d1fe735fb7821eadc59116fb921d8fe39a49f818

Reviewed By: driazati

Differential Revision: D15729503

Pulled By: zdevito

fbshipit-source-id: fabb678f040d21fae7545e3b2be1d098e24c544e
2019-06-12 17:13:00 -07:00
d8056cb832 Update quantization to work with first-class modules. (#21660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21660
ghimport-source-id: f9a11b2748f49042ee636755358d79c547aa249e

Reviewed By: suo

Differential Revision: D15770237

Pulled By: zdevito

fbshipit-source-id: 41fa8577028eef247bc545635cd93192a0b19db4
2019-06-12 17:12:57 -07:00
56f4602630 Add WeakIValue, use in tracer. (#21515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21515
ghimport-source-id: 7898a68791db2b5050164ab01d6ca6991e05746d

Reviewed By: suo

Differential Revision: D15719981

Pulled By: zdevito

fbshipit-source-id: 42cf26cf6541bcdf95f1343da3b9228fe2c229da
2019-06-12 17:12:53 -07:00
0293cf5bb6 Add Final[T] annotated members to __constants__ (#21603)
Summary:
Class member annotations can be marked with `Final[T]` instead of adding them to `__constants__`. `Final` comes from the `typing_extensions` module (which will be used if it is present). If not, the polyfill from `_jit_internal` is exposed as `torch.jit.Final` for users that don't want to install `typing_extensions`.

This keeps around `__constants__` since a lot of code is still using it, but in documentation follow-ups we should change the examples to all use `Final`.
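A minimal sketch of the annotation style on a current build (the module and attribute names are made up):

```python
import torch
from torch.jit import Final  # typing_extensions.Final works too

class Scaler(torch.nn.Module):
    scale: Final[int]  # treated as a constant; no __constants__ entry needed

    def __init__(self):
        super().__init__()
        self.scale = 2

    def forward(self, x):
        return x * self.scale

m = torch.jit.script(Scaler())
print(m(torch.ones(3)))  # tensor([2., 2., 2.])
```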

TODO: install typing_extensions on CI, move tests to a Python3 only file when #21489 lands
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21603

Pulled By: driazati

Differential Revision: D15746274

fbshipit-source-id: d2c9b5643b4abba069b130c26fd42714c906ffac
2019-06-12 16:40:40 -07:00
0481a7710d Support for type annotations instead of torch.jit.annotate() (#21390)
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so

```python
a = torch.jit.annotate(List[int], [])
```

turns into

```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390

Differential Revision: D15790937

Pulled By: driazati

fbshipit-source-id: 0cc204f7209a79839d330663cc6ba8320d3a4120
2019-06-12 15:51:46 -07:00
699de487db numerical integration "trapz" function. (#21610)
Summary:
This is intended to match [numpy.trapz](https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html): numerical integration based on the trapezoid rule.
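A quick usage sketch:

```python
import torch

# integrate y = x^2 over [0, 1]; the exact answer is 1/3
x = torch.linspace(0, 1, 1000)
y = x ** 2
print(torch.trapz(y, x))  # ~0.3333
```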
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21610

Differential Revision: D15747618

Pulled By: umanwizard

fbshipit-source-id: 8eadb2e75c9877b07592d875ca0b2cca6cb72297
2019-06-12 15:30:13 -07:00
b527e48588 Use c10::List (#21177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177

- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. This also means that nested types like Dicts of Lists of Optionals of Dicts now work as expected.

Differential Revision: D15476433

fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
2019-06-12 13:58:24 -07:00
ae342fd076 Refactor Random Number Generators in ATen (#21364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21364
ghimport-source-id: ca7d37e10190ba46dc8512f437404ca9216d3369

Differential Revision: D15696497

Pulled By: ezyang

fbshipit-source-id: 2e713b8566ae915e175b5a79ac1dd9b86cc2a23d
2019-06-12 13:01:30 -07:00
96910251e0 schema_matching.cpp: improve error messages.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21141

Differential Revision: D15769066

Pulled By: ZolotukhinM

fbshipit-source-id: 5853e0360581c44e42b068add3bf2bc68e671b2b
2019-06-12 12:43:12 -07:00
28adca82ea Add some named tensor helper functions (#21636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21636
ghimport-source-id: 5eff5744cd3c80f75bdb02576be1407a64e0434d

Differential Revision: D15780269

Pulled By: zou3519

fbshipit-source-id: 87ff40ffbe0ebd5fc4d105709c9f6f8dda5f9952
2019-06-12 12:34:44 -07:00
20b0acf057 Add some more namedtensor builds to the CI (#21632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21632
ghimport-source-id: 6a8da97ce153c6d279017af920edd0d20765c32c

Differential Revision: D15760331

Pulled By: zou3519

fbshipit-source-id: b2f4c65df5f6f9322d47da995c76851387e5df47
2019-06-12 12:34:40 -07:00
3e6eb3dcab Add virtual dtor to NamedTensorMetaInterface (#21633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21633
ghimport-source-id: 6cdf0b1559e696a19e282ff6d5ba79c6b119e8c0

Differential Revision: D15760589

Pulled By: zou3519

fbshipit-source-id: 537882c05ab7b19889a31c648c5efeb1949831a8
2019-06-12 12:34:37 -07:00
83cec5f3ee nn.Transformer (#20170)
Summary:
Accidentally rebased the old PR and made it too messy. Find it here (https://github.com/pytorch/pytorch/pull/19274)

Created this PR for comments. The model is still WIP, but I want to get some feedback before moving too far. The transformer model depends on several modules, like MultiheadAttention (landed).

Transformer is implemented based on the paper (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). Users have the flexibility to build a transformer with self-defined and/or built-in components (i.e. encoder, decoder, encoder_layer, decoder_layer). Users can use the Transformer class to build a standard transformer model and modify sub-layers as needed.

Add a few unit tests for the transformer module, as follows:
TestNN.test_Transformer_cell
TestNN.test_transformerencoderlayer
TestNN.test_transformerdecoderlayer
TestNN.test_transformer_args_check
TestScript.test_scriptmodule_transformer_cuda

There is another demonstration example for applying transformer module on the word language problem. https://github.com/pytorch/examples/pull/555
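A minimal usage sketch with the default hyperparameters from the paper (shapes are (seq_len, batch, d_model)):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)
src = torch.rand(10, 32, 512)
tgt = torch.rand(20, 32, 512)
out = model(src, tgt)   # -> (20, 32, 512)
```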
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20170

Differential Revision: D15417983

Pulled By: zhangguanheng66

fbshipit-source-id: 7ce771a7e27715acd9a23d60bf44917a90d1d572
2019-06-12 12:22:12 -07:00
180aa234fc Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5cbf562652b9d7cf3877b5f819141f88c9b857d3
2019-06-12 12:17:42 -07:00
8f40164517 Add libtorch Linux CPU binary build to PR CI (#21671)
Summary:
Currently we don't have any Linux libtorch binary build in the PR CI, which has led to nightly build failures such as https://circleci.com/gh/pytorch/pytorch/1939687. This PR adds a Linux libtorch CPU binary build to prevent such breakage from happening in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21671

Differential Revision: D15785003

Pulled By: yf225

fbshipit-source-id: d1f2e4235e48296ddecb3367f8e5a0df16f4ea49
2019-06-12 12:07:31 -07:00
39d412194f Fix ProcessGroupGloo allgather for tensors with shared storage (#21490)
Summary:
Fix https://github.com/pytorch/pytorch/issues/20421

`ProcessGroupGloo` only requires input/output tensors to be contiguous. Contiguous tensors might not start from the beginning of the underlying storage, e.g., `chunk(..., dim=0)[1]`. The current implementation passes the `tensor.storage().data()` ptr to the gloo buffer, which leads to wrong results if the tensor has a non-zero storage offset.

The proposed solution is to use `tensor.data_ptr()` instead. Let's see if this breaks any tests.
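A small sketch of why the two pointers differ (contiguity does not imply a zero storage offset):

```python
import torch

base = torch.arange(10.)
t = base.chunk(2, dim=0)[1]       # contiguous view of the second half

print(t.is_contiguous())          # True
print(t.storage_offset())         # 5, not 0
# tensor.storage().data() points at base[0]; data_ptr() at base[5]
assert t.data_ptr() == base.data_ptr() + t.storage_offset() * t.element_size()
```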

cc qijianan777
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21490

Differential Revision: D15768907

Pulled By: mrshenli

fbshipit-source-id: 9d7d1e9baf0461b31187c7d21a4a53b1fbb07397
2019-06-12 11:59:17 -07:00
ad73ea22f7 Add strong Wolfe line search for lbfgs (#8824)
Summary:
This pull request adds a line search for lbfgs. "strong Wolfe" is the default line search method in [minFunc](https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html) and it is also recommended in the [Numerical Optimization](https://www.springer.com/gp/book/9780387303031) book.

The implementation is based on four sources:
+ https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
+ https://www.springer.com/gp/book/9780387303031 Algorithms 3.5, 3.6, formula 3.59
+ https://github.com/torch/optim/blob/master/lswolfe.lua
+ https://github.com/torch/optim/blob/master/polyinterp.lua

The 'lua' version is based on an old version of `minFunc`, which was updated in 2012. I made a couple of small changes based on the updated version. Because of that, the test comparing against the `.lua` version is not consistent (that is the reason I changed a learning rate in the test).
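A usage sketch with the new option (current API spelling, `line_search_fn='strong_wolfe'`):

```python
import torch

w = torch.randn(10, requires_grad=True)
opt = torch.optim.LBFGS([w], lr=1.0, line_search_fn='strong_wolfe')

def closure():
    # LBFGS re-evaluates the loss during the line search, hence the closure
    opt.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    return loss

opt.step(closure)
```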
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8824

Differential Revision: D15783067

Pulled By: vincentqb

fbshipit-source-id: 5316d9088233981120376d79c7869d5f97e51b69
2019-06-12 11:32:41 -07:00
2c91ba3bbc Add div hashing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21422

Reviewed By: xianjiec

Differential Revision: D15589181

fbshipit-source-id: f6ff0726164f88da45e4b090b4d5ad05305b3225
2019-06-12 11:27:37 -07:00
76e01542ed Fix the shape of PReLU weight (#21330)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/21271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21330

Reviewed By: zrphercule

Differential Revision: D15776459

Pulled By: houseroad

fbshipit-source-id: 4e0aef88e9c91c79faa3da6fa66f7466dee52018
2019-06-12 11:03:40 -07:00
7123c6ca04 Enable groupwise for qconv (#21592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21592

We now support groupwise convolutions for qconv2d

Reviewed By: zafartahirov

Differential Revision: D15739239

fbshipit-source-id: 80b9b4fef5b9ee3d22ebecbaf205b970ab3d4250
2019-06-12 11:03:36 -07:00
8cc8e15473 Back out "[pytorch][PR] [Re-landing] Fix caffe2 windows CI for new Windows AMI" (#21670)
Summary:
Original commit changeset: e65c1d6bfcc9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21670

Differential Revision: D15776087

Pulled By: yf225

fbshipit-source-id: cbb55cbbcb133cae1aeb2fe75cc52e7350cc6c88
2019-06-12 10:37:19 -07:00
cbcb2b5ad7 Delete DDP hooks in Reducer destructor (#21591)
Summary:
Closes https://github.com/pytorch/pytorch/issues/21344

DDP assigns the original module to the first module replica instead of creating a new one. Then, it creates a new Reducer to add post hooks that sync gradients. However, because every reconstructed DDP instance wraps the same original module, all their reducers add hooks to the same set of variables. This PR deletes DDP hooks from variables when destructing the Reducer, trying to make DDP failures recoverable.

pietern kuttas and I discussed the following solutions:

#### Solution 1

Keep `add_post_hook` API intact, and do a `dynamic_cast` in `del_post_hook` to check hook type. If the type matches Reducer's hook, delete it. As pietern mentioned, this will not work if we create multiple DDP instances from the same original model.

#### Solution 2

Use a counter to generate a unique key for every hook in `Function`, and keep them in a map. return the key to the caller of `add_post_hook`, and ask the caller to provide key if it needs to delete the hook.

Con: this would add extra overhead to `add_post_hook` and every `Function` object.

#### Solution 3 [Current implementation]

kuttas suggests that, instead of generating a unique key, directly using the address of the pointer would be better. In order to avoid messing up dereferencing, have `add_post_hook` return a `uintptr_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21591

Differential Revision: D15745706

Pulled By: mrshenli

fbshipit-source-id: e56d2d48de0c65f6667790ab16337eac7f7d8b76
2019-06-12 07:08:28 -07:00
1e4af2b969 Pin torchvision version. (#20811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20811
ghimport-source-id: 52e043453272d8441a2c0efd7f005b71ded024d6

Differential Revision: D15779416

Pulled By: ezyang

fbshipit-source-id: 1b3c2d9aeab57e580038f0c2a8bfbfcae9d7b62a
2019-06-12 06:16:20 -07:00
1ffa9d3d3b correct measure quantization error when followed_by=Relu and dequantize_output=1 (#21664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21664

As title

Reviewed By: csummersea

Differential Revision: D15770947

fbshipit-source-id: 57f5842e1a250300703b02134c314e4f06b767b8
2019-06-11 23:36:15 -07:00
c2a18a6702 Override print when python is present (#21625)
Summary:
This makes it so we can see the output of prim::Print in environments like IPython notebooks, which override sys.stdout.
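For example, a scripted function like the sketch below now prints into whatever `sys.stdout` the notebook installed:

```python
import torch

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    print('x =', x)   # compiled to prim::Print
    return x + 1

f(torch.tensor(1.0))
```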
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21625

Differential Revision: D15756793

Pulled By: jamesr66a

fbshipit-source-id: 7d9a14b2e229ed358e784318e9d862677db2c461
2019-06-11 22:58:22 -07:00
aa7e27fa70 Emit Loop Condition as Separate Block (#21611)
Summary:
Emit the loop condition as a separate block in loops, then inline it before conversion to SSA. This is needed for breaks & continues, where we will inline the condition block after the continue pass and before the break pass.

I also considered emitting a prim::For and a prim::While, but I think it's easier to just have one pathway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21611

Differential Revision: D15775820

Pulled By: eellison

fbshipit-source-id: de17c5e65f6e4a0256a660948b1eb630e41b04fb
2019-06-11 22:03:26 -07:00
341a7e4bb5 Fix issue in backward path (#21663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21663

as title

Reviewed By: hl475

Differential Revision: D15770793

fbshipit-source-id: b3d0dd030237c4d62bddc388984a273153fac4a6
2019-06-11 21:09:25 -07:00
afd202be9f StoreMatrixInMatrixMarketFormat can store both integer and float tensors (#21606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21606

StoreMatrixInMatrixMarketFormat could only dump quantized tensors, but sometimes we want to dump float tensors.

Reviewed By: csummersea

Differential Revision: D15741611

fbshipit-source-id: 95b03c2fdf1bd8407f7d925171d9dc9f25677464
2019-06-11 17:28:19 -07:00
c2a08d339b Automatic update of fbcode/onnx to dd599b05f424eb161a31f3e059566a33310dbe5e (#21641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21641

Previous import was 5160f3ac3380302224998f1c95e111cd961c4bc5

Included changes:
- **[dd599b05](https://github.com/onnx/onnx/commit/dd599b05)**: Fix type s/depracted/deprecated/ (#2092) <Takeshi Watanabe>
- **[abb1702a](https://github.com/onnx/onnx/commit/abb1702a)**: Add shape inference for Tile op (#2076) <Hariharan Seshadri>
- **[67638d9c](https://github.com/onnx/onnx/commit/67638d9c)**: [New Operator] Round (#2053) <Jeff Saremi>
- **[584e4477](https://github.com/onnx/onnx/commit/584e4477)**: Add dilations support in ConvTranspose shape inference and update docs (#2068) <daquexian>

Reviewed By: zrphercule

Differential Revision: D15762382

fbshipit-source-id: 590f25fb733e1565eb90fcdeb797b0ba34e2d2c3
2019-06-11 16:54:47 -07:00
968114ae3d Revert D15769256: [jit] Add python string standard lib
Differential Revision:
D15769256

Original commit changeset: 1af487446361

fbshipit-source-id: 96bea4a49664dad68762bef75ae28e64c673f8b1
2019-06-11 16:54:43 -07:00
039629cedd fix incorrect use of TeX in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21649

Differential Revision: D15766392

Pulled By: umanwizard

fbshipit-source-id: a362ec06e971ee12c47a45bc9c15cc773ec878e3
2019-06-11 16:19:40 -07:00
1bd21d3f14 test_jit: Remove tests checking non-guaranteed properties from 'test_insert_observers'. (#21657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21657
ghimport-source-id: e9c7e45c00db55bf3b7895d06d77f0d99bfc1afe

Differential Revision: D15769295

Pulled By: ZolotukhinM

fbshipit-source-id: cfb40bc5d7116b1d99f5e0f5c4f5577f5aa33804
2019-06-11 16:12:09 -07:00
ee33afe2b1 randomized testing for qconv (#21436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21436

Test many different options

Reviewed By: zafartahirov

Differential Revision: D15683754

fbshipit-source-id: 60d0fc697b53c7e4adadbe80995d45f28729bca4
2019-06-11 16:07:22 -07:00
cf5c3bb3fe make range functions respect current stream (#21619)
Summary:
The current stream is not respected by the range/linspace/logspace functions, which contributes to https://github.com/pytorch/pytorch/issues/21589 (this is not a complete solution for that issue).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21619

Differential Revision: D15769666

Pulled By: ezyang

fbshipit-source-id: 7c036f7aecb3119430c4d432775cad98a5028fa8
2019-06-11 15:46:48 -07:00
9241c4b3c6 Add python string standard lib (#21656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21656
ghimport-source-id: cc7d7f68e33e95a97f6274c50823138aa4bacabb

Differential Revision: D15769256

Pulled By: bwasti

fbshipit-source-id: 1af487446361d90d03dce004c3e2169a3e62667d
2019-06-11 15:23:23 -07:00
9737b166a4 Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes https://github.com/pytorch/pytorch/issues/21257, fixes https://github.com/pytorch/pytorch/issues/21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15761029

Pulled By: colesbury

fbshipit-source-id: 2aeb51e2d3cfdb8356806a7d5b12d4b9910e37fb
2019-06-11 15:18:17 -07:00
fb9fbc009c Fix momentum bug in CyclicLR (#20401)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003

The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).

Maybe printing a warning when switching this argument's value would suffice?
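A usage sketch of the momentum cycling this fixes (SGD has a momentum parameter, so `cycle_momentum=True` is valid here):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=0.001, max_lr=0.1,
    cycle_momentum=True, base_momentum=0.8, max_momentum=0.9)

for _ in range(5):
    opt.step()      # (in real training, step inside the closure/loss loop)
    sched.step()    # cycles both lr and momentum
```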
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401

Differential Revision: D15765463

Pulled By: ezyang

fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
2019-06-11 15:10:28 -07:00
cdbc20677c Add len to OrderedDict types (#21651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21651
ghimport-source-id: 0bba5c1930865e2d18b18782ba8c8990b0761d4d

Differential Revision: D15767795

Pulled By: bwasti

fbshipit-source-id: 70e27176897b0f977c9034ffb3ad21091c91e12e
2019-06-11 14:53:40 -07:00
7a040f4b0b Revert D15706021: [jit] Support for type annotations instead of torch.jit.annotate()
Differential Revision:
D15706021

Original commit changeset: 8bf1459f229d

fbshipit-source-id: 7ae34578560e2dccd0f04af2220445b3999771fe
2019-06-11 14:33:28 -07:00
b46e87cd3d Fix catch block to fix 'error: catching polymorphic type' (#21637)
Summary:
Fix a catch block to resolve 'error: catching polymorphic type `class c10::Error` by value [-Werror=catch-value=]'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21637

Differential Revision: D15761860

Pulled By: VitalyFedyunin

fbshipit-source-id: befc18a9c217440381cdb50a1319b0b5db5710e9
2019-06-11 12:30:52 -07:00
dd439bc39e Rename hubconf.conf to hubconf.py in docs (#21631)
Summary:
It's a typo I guess. cc fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21631

Differential Revision: D15764909

Pulled By: soumith

fbshipit-source-id: 5ffc7bde181c13e151332e7de3c0da36505b495e
2019-06-11 12:22:43 -07:00
bbcd6cc782 Support for type annotations instead of torch.jit.annotate() (#21390)
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so

```python
a = torch.jit.annotate(List[int], [])
```

turns into

```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390

Pulled By: driazati

Differential Revision: D15706021

fbshipit-source-id: 8bf1459f229d5fd0e16e59953b9656e85a2207fb
2019-06-11 12:03:57 -07:00
25d1496d58 Fix Process Group for tensors shared across processes (#21449)
Summary:
Ops on a Process Group (pg) instance will hit an error when input/output tensors were created in a different process, because the pg calls `recordStream` on the `CUDACachingAllocator`, which only knows about tensors created within the same process.

The proposed solution is to add a `suppressError` arg (suggestions for better names?) to `recordStream`. See comments in the code for arguments.

CC pichuang1984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21449

Differential Revision: D15689736

Pulled By: mrshenli

fbshipit-source-id: e7fc81b167868f8666536067eaa7ae2c8584d88e
2019-06-11 11:50:25 -07:00
50ee1f3fa7 better error msg when seeing a unsupported builtin function (#21068)
Summary:
fixes https://github.com/pytorch/lockdown/issues/39.
Hopefully it doesn't break other tests....
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21068

Differential Revision: D15761895

Pulled By: ailzhang

fbshipit-source-id: 60cbb16cfc930b377d753b81e10b7edaea9a1281
2019-06-11 11:32:44 -07:00
4610347fdf Breaks up NN module in docs so it loads faster.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21291

Differential Revision: D15760935

Pulled By: ezyang

fbshipit-source-id: 114da4c52b78949e631e9adcae4eb620546124fb
2019-06-11 09:38:41 -07:00
646a7f99bb Move management of calls of "cmake --build" to setup_helper/cmake.py and refactor as a CMake class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21493

Differential Revision: D15759279

Pulled By: ezyang

fbshipit-source-id: 157e1de36f1c5a51caf2a25b363a94369c442012
2019-06-11 07:04:05 -07:00
835a6b9da2 Fix namedtensor build (#21609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21609
ghimport-source-id: 648a0bcd28db2cdda1bf2fa6a904ca8f851088c2

Differential Revision: D15747687

Pulled By: zou3519

fbshipit-source-id: 2a972a15fa7399391617fc6e6b19879b86568c3a
2019-06-11 06:53:50 -07:00
29c849ff34 implement transpose operator for MKLDNN (#19955)
Summary:
implement transpose operator for MKLDNN
1. upgrade mkldnn-bridge to support ND transpose
2. implement transpose operator in caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19955

Differential Revision: D15701832

Pulled By: bddppq

fbshipit-source-id: e4337cd0ba6f8180a35c8c70cbb6830a0a84182f
2019-06-11 01:55:13 -07:00
731670f40a upgrade mkldnn-bridge (#20569)
Summary:
1. reduce the overhead of mkldnn-bridge itself
2. remove redundant code and useless APIs
3. provide new operators, including int8 inner_product, ND permute/transpose, elem_add/mul, etc.
4. improve inner_product to support io format weights without implicit reorder
5. add SoftMax support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20569

Reviewed By: houseroad

Differential Revision: D15558663

Pulled By: bddppq

fbshipit-source-id: 79a63aa139037924e9ffb1069f7e7f1d334efe3a
2019-06-11 00:47:11 -07:00
f2623c74a9 add PT pointwise unary ops to the benchmark suite (#21207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21207

This diff adds 80 PT pointwise unary ops to the benchmark suite. Most of the ops are added using the generate_pt_tests_from_list interface. The rest are handled separately.

Reviewed By: zheng-xq

Differential Revision: D15471597

fbshipit-source-id: 8ea36e292a38b1dc50f064a48c8cd07dbf78ae56
2019-06-10 21:35:44 -07:00
4e3c97a0be add separate path for op with JIT (#21210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210

This diff introduces a new path to run ops with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step 1 is passed to `jit_forward`, which will be executed by the benchmark backend.
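A hedged sketch of what step 1 amounts to (`init`/`jit_forward` are the harness hooks named above; the op here is arbitrary):

```python
import torch

@torch.jit.script
def scripted_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

# the benchmark backend then drives the compiled graph directly
out = scripted_add(torch.randn(64), torch.randn(64))
```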

Reviewed By: zheng-xq

Differential Revision: D15460831

fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
2019-06-10 19:53:58 -07:00
a82feee07c Method-only entries in native functions should have self as first argument (#21549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21549
ghimport-source-id: a98fd7a18b4c523d9facb328a3b80a35416834ce

Differential Revision: D15724794

Pulled By: li-roy

fbshipit-source-id: a0f218cf6fd32d9694921685fc805d868156fce3
2019-06-10 19:32:34 -07:00
fff22125a5 AT_CHECK -> TORCH_CHECK (#21547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21547
ghimport-source-id: d99d3fdcd9abde4e1126716d32ed05aaf8508c50

Differential Revision: D15747676

Pulled By: bwasti

fbshipit-source-id: ae9824436e8316e2d0002d2973df4833a18c5f23
2019-06-10 16:58:09 -07:00
f5c24fc66d Deprecate torch::jit::RegisterOperators (#21552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21552

Original commit changeset: a142c22be3fd

https://github.com/pytorch/pytorch/pull/21368 got reverted because of an MSVC issue. This commit re-introduces that change and fixes the MSVC issue.

Differential Revision: D15727526

fbshipit-source-id: 8eb0eb9a7108dc049911b79342c364ac1b8623c8
2019-06-10 16:52:24 -07:00
cab3e726df Split out Function into its own file (#21539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21539
ghimport-source-id: f1e4396a0bec6e30d3179f926ec4da68807942f7

Differential Revision: D15741979

Pulled By: suo

fbshipit-source-id: 4cd0ed36bcbf8db0b36a101dda6f58975f806889
2019-06-10 16:37:58 -07:00
512c9d8c76 add PT gather op to the benchmark suite (#21614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21614

as title

Reviewed By: kimishpatel

Differential Revision: D15525115

fbshipit-source-id: 6a17e1d791bdb432cc3d51e45c5e82b96268127d
2019-06-10 16:31:52 -07:00
32a0440209 Publish torch::Dict and torch::OperatorKernel (#20723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20723

These classes already existed but only as c10::Dict and c10::OperatorKernel.
Since they're now part of torch::RegisterOperators(), they should also live in the torch namespace.

Differential Revision: D15421575

fbshipit-source-id: d64ebd8664fadc264bbbae7eca1faa182529a32b
2019-06-10 16:19:42 -07:00
a93a1ccbb3 Run test_c10d.py in multi-gpu environment (#21598)
Summary:
yf225 helped me discover that our CI does not run the multi-gpu tests in `test_c10d.py`. There are quite a few multi-gpu c10d tests; this PR tries to enable them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21598

Differential Revision: D15744256

Pulled By: mrshenli

fbshipit-source-id: 0a1524a862946128321f66fc8b7f331eff10e52a
2019-06-10 15:58:38 -07:00
74f6c55f0f support negative axis in concat and split operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17955

Differential Revision: D14476031

Pulled By: ezyang

fbshipit-source-id: e0e57e8595ed2005ded9e923572a40fe62aca5a7
2019-06-10 15:26:29 -07:00
3889855a5b Revert "Redefine scheduler to set learning rate using recursive formula" #14010 (#21463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21463
ghimport-source-id: 1b0ea4a282b41388d5c6f6a5d18d37c14ae874ad

Differential Revision: D15747426

Pulled By: ezyang

fbshipit-source-id: 0708394f907b98a9f45bcfa26e5cc450fda8cf76
2019-06-10 15:26:25 -07:00
8b9b215dc5 Add a 'dim' argument to nuclear norm (#21022)
Summary:
Addresses #18275.
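A usage sketch of the new argument (nuclear norm of each 4x5 slice of a batched tensor):

```python
import torch

x = torch.randn(3, 4, 5)
n = torch.norm(x, p='nuc', dim=(1, 2))  # -> shape (3,)
```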
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21022

Differential Revision: D15743515

Pulled By: ezyang

fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
2019-06-10 15:18:34 -07:00
2378c120e6 Implements divmod function (#20979)
Summary:
This PR refers to issue #18627
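
Assuming this refers to the TorchScript builtin (an assumption based on the PR's JIT context), a minimal usage sketch:
```
import torch

@torch.jit.script
def f(a: int, b: int):
    # divmod as a scripted builtin, mirroring Python semantics
    return divmod(a, b)

print(f(7, 3))  # (2, 1)
```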
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20979

Differential Revision: D15743929

Pulled By: wanchaol

fbshipit-source-id: 967fc3fd519501e427176e10b112c8be1390540b
2019-06-10 15:00:56 -07:00
8a88d33103 Uninitialized Ivalue (#21387)
Summary:
Create an uninitialized IValue. This will be needed for Breaks & Continues, to match up if-block outputs for values that are guaranteed not to be used but need to escape the block scope. It is not exposed to users.

This was previously part of the final-returns work, but I was asked to make a separate PR for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21387

Differential Revision: D15745124

Pulled By: eellison

fbshipit-source-id: ae6a6f766b4a70a71b9033987a630cfbf044e296
2019-06-10 14:51:24 -07:00
dd0ffd6864 Use schema string specification in derivatives.yaml. (#20916)
Summary:
For consistency, derivatives.yaml now uses the same schema specification as native_functions.yaml.

Note that there are some small downsides, e.g. changing default values or return parameter names in native_functions.yaml now requires updating derivatives.yaml as well.  But this has a few nice properties:
1) Able to copy-paste definitions from native_functions to derivatives.
2) Makes it impossible to write derivatives for operators without schemas (e.g. old TH operators).
3) Moves us closer to the ideal situation of co-locating forward and backwards declarations.

Note that this doesn't change any generated code; in particular, this has the same behavior of mapping in-place and out-of-place definitions together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20916

Differential Revision: D15497800

Pulled By: gchanan

fbshipit-source-id: baee5caf56b675ce78dda4aaf6ce6a34575a6432
2019-06-10 13:47:55 -07:00
5f25a252d6 Allow tensors with requires_grad=True in c10 ops (#21599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21599

We prevented this because c10 ops can't have a backward pass yet, and calling them with requires_grad=True would do the wrong thing if the c10 op is not purely implemented by calling other autograd-able ops.

However, it is a valid use case to have c10 ops that just call other autograd-aware ops, and these ops should be callable with requires_grad=True.

This should fix https://github.com/pytorch/pytorch/issues/21584.

Differential Revision: D15744692

fbshipit-source-id: ba665365c850ef63fc9c51498fd69afe49e5d7ec
2019-06-10 13:37:06 -07:00
5a48642fde Revert D15717575: [pytorch][PR] Fix bug in multinomial_alias_draw
Differential Revision:
D15717575

Original commit changeset: b1154e226d42

fbshipit-source-id: 305ca010bfda88c9295c52e0626d867452c72f84
2019-06-10 13:28:11 -07:00
4fb302eb34 fix optional type promotion for classes (#21593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21593
ghimport-source-id: f68730618bccf2326218e08d0a2a70171fdd8921

Differential Revision: D15741471

Pulled By: suo

fbshipit-source-id: 7ac1a0f6d9d2ff4bc819caff43a7a5b6d37cbc98
2019-06-10 12:51:00 -07:00
a436822c40 Consider contained types in alias analysis (#21431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21431
ghimport-source-id: d86ce974a065ec572e71cfa14a8f6bdf48216da7

Reviewed By: jamesr66a

Differential Revision: D15718560

Pulled By: suo

fbshipit-source-id: a36ce907ab26be22f12bab6175797fe8b34721f1
2019-06-10 12:42:10 -07:00
bb4aff2680 cleanups to memory_dag (#21430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21430
ghimport-source-id: 2dc5a0df8512e796c12d65d3ecc5981638122ce6

Reviewed By: jamesr66a

Differential Revision: D15718561

Pulled By: suo

fbshipit-source-id: 1ef31c08c8a757b632451eb07a47a8227e76c67f
2019-06-10 12:42:06 -07:00
ae144032aa cleanups to alias analysis interfaces (#21397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21397
ghimport-source-id: 8733e1af2fe66a3f4494a2c24c82a039375a982e

Reviewed By: jamesr66a

Differential Revision: D15642662

Pulled By: suo

fbshipit-source-id: ae66b7b4f19f255d6fe0e7e804bd0df6d86cb8d1
2019-06-10 12:42:02 -07:00
ddac8da813 avoid calling front() on empty working set (#21396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21396
ghimport-source-id: 7e57282099d2fd57c58c990b51ae933e427aecb2

Reviewed By: jamesr66a

Differential Revision: D15642663

Pulled By: suo

fbshipit-source-id: f9b467ba53f03438879bf3929da522aabaff2343
2019-06-10 12:41:58 -07:00
bb1dbdb99b Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes #21257, fixes #21508

cc: LeviViana neerajprad
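
For background (an added note, not from the original message), a toy sketch of the alias-method draw this kernel implements; `prob` and `alias` are the standard alias tables, with illustrative names:
```
import torch

def alias_draw(prob, alias, n):
    # Pick a bucket uniformly, keep it with probability prob[bucket],
    # otherwise take its alias.
    buckets = torch.randint(prob.numel(), (n,))
    keep = torch.rand(n) < prob[buckets]
    return torch.where(keep, buckets, alias[buckets])
```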
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15717575

Pulled By: ezyang

fbshipit-source-id: b1154e226d426c0d412d360c15f7c64aec95d101
2019-06-10 12:05:48 -07:00
30d6933016 BailOut Graphs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21381

Differential Revision: D15724412

Pulled By: Krovatkin

fbshipit-source-id: 18e4a1916c7cd1baea76953d0087d6257e58c55b
2019-06-10 11:49:38 -07:00
3df5a46a99 Skip triangular_solve CUDA test on non-default stream
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21590

Differential Revision: D15742549

Pulled By: ezyang

fbshipit-source-id: fd5b2cbce86e5f229c2ffba114ef362934296d07
2019-06-10 11:38:42 -07:00
6f99bcda8a fix test (#21594)
Summary:
This is a test that wasn't run on the CI, but is tested internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21594

Differential Revision: D15742157

Pulled By: eellison

fbshipit-source-id: 11fc82d1fc0281ffedd674ed96100e0c783c0599
2019-06-10 11:23:18 -07:00
91ea2cd5a7 clip sigmoid to prevent transforms return inf/nan values (#20288)
Summary:
This PR addresses some numerical issues of Sigmoid/StickBreakingTransform, where these transforms give ±inf when the unconstrained values move into the ±20 range.

For example, with
```
t = torch.distributions.SigmoidTransform()
x = torch.tensor(20.)
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
the current behaviour is that the inverse returns `inf` and the logdet returns `-inf`, while this PR makes them `15.9424` and `-15.9424`.

And for
```
t = torch.distributions.StickBreakingTransform()
x = torch.tensor([20., 20.])
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
the current value is `(inf, nan)` with `-inf` for the logdet, while this PR makes it `[16.6355, 71.3942]` with `-47.8272` for the logdet.

Although these finite values are wrong and that seems unavoidable, in my opinion this is better than returning `inf` or `nan`. It is useful in HMC: even though the grad will be zero when the unconstrained parameter moves into the unstable area (due to clipping), the velocity variable will force the parameter to another area, which by chance can move it out of the unstable region. On the other hand, inf/nan can be useful for stopping inference early, so the changes in this PR might be inappropriate.
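
To illustrate the clamping idea behind the fix (a minimal sketch, not the exact code in the PR):
```
import torch

def clipped_sigmoid(x):
    # Keep the output strictly inside (0, 1) so the inverse stays finite.
    finfo = torch.finfo(x.dtype)
    return torch.clamp(torch.sigmoid(x), min=finfo.tiny, max=1.0 - finfo.eps)

x = torch.tensor(20.)
y = clipped_sigmoid(x)
inv = torch.log(y) - torch.log1p(-y)  # logit of the clamped value
print(inv)  # ~15.94 instead of inf
```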

I also fix some small issues in the `_Simplex` and `_RealVector` constraints where the batch shape of the input was not respected during validation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20288

Differential Revision: D15742047

Pulled By: ezyang

fbshipit-source-id: b427ed1752c41327abb3957f98d4b289307a7d17
2019-06-10 11:16:31 -07:00
4bdbd30b96 Add python binding to deserialize blob (#21532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21532

Add python binding to deserialize blob

Reviewed By: yinghai

Differential Revision: D15706816

fbshipit-source-id: f498c7e0f7392f055b13810bbf81cba59f25e1d2
2019-06-10 10:49:21 -07:00
e4fae884f6 Change compiler to use Load/Stores, then transform to SSA (#21101)
Summary:
This changes our compiler so it first emits Loads & Stores, and then transforms the graph to SSA in a follow up pass. When a variable is set, we emit a prim::Store, and when a variable is referenced, we emit a prim::Load.
```
a = 1
print(a)
```
becomes:
```
%a.1 : int = prim::Constant[value=1]()
prim::Store[name="a"](%a.1)
%a : int = prim::Load[name="a"]()
prim::Print(%a)
```
In the follow-up pass, convertToSSA, the values are turned into SSA form and the Loads & Stores are removed. This change will enable breaks and continues because you can transform the graph with the variable-naming information still intact.

There are still some remaining jitter and edge-case issues that I have to look through, but I think it is still ready for review.
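
As an aside (an added sketch, not the actual pass), the core of the conversion for straight-line code can be pictured as threading an environment from Stores to Loads; the real pass additionally has to reconcile bindings across control flow:
```
def convert_to_ssa(instrs):
    # Stores record bindings; Loads are replaced by the value most
    # recently stored under the same name.
    env, out = {}, []
    for instr in instrs:
        if instr[0] == "store":      # ("store", name, value)
            env[instr[1]] = instr[2]
        elif instr[0] == "load":     # ("load", name)
            out.append(("use", env[instr[1]]))
        else:
            out.append(instr)
    return out

print(convert_to_ssa([("store", "a", "%a.1"), ("load", "a")]))
# [('use', '%a.1')]
```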
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21101

Differential Revision: D15723353

Pulled By: eellison

fbshipit-source-id: 3269934d4bc24ddaf3a87fdd20620b0f954d83d0
2019-06-10 10:26:43 -07:00
1e6c99a6e0 update hub doc (#21568)
Summary:
update doc as pointed out in https://github.com/pytorch/hub/pull/22
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21568

Differential Revision: D15732927

Pulled By: ailzhang

fbshipit-source-id: 78ab026539e5ee59e7c3a8144e2c9fcbbc225733
2019-06-10 09:39:35 -07:00
mal
f308b07e8c Don't leak threads on exit (#21438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21438
ghimport-source-id: 33f145f5b3508163365442c22a223c4a44e677d8

Differential Revision: D15738856

fbshipit-source-id: 656e8d0e3d0d22f116e3ab66bf0282608d6f1a76
2019-06-10 09:14:13 -07:00
c294d64eff fix concat and split tensor inference function (#21382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21382

The concat tensor inference function was not correctly handling the case where the axis argument points to the last dimension, so input tensors don't need to have the same number of dimensions.
The split tensor inference function was not correctly handling the case where split information is provided as a second input tensor rather than as an argument.

Reviewed By: mdschatz

Differential Revision: D15633148

fbshipit-source-id: d566af44dc882457ee9efe83d2461b28408c2c5d
2019-06-10 08:23:53 -07:00
9deab0cf0e Documentation for locking discipline in engine.cpp/.h (#21548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21548

Added documentation as titled.

Reviewed By: ezyang

Differential Revision: D15723146

fbshipit-source-id: fab4a35c62f07256673318c0874701f7628b2f7a
2019-06-10 07:50:01 -07:00
547fcaa977 Add named_guard to native_functions options (#21373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21373
ghimport-source-id: acab6d3ab0b287d504afa98eaefa2aed6fe99453

Differential Revision: D15717925

Pulled By: zou3519

fbshipit-source-id: 8515c448b368be79f71681833b5edf960da44fe8
2019-06-10 07:29:41 -07:00
8ffcbfb7d4 Add unique_ptr<NamedTensorMeta> field to TensorImpl (#21341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21341
ghimport-source-id: 06021b06864746571a904a1cfc0aaea5f8a12325

Differential Revision: D15717907

Pulled By: zou3519

fbshipit-source-id: 48ee76cf2f11a8b092be75ecac8d5faee68ca0d9
2019-06-10 07:29:36 -07:00
f9c4d0d7a9 Fix NVTX path on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21580

Differential Revision: D15738060

Pulled By: ezyang

fbshipit-source-id: 05a2e97279816753d574678252bf9b35913c99b1
2019-06-10 06:05:44 -07:00
c4e0d61646 Regularization is not supported in FP16 (#21319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21319

Add an assertion to raise an exception when regularization is applied to FP16.

Reviewed By: bddppq

Differential Revision: D15528486

fbshipit-source-id: c887c90d1d9ccfdaded3b5fa16816c6f29910e2e
2019-06-09 23:59:48 -07:00
b1bf16eeab Enabled _th_ixor_ and _th_equal for bool (#21538)
Summary:
Following up on the feedback in this [PR](https://github.com/pytorch/pytorch/pull/21113/files?file-filters%5B%5D=.cwrap&owned-by%5B%5D=)
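
A quick hedged example of the two operations this enables on bool tensors:
```
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
a ^= b                      # in-place xor on bool tensors
print(a)                    # tensor([False,  True,  True])
print(torch.equal(a, b))    # False
```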
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21538

Differential Revision: D15721390

Pulled By: izdeby

fbshipit-source-id: 1b5265bf8726c1051f306f7674d731e25a6c7d03
2019-06-09 15:28:38 -07:00
e447a733a1 Update module.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21570

Differential Revision: D15732665

Pulled By: ezyang

fbshipit-source-id: caa12a8619ad1396540f787b5c849d29cc5b03bd
2019-06-09 15:28:35 -07:00
04e6564f0c clean up the TracingState API (#21564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21564
ghimport-source-id: b6f71e2238f6f7c8de6cfbf6969a5e08e07be46c

Reviewed By: suo

Differential Revision: D15729497

Pulled By: zdevito

fbshipit-source-id: aacfea6058fadb572df692aa9ebd6cab0bcd03fc
2019-06-09 15:28:32 -07:00
69aa2b2814 Collapse tracing_state.h into tracer.h (#21563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21563
ghimport-source-id: de87e5e621da33326a9d2cb8a57d82d355166479

Reviewed By: suo

Differential Revision: D15729499

Pulled By: zdevito

fbshipit-source-id: 17b3e2e71d004f08c4413e80091388ae9ac2df2b
2019-06-09 15:28:29 -07:00
ea822d9626 Interpreter support for CallFunction/CallMethod (#21562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21562
ghimport-source-id: 17e5e183f730f50d97ef48973aafc6249d54978f

Reviewed By: suo

Differential Revision: D15729500

Pulled By: zdevito

fbshipit-source-id: efa8a133b617b1498810392a8da6b513ce00b5eb
2019-06-09 15:28:26 -07:00
ad0c08f950 Expose ExecutionPlan in prep for function calls (#21561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21561
ghimport-source-id: 4bf28d8140610a0cefef0c0a17f0a513ae855dde

Reviewed By: suo

Differential Revision: D15729498

Pulled By: zdevito

fbshipit-source-id: b26458336da1efaba71d8a577c3917c6622dae0d
2019-06-09 15:28:22 -07:00
de31f6719c Add flag to temporarily enable first class modules (#21560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21560
ghimport-source-id: a555ca33fcd3efd1147aaf90f26a8e63da1c1a67

Reviewed By: suo

Differential Revision: D15729502

Pulled By: zdevito

fbshipit-source-id: d6c11472bfc791e2ad1e9aa695b0439d72b79681
2019-06-09 15:28:19 -07:00
18996a8952 unfinished push/pop reduction (#21559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21559
ghimport-source-id: 81ba4a5638577781e1ea706599966c033c37e814

Reviewed By: suo

Differential Revision: D15729501

Pulled By: zdevito

fbshipit-source-id: 3423bff61e89617c40078d5fab726b77d21bfa27
2019-06-09 15:28:16 -07:00
13edda417d Prepare interpreter for function calling (#21558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21558
ghimport-source-id: a8a19dbefea869ca1401e5afea6c02f31f95b99a

Reviewed By: suo

Differential Revision: D15729491

Pulled By: zdevito

fbshipit-source-id: 9629664608a2379a2ddcafaf741fa8463c4fb917
2019-06-09 15:28:13 -07:00
8ae7b1c486 Update functional.py doc (#21510)
Summary:
- Fixes a typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21510

Differential Revision: D15731277

Pulled By: ezyang

fbshipit-source-id: c3f8e110f5c61e797b857477b495168ea8d63cd5
2019-06-09 15:28:09 -07:00
74828be4a7 fix segfault in cat on CPU with tensors that can't be indexed with 32-bit ints. (#21530)
Summary:
Should be self-explanatory. This `int` variable is overflowing.

Reported in #21526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21530

Differential Revision: D15719275

Pulled By: umanwizard

fbshipit-source-id: 24e917a00a5b78bc3af29ef3b8b72eea7e89d5d5
2019-06-09 15:28:06 -07:00
406374657a Optimize batch mm op when broadcast the second input (#21556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21556

Optimize the batch mm op when broadcasting the second input.

Reviewed By: houseroad

Differential Revision: D15728914

fbshipit-source-id: c60441d69d4997dd32a3566780496c7ccda5e67a
2019-06-09 15:28:03 -07:00
d71501259b Revert D15572818: Prepare interpreter for function calling
Differential Revision:
D15572818

Original commit changeset: 3a9b5f053664

fbshipit-source-id: b932411e8e88c7414c8db332d6049fe4e26bd83e
2019-06-07 22:20:54 -07:00
d4bcab0dba Revert D15590900: Reduce number of stack manipulation instructions in interpreter.
Differential Revision:
D15590900

Original commit changeset: 98829979feba

fbshipit-source-id: eb7f1d396bb2b98d2852af81c69db81430eba33c
2019-06-07 22:20:50 -07:00
03641413e5 Revert D15600068: Add flag to temporarily enable first class modules
Differential Revision:
D15600068

Original commit changeset: 9b68e23d7f8b

fbshipit-source-id: 45f36b3aaa4f1c457c27490579496456cbbc680b
2019-06-07 22:20:47 -07:00
e616a5e8b8 Revert D15600067: Expose ExecutionPlan in prep for function calls
Differential Revision:
D15600067

Original commit changeset: 82b7de458dd6

fbshipit-source-id: ca26a362cd73bdb9e8c4eba15dd5c10986fa79fe
2019-06-07 22:20:44 -07:00
bfb235b8c9 Revert D15618275: Interpreter support for CallFunction/CallMethod
Differential Revision:
D15618275

Original commit changeset: 038ae27e5416

fbshipit-source-id: 8dbe0f564ba103fe445dacc471085c659171705f
2019-06-07 22:20:40 -07:00
c27cabe2d7 Revert D15719982: Collapse tracing_state.h into tracer.h
Differential Revision:
D15719982

Original commit changeset: 56bb021dd949

fbshipit-source-id: 2eb3e2c9745c35a84ebcc0fc7ac62b5f1fdd6437
2019-06-07 22:20:37 -07:00
3cfe914191 Revert D15719980: clean up the TracingState API
Differential Revision:
D15719980

Original commit changeset: 3de2746c3f3c

fbshipit-source-id: 4610e215936b2476a0271355ef3b8f1f480bdea8
2019-06-07 22:20:34 -07:00
dd0faf4366 clean up the TracingState API (#21514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21514
ghimport-source-id: 6a9b6fdd7e696ea29e8715482708efe897230e4d

Reviewed By: jamesr66a

Differential Revision: D15719980

Pulled By: zdevito

fbshipit-source-id: 3de2746c3f3c3de4111b4cb73f4c4acedbf28862
2019-06-07 20:57:05 -07:00
8c5f3acfc0 Collapse tracing_state.h into tracer.h (#21513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21513
ghimport-source-id: 86278929818a8fc65684bd8f2ffac31460772fe9

Reviewed By: jamesr66a

Differential Revision: D15719982

Pulled By: zdevito

fbshipit-source-id: 56bb021dd949668562ea481c5ff0115a9ea2b02e
2019-06-07 20:57:01 -07:00
5f6afafdef Interpreter support for CallFunction/CallMethod (#21325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21325
ghimport-source-id: eeca1176f5e00c85a69cd016acccf5105e670e02

Reviewed By: jamesr66a

Differential Revision: D15618275

Pulled By: zdevito

fbshipit-source-id: 038ae27e5416f1ce338009627c839a4d61a00658
2019-06-07 20:56:58 -07:00
1517ff66a1 Expose ExecutionPlan in prep for function calls (#21273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21273
ghimport-source-id: b92c1e07fbe4122467a21b98d29635295093e0c2

Reviewed By: jamesr66a

Differential Revision: D15600067

Pulled By: zdevito

fbshipit-source-id: 82b7de458dd65c175f55b0f383bfc3fcf4704032
2019-06-07 20:56:55 -07:00
7e08bc42d5 Add flag to temporarily enable first class modules (#21272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21272
ghimport-source-id: 43e73d1b93ccbe0dd6845eb3f7444c9d0abd444b

Reviewed By: jamesr66a

Differential Revision: D15600068

Pulled By: zdevito

fbshipit-source-id: 9b68e23d7f8b6046a5a0d6d9fd16138ac384b863
2019-06-07 20:56:52 -07:00
dde27958dd Reduce number of stack manipulation instructions in interpreter. (#21240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21240
ghimport-source-id: 5e9cbe8b3df3ac721135d2f652a420ae0b14ac55

Reviewed By: jamesr66a

Differential Revision: D15590900

Pulled By: zdevito

fbshipit-source-id: 98829979feba23685f0ba98ba3cb840157f7259a
2019-06-07 20:56:49 -07:00
c53e4d012d Prepare interpreter for function calling (#21185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21185
ghimport-source-id: 6b9cb92d1f1f59bb980dcfa0d29dfe985ee955d1

Reviewed By: jamesr66a

Differential Revision: D15572818

Pulled By: zdevito

fbshipit-source-id: 3a9b5f053664c09212b97f1391d8d006337b5550
2019-06-07 20:56:46 -07:00
c36dc35853 Revert D15576968: Turn on Werror for deprecated-declarations.
Differential Revision:
D15576968

Original commit changeset: fb73a8986a5b

fbshipit-source-id: 1ae19afc6816f764b895a47162728433a319ac0b
2019-06-07 19:15:56 -07:00
b849f101b1 Fix slow unpickling (#21542)
Summary:
This was looking at the number of elements in the memo table, not the total capacity, and was thus calling reserve() a lot more than it should have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21542

Reviewed By: driazati

Differential Revision: D15723132

Pulled By: jamesr66a

fbshipit-source-id: 20e1f9099b6a51a33994ea9dbc3f22eb3bc0c8f9
2019-06-07 17:28:55 -07:00
66d596645a Turn on Werror for deprecated-declarations. (#21195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21195

The motivation is that, while we shouldn't break USER code for using
deprecated declarations, we should keep our internal code base
deprecation clean.

Differential Revision: D15576968

fbshipit-source-id: fb73a8986a5b60bf49ee18260653100319bb1030
2019-06-07 17:24:17 -07:00
a5cf6d5100 reorganize op bench directory (#21543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543

No code change in this diff.

Reviewed By: hl475

Differential Revision: D15721419

fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667
2019-06-07 16:06:51 -07:00
5b4a188a95 add support for steps(strides) in tensor slices
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20929

Differential Revision: D15632636

Pulled By: Krovatkin

fbshipit-source-id: 0e127bbd7b339784c4be2e0a57f28024727d5ad3
2019-06-07 15:55:26 -07:00
5744fb3007 Add mkldnn softmax operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21516

Differential Revision: D15712759

Pulled By: bddppq

fbshipit-source-id: bf515135263156bea1a2b3e53a47edf697b8b1e2
2019-06-07 15:22:18 -07:00
a947d98282 Set "scalar_check: false" for some LAPACK functions that can't return scalars. (#21498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21498
ghimport-source-id: 33ce2f3f083616f633561e4871585f439e2647c0

Differential Revision: D15715477

Pulled By: gchanan

fbshipit-source-id: 7772573ba74cdf7a5f2d86d2e581652ebd85e1c6
2019-06-07 15:00:54 -07:00
fe5ceea580 Rename caffe2<->c10 operator wrappers (#21322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21322

Naming is everything.

- Rename c10_operator.h -> export_caffe2_op_to_c10.h
- Rename operator_c10wrapper.h -> export_c10_op_to_caffe2.h
- Rename corresponding macros

This hugely improves readability and explains what these things are doing.

Reviewed By: dzhulgakov

Differential Revision: D15616816

fbshipit-source-id: d976aefcb43a0f55d85c3424fdd9aca7e71c3603
2019-06-07 13:48:10 -07:00
dad85b7e69 clang-format by line (#21531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21531
ghimport-source-id: 711867e19cc3948a5e2a6aa8c4f2cd631abb04d2

Reviewed By: zdevito

Differential Revision: D15719260

Pulled By: suo

fbshipit-source-id: e88c5d3e14e6ecc956ce30ab0246ed606f4b0a38
2019-06-07 13:42:44 -07:00
fa0c5c31d4 Turn namedtensor build back on (#21520)
Summary:
namedtensor build + test should run on PRs only if the commit message
includes [namedtensor ci].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21520

Differential Revision: D15718404

Pulled By: zou3519

fbshipit-source-id: ce8b5df2682e795e64958a9d49e2e3c091599b33
2019-06-07 13:37:48 -07:00
2b902e9738 Fix the offset numerical bug when casting (#21484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21484

cast<int32_t*> => cast<int32_t>

Also fixed a reserve problem which might cause an incorrect pointer.

Reviewed By: yinghai

Differential Revision: D15699866

fbshipit-source-id: 374418476bddd60f5c5306c8c57319ccf28b9990
2019-06-07 12:33:18 -07:00
ac8c3fa7b6 Revert D15717337: [pytorch][PR] [precommit hook] clang-format by line
Differential Revision:
D15717337

Original commit changeset: 57e65a679a8f

fbshipit-source-id: f73794087a23d56d03497b29d9a9e4e7d54deaad
2019-06-07 11:50:42 -07:00
a77802cf56 clang-format by line (#15657)
Summary:
This should further reduce noise by only clang-formatting the lines you actually touched in the precommit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15657

Differential Revision: D15717337

Pulled By: suo

fbshipit-source-id: 57e65a679a8fdee5c3ff28e241c74ced9398eb0c
2019-06-07 11:46:36 -07:00
b7f5d1e4c6 Fix size of histc return on CPU when input is 0-dimensional and bins=1. (#21497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21497
ghimport-source-id: bc03f27408aa772f78d5351afe404b5e91a7c4ce

Differential Revision: D15715478

Pulled By: gchanan

fbshipit-source-id: 90e1b65249b4b12f936ee8877cc0bc5a972d9ceb
2019-06-07 11:23:46 -07:00
881adb5bcd fix tuple indexing bug (#21521)
Summary:
The lower-tuples pass didn't check bounds for the tuple index.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21521

Differential Revision: D15716813

Pulled By: eellison

fbshipit-source-id: 8eead98c2c63118e7d24a8c8bf6184b02afb7dcd
2019-06-07 11:17:59 -07:00
a5cca4d342 add failback for Sign operator (#21343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21343

Needed to binarise features

Reviewed By: yinghai

Differential Revision: D15625653

fbshipit-source-id: 52f48259a040dac35a7000bb1eea9feb5c7ef1ab
2019-06-07 10:56:22 -07:00
51fb42ebcf Updating submodules
Reviewed By: yns88

fbshipit-source-id: 5778cdb5173fc16e5d5474fefa2ea89264101184
2019-06-07 10:43:12 -07:00
54413cf91e replace LegacyTracedModule with torchscript used in add_graph (#21339)
Summary:
The new implementation of tracing supports more modules, so a lot of error-handling code can be removed by replacing the old one (LegacyTracedModule).

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21339

Reviewed By: natalialunova

Differential Revision: D15695154

Pulled By: orionr

fbshipit-source-id: af7d35754e9f34bd1a0ad7b72a9ebe276ff8ab98
2019-06-07 10:43:08 -07:00
b144ba66d5 Change PyTorch tests to use non-default CUDA stream (#21474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21474
ghimport-source-id: b2477765362248a80557d1a20db02a1290bdcde3

Differential Revision: D15699700

Pulled By: fbhuba

fbshipit-source-id: 1aa4309fec0982c8477cfab29ca5f42d2b171f97
2019-06-07 10:24:48 -07:00
5cc3a3e2bf Set "scalar_check: false" for TH methods that can't return scalars. (#21496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21496
ghimport-source-id: d7197bccfe9e4d807f38a66e02ca6f0bf32bdc2b

Differential Revision: D15715479

Pulled By: gchanan

fbshipit-source-id: fa59eb808d26119b33eb97bb90ef70e95e58458d
2019-06-07 10:19:23 -07:00
cc85c3dbbc ONNX Export Slice and Flip ops for opset 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20533

Reviewed By: zrphercule

Differential Revision: D15579713

Pulled By: houseroad

fbshipit-source-id: 91f3ac0cb14ef226f980362b0013b6b92cb8b8da
2019-06-07 10:03:26 -07:00
3eced796cd Make torchvision install chatty. (#21476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21476
ghimport-source-id: adfd08b818f31ebbdf3da89d6bb95d33e14a9403

Differential Revision: D15715270

Pulled By: ezyang

fbshipit-source-id: dde02579d9960ac960306d0a024b8e17846ae0ff
2019-06-07 08:41:13 -07:00
1503c734ce updating gemmlowp tp 3fb5c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21488

Differential Revision: D15715264

Pulled By: ezyang

fbshipit-source-id: 86978f294720e0ce6f60b748a71f0604d6cfa00c
2019-06-07 08:32:49 -07:00
8c9a88bdab Make test_cuda.py work on Python 2. (#21466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21466
ghimport-source-id: 0a235c8b8cf994621a5a5afe022340dd35764c91

Differential Revision: D15698096

Pulled By: ezyang

fbshipit-source-id: 1759c2681071e9c7e83de3de86daf4333c5f8f3a
2019-06-07 08:13:03 -07:00
c60465873c Fix batch norm multiplier init (#13774)
Summary:
Fixes #12259; we need to make sure tests (see #13766) don't break due to numerical precision issues. Not sure what would need to be adjusted here...
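
If I read the linked issue correctly (an assumption, not stated in the message), the fix initializes the affine scale to ones instead of U(0, 1):
```
import torch.nn as nn

# After the fix, a freshly constructed BatchNorm should start as a
# (near) identity transform. (Assumption based on #12259.)
bn = nn.BatchNorm2d(8)
print(bn.weight.data.unique())  # tensor([1.])
```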
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13774

Differential Revision: D15715021

Pulled By: ezyang

fbshipit-source-id: 20ce2beee1b39ebe9f023c5f2b25be53acccb5f3
2019-06-07 07:50:39 -07:00
c604847602 Implement at::match(Dimname, Dimname) and at::unify(Dimname, Dimname) (#21281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21281
ghimport-source-id: 4b241d54a60c188b8566065c90b227b40914a5ca

Differential Revision: D15699063

Pulled By: zou3519

fbshipit-source-id: c0f00c370d266a4ea5211aae943041fd899e960a
2019-06-07 06:30:45 -07:00
4727685ea1 Added at::Dimname (#21280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21280
ghimport-source-id: 921848326e4828ffd422868be26c409c6490e1ab

Differential Revision: D15698516

Pulled By: zou3519

fbshipit-source-id: 502b9b019d51dd46327e6caf2af69aa520c70cb6
2019-06-07 06:30:42 -07:00
e27c2f1437 Revert "Revert D15632268: [pytorch][PR] Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen" (#21427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21427
ghimport-source-id: 930c2fb29320f70e78f94e7eaaffe8e2ab62e7f3

Differential Revision: D15698423

Pulled By: ezyang

fbshipit-source-id: 891c94c24b6d377cd6dd94d86cc66465b582359f
2019-06-07 05:52:27 -07:00
d6af6588c2 Super resolution export to Caffe2 is broken, skip it. (#21479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21479
ghimport-source-id: 60fa97fb2dfb37a758c0e8b9c2bc0fb2819fd2f7

Differential Revision: D15713609

Pulled By: ezyang

fbshipit-source-id: a3d9c49e2db985f4373508cd44e94d43ae6e24da
2019-06-07 05:46:26 -07:00
78a376592d add cancelAsyncCallback method to OperatorBase (#21492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492

If one async operator fails, the async_scheduling net currently only marks all scheduled async operators as finished without cancelling their callbacks.

The new behavior is to cancel the callbacks first, then set the event status to finished.

Reviewed By: ilia-cher

Differential Revision: D15702475

fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
2019-06-06 20:57:12 -07:00
696b2c89b4 Adding gradient to Boolean Mask operator (#21423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21423

- add gradient for boolean mask
- add test for gradient checking

Reviewed By: BIT-silence

Differential Revision: D15640036

fbshipit-source-id: 79f40c6901e805bf1b8e9b01b57903e30b00f654
2019-06-06 20:48:47 -07:00
d3d195e0b1 Updating submodules
Reviewed By: yns88

fbshipit-source-id: af5812e3d071e66f9d0272c36bf639eb04bde7e4
2019-06-06 20:42:34 -07:00
772fd79d40 Defer constructing error strings for definitions under If's until they're needed. (#21429)
Summary:
This saves ~7% DenseNet load time (4.3s -> 4.0s) on my laptop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21429

Differential Revision: D15681374

fbshipit-source-id: 9925a6154d51f2d592e26cb5ff8bf7ab3ee2519b
2019-06-06 20:32:57 -07:00
abc0d3e544 Fix unused variable warning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21444

Differential Revision: D15701786

Pulled By: ezyang

fbshipit-source-id: 8348e08f9b8f3047b30736f9a944786ab84e6b68
2019-06-06 19:37:54 -07:00
bad67015fe Add warning for Turing GPUs and CUDA <= 9000 (#21468)
Summary:
Turing GPUs (compute capability 7.5) require CUDA10 to work properly.
We've seen some issues for these GPUs using PyTorch binaries with CUDA9 or older:
[Discussion Board #1](https://discuss.pytorch.org/t/cudnn-status-execution-failed-error/38575)
[Discussion Board #2](https://discuss.pytorch.org/t/cublas-runtime-error-on-gpu-running-but-works-on-cpu/46545/6)

Tested on using CUDA9 with an RTX 2080Ti.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21468

Differential Revision: D15696170

Pulled By: ezyang

fbshipit-source-id: ed43f4e4948d3f97ec8e7d7952110cbbfeafef2a
2019-06-06 19:33:02 -07:00
63d4bbb0ec Turn XLA back on for default set (but filtered out using should_run_job) (#21470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21470
ghimport-source-id: 69800c1ce1187591b7bcdb8a63973b4fd8d0e326

Differential Revision: D15696930

Pulled By: ezyang

fbshipit-source-id: fafbcba38d9572a23ee9c1d81cdcce3a154ae4c6
2019-06-06 19:18:45 -07:00
f433913996 add more info back to BenchResult (#21502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21502

In BenchResult, we keep name, avg_fwd, std_fwd, avg_bwd, and std_bwd, but there is no per-iteration information. In this diff, I am extending BenchResult to also include the number reported from each iteration.

Reviewed By: wanchaol

Differential Revision: D15706306

fbshipit-source-id: 3f14be4ba91f1f6da473995783bd7af1d067938d
2019-06-06 18:43:51 -07:00
d51bd2191c Revert D15629687: Deprecate torch::jit::RegisterOperators
Differential Revision:
D15629687

Original commit changeset: 2f87f18be655

fbshipit-source-id: a142c22be3fdf14a2b3c29b8766b218fb0883927
2019-06-06 18:09:01 -07:00
37ab35c8fc Move jit testing utils to their own file (#21491)
Summary:
This moves `JitTestCase` to its own file so that we can have other jit
test files (ex. `test_jit_py3.py`)

There aren't any code changes, just a move and cleaning up the imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21491

Pulled By: driazati

Differential Revision: D15703060

fbshipit-source-id: 6082e8b482100bb7b0cd9ae69738f1273e626171
2019-06-06 15:52:45 -07:00
e87f77def6 Fix typo (#21426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21426

-

Differential Revision: D15679789

fbshipit-source-id: 5fd448e66af159fd79883aa874065424ec9694ad
2019-06-06 15:44:16 -07:00
d714abf597 Deprecate torch::jit::RegisterOperators (#21368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21368

-

Differential Revision: D15629687

fbshipit-source-id: 2f87f18be65552f3eb3f4c945d7f19ba4bae0eb8
2019-06-06 15:44:12 -07:00
cb2ec07fa2 ReshapeOp supports empty tensor (#21230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21230

TSIA; with this diff we support empty tensors in the reshape operator.

Reviewed By: jerryzh168

Differential Revision: D15583356

fbshipit-source-id: 6d44c04e95ca3546509bfb12102e29c878f9a7c7
2019-06-06 15:02:11 -07:00
b161832f10 support ceil mode by padding changes (#21310)
Summary:
Modify the MKLDNN pooling operation to support ceil mode by adjusting the right/bottom padding accordingly. This is done similarly to Caffe (see discussion https://github.com/pytorch/pytorch/pull/19205#discussion_r276903751).

To make this possible, I split the padding into left and right (top/bottom). This naming is confusing but actually follows MKLDNN's own naming for pooling::compute(). We increase the right/bottom paddings so that the output matches the expected ceil-mode output size.

Strengthened the test case.
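
A small sketch of the padding adjustment (illustrative names, not the actual MKLDNN code): grow the right/bottom padding until the floor-mode output size equals the ceil-mode one.
```
import math

def adjust_pad_r(in_size, kernel, stride, pad_l, pad_r):
    # Target: the ceil-mode output size with the original padding.
    ceil_out = math.ceil((in_size + pad_l + pad_r - kernel) / stride) + 1
    while (in_size + pad_l + pad_r - kernel) // stride + 1 < ceil_out:
        pad_r += 1
    return pad_r

print(adjust_pad_r(in_size=6, kernel=3, stride=2, pad_l=0, pad_r=0))  # 1
```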
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21310

Reviewed By: bddppq

Differential Revision: D15611664

Pulled By: akyrola

fbshipit-source-id: 46b40015dafef69a8fd5e7b2c261d8dbf448cd20
2019-06-06 14:47:35 -07:00
80a083ef92 Remove unneeded headers (#21393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21393

Result of splitting the base diff. We moved a header from src/* to include/fbgemm/*

Reviewed By: jianyuh

Differential Revision: D15635188

fbshipit-source-id: ad7d0ddba964ff1cb8b2e33f5f98e457a4d2eac9
2019-06-06 14:23:54 -07:00
8a9ea55b25 Add autograd for to_sparse. (#20458)
Summary:
https://github.com/pytorch/pytorch/issues/18111
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20458

Differential Revision: D15699732

Pulled By: nairbv

fbshipit-source-id: f7a5424c1f1d3b0e4eba0d503d75ae8a18ef7ff4
2019-06-06 14:23:51 -07:00
87d10d49f4 Bilinear Upsampling increased throughput (#19306)
Summary:
Changed `UpsampleBilinearKernel` such that throughput increased by 40-50%.

I tested locally with my local test code -- **not pytorch's provided test code** -- because I am having a build problem (which I made an issue about [here](https://github.com/pytorch/pytorch/issues/19184)). I tested with various tensor sizes, and across all sizes it showed a significant increase in throughput.

1. added `__restrict__`
2. instead of launching as many threads as there are output elements, I launched only `output_height * output_width` threads and had each thread iterate through the channel and batch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19306

Differential Revision: D15701840

Pulled By: ezyang

fbshipit-source-id: 53c54d4f4e4a28b58ecc7d7ae6b864cbfc760e27
2019-06-06 13:58:57 -07:00
c5d5d45f40 Fix numerically instability of SigmoidTransform (#19802)
Summary:
fix #18254 for numerically instability of `SigmoidTransform`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19802

Differential Revision: D15701837

Pulled By: ezyang

fbshipit-source-id: fe6c755c523487c8bbdcc3bfb8455801617c70a4
2019-06-06 13:58:53 -07:00
f8cab38578 Address precision matrix instability of MVN distribution (#21366)
Summary:
Currently, when the input of MVN is a precision matrix, we take its inverse to convert it to a covariance matrix. This, however, can easily make the covariance matrix not positive definite, which triggers a Cholesky error.

For example,
```
import torch
torch.manual_seed(0)
x = torch.randn(10)
P = torch.exp(-(x - x.unsqueeze(-1)) ** 2)
torch.distributions.MultivariateNormal(loc=torch.ones(10), precision_matrix=P)
```
will trigger `RuntimeError: cholesky_cpu: U(8,8) is zero, singular U.`

This PR uses some math tricks ([ref](https://nbviewer.jupyter.org/gist/fehiepsi/5ef8e09e61604f10607380467eb82006#Precision-to-scale_tril)) to only take the inverse of a triangular matrix, which increases stability.

cc fritzo, neerajprad, SsnL
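
The referenced trick, roughly as it appears in torch.distributions today (the `torch.linalg` spellings are modern names, an assumption relative to the 2019 code):
```
import torch

def precision_to_scale_tril(P):
    # Cholesky of the "flipped" precision lets us recover a lower-triangular
    # scale_tril while inverting only a triangular matrix.
    Lf = torch.linalg.cholesky(torch.flip(P, (-2, -1)))
    L_inv = torch.transpose(torch.flip(Lf, (-2, -1)), -2, -1)
    Id = torch.eye(P.shape[-1], dtype=P.dtype, device=P.device)
    return torch.linalg.solve_triangular(L_inv, Id, upper=False)
```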
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21366

Differential Revision: D15696972

Pulled By: ezyang

fbshipit-source-id: cec13f7dfdbd06dee94b8bed8ff0b3e720c7a188
2019-06-06 13:54:46 -07:00
vfn
8ece538a79 Addresses bad behavior with overridden optimizer.step by #20124 (#21460)
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time lr scheduling was initialized

Now the code checks that:
- on the second call of `lr_scheduler.step`, `optimizer.step` has already been called; otherwise a warning is raised (as was done in #20203)
- if the optimizer's step is overridden, another warning is raised once to make the user aware of the new pattern
`opt.step()` -> `lrs.step()`, since we cannot check this.

Now the tests check that:
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` with something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)), another warning is raised (a toy sketch of the check follows below)
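
A toy sketch of the order check (illustrative names only, not the real implementation):
```
import warnings

class TinyScheduler:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        self._step_count = 0

    def step(self):
        self._step_count += 1
        # On the second scheduler step, the optimizer should already have
        # stepped at least once; otherwise warn about the call order.
        if self._step_count >= 2 and getattr(self.optimizer, "_step_count", 0) < 1:
            warnings.warn("lr_scheduler.step() called before optimizer.step()")
```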

cc ezyang

PS: honestly, I would say there is a lot of overhead introduced for simple warnings. I hope all these checks can be removed in a future version such as `1.2.0`...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460

Differential Revision: D15701776

Pulled By: ezyang

fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
2019-06-06 13:54:42 -07:00
51d0da2802 Improve build docs and process for Windows (#21190)
Summary:
Fixes #21026.
1. Improve build docs for Windows
2. Change `BUILD_SHARED_LIBS=ON` for Caffe2 local builds
3. Change to out-of-source builds for LibTorch and Caffe2 (transferred to #21452)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21190

Differential Revision: D15695223

Pulled By: ezyang

fbshipit-source-id: 0ad69d7553a40fe627582c8e0dcf655f6f63bfdf
2019-06-06 13:46:52 -07:00
6fc702f384 Per-callback sampling (#21394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21394
ghimport-source-id: 2607c7b456031a1ddb19fabc3b6fe2585c276d76

Differential Revision: D15639723

Pulled By: ilia-cher

fbshipit-source-id: 938d02c1daf5bec5afa5d3cd021d2dae7e7160ce
2019-06-06 13:46:48 -07:00
bb788631ce Fix caffe2 windows CI for new Windows AMI (#21452)
Summary:
The alternative of #21410.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21452

Differential Revision: D15701767

Pulled By: ezyang

fbshipit-source-id: e65c1d6bfcc98e88460f4a57e5b99c2f395c0ceb
2019-06-06 13:46:45 -07:00
3feb40d602 pack_padded_sequence: Check for empty (zero-element) tensors (#21461)
Summary:
Fixes: #20529

Thank you, JamieCT, for the bug report with a reproducing script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21461

Differential Revision: D15696183

Pulled By: ezyang

fbshipit-source-id: a93cde2c924f8447563c64ce8a1cf75fcee60a01
2019-06-06 13:41:52 -07:00
3b6362d98e Remove NodeExecStats and AllocatorMemoryUsed (#21419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21419

Removed ```node_stats``` and unused imports

Reviewed By: orionr

Differential Revision: D15672824

fbshipit-source-id: 6167c80c081d925f02a1d279f3af0e1b8de66752
2019-06-06 13:35:52 -07:00
0a3fb45d3d allow passing Python built-in types as dtypes (#21215)
Summary:
Another simple bit of syntax that NumPy supports and we don't.

Support int, float, and bool.

```python
>>> torch.randn((2,3), dtype=float)
tensor([[-0.1752, -0.3240, -0.6148],
        [ 0.1861,  1.6472,  0.1687]], dtype=torch.float64)
```

A bit confusingly, Python's "float" actually means double, but nothing we can do about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21215

Differential Revision: D15697012

Pulled By: umanwizard

fbshipit-source-id: 9a38d960a610b8e67023486b0c9265edd3c22246
2019-06-06 13:17:23 -07:00
b647804a55 Fix embedding bag nan output when input is empty (#21400)
Summary:
```
import torch

Embed = torch.nn.EmbeddingBag(100, 10, sparse=True)

print(Embed(input=torch.LongTensor([]), offsets=torch.LongTensor([0])))
print(Embed(input=torch.LongTensor([]), offsets=torch.LongTensor([0, 0])))
```

Before this fix:
```
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```

After this fix:
```
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21400

Differential Revision: D15643357

Pulled By: bddppq

fbshipit-source-id: 119eba38129dc0a3757c331304a18044714fcca5
2019-06-06 13:03:17 -07:00
f4f32cecfd numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to Numpy when indexing:

```
In [15]: x=torch.randn((1000,1000))

In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: y=x.numpy()

In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: x=x.t()

In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: y=x.numpy()

In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293

Differential Revision: D15358754

Pulled By: umanwizard

fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1
2019-06-06 12:50:59 -07:00
8a2985eb05 Support recursive ModuleList / Sequential (#21306)
Summary:
Adds support for recursively compiling `nn.Sequential` and
`nn.ModuleList`. When either is used, it is converted to a
`jit._ConstModuleList` or `jit._ConstSequential` as necessary. Due to
this, we don't need to add it to `__constants__` since it's made
constant on demand.
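
A small usage sketch of the pattern this enables (hedged; the exact ScriptModule surface of the time may differ slightly):
```
import torch
import torch.nn as nn

class Stack(torch.jit.ScriptModule):
    def __init__(self):
        super(Stack, self).__init__()
        # No __constants__ entry needed: made constant on demand.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    @torch.jit.script_method
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```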

This PR also moves the recursive script tests out to their own class
`TestRecursiveScript` (the added test is called `test_iterable_modules`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21306

Pulled By: driazati

Differential Revision: D15611738

fbshipit-source-id: fac52993990bd2dfad71d044c463a58a3759932a
2019-06-06 12:23:59 -07:00
2e37ab85af Enable bool support for several index methods (#21435)
Summary:
Enable bool tensors for these index methods:
- index_select
- index_copy
- put
- take
- index_fill

Tested via unit tests
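
One hedged example of the newly allowed dtype (the index itself remains integral):
```
import torch

src = torch.tensor([True, False, True, False])
idx = torch.tensor([0, 2])
print(torch.index_select(src, 0, idx))  # tensor([True, True])
print(src.take(idx))                    # tensor([True, True])
```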

TODO:
Enable index_add in a separate PR as it requires more "side" changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21435

Differential Revision: D15684964

Pulled By: izdeby

fbshipit-source-id: 48440e4d44873d70c4577e017dd0d8977e0fa15a
2019-06-06 12:14:01 -07:00
61cc03fb8d Make ScriptModule.training an attribute instead of a parameter (#21078)
Summary:
Redo of #19587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21078

Pulled By: driazati

Differential Revision: D15560540

fbshipit-source-id: f415775d87c163f93b3bbdd5f87c9ff73f58b049
2019-06-06 12:06:49 -07:00
f1adddd1c6 Updated sum() logic to properly deal with bool tensor (#21421)
Summary:
`torch.tensor([True, False, True], dtype=torch.bool).sum()` should return **2** instead of **True** as it does now.

Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21421

Differential Revision: D15674203

Pulled By: izdeby

fbshipit-source-id: b00e3d0ca809c9b92b750adc05632522dad50c74
2019-06-06 12:02:23 -07:00
b7b6b612a7 Fix C++ data parallel (#20910)
Summary:
Fixes #19540

CC nmerrill67

C++ data parallel was using Module.clone() to create module replicas on every destination device. However, clone() does not set up gradient edges to point from replicas to the original module. As a result, the gradient will not be aggregated into the original module. This commit fixes the problem by manually setting gradient edges from every parameter X in every replica to the same parameter X in the original module.

## Failed Attempt

Initially I tried implementing what we did in `replicate.py`, which
1. create module replicas
2. use Python `Broadcast` autograd function to broadcast every parameter in the original module to all destination devices.
3. assign the broadcast result params to module replicas' `_parameters` dict.

This works in Python because a derived module's member-field params (e.g., `Linear.weight`) and the base module's `_parameters` (e.g., `Linear._parameters['weight']`) reference the same parameter instance, so assigning to one of them applies to both. However, in C++, even though I can modify Module's `parameters_` values and gradient edges to point to the broadcast source, I cannot touch the weight and bias member fields in Linear, because replicate cannot (and should not) add special-case handlers for every different module. (See `Linear` [.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/modules/linear.h), [.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/src/nn/modules/linear.cpp)) Although they initially point to the same `TensorImpl` instance, after assigning to `Module.parameters_['weight']`, it will be different from `Linear.weight`.
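
For reference, a rough sketch of that Python-side pattern (`Broadcast` is the internal autograd function from `torch.nn.parallel._functions`; the chunking mirrors replicate.py):
```
from torch.nn.parallel._functions import Broadcast

def broadcast_params(module, device_ids):
    params = list(module.parameters())
    # Broadcast.apply returns a flat tuple: all params for device_ids[0],
    # then all params for device_ids[1], etc.; gradient edges point back
    # at the original parameters.
    flat = Broadcast.apply(device_ids, *params)
    return [flat[i:i + len(params)] for i in range(0, len(flat), len(params))]
```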

## Solution Options

gchanan and I had several discussions on this issue and figured two solutions to this problem.

### Option One [implemented in this PR]

Replicate the module in two steps:
1. call `Module.clone()` to create a module replica on every destination device.
2. manually setting gradient edges from every parameter in every replica to the same parameter in the original module.

* Pro: Does not need to change any existing module, and is relatively easy to implement
* Con: It is a little hackish.

### Options Two

Implement a `Replicatable` class (similar to `Cloneable`), and make it a friend class of `Module`. For more details see `Note [Replicating Modules]` in the code change.

* Pro: Maybe this aligns more with our existing approach implemented in `Cloneable`?
* Con: Require changes to every existing module.

I am inclined to go with option one, because `replicate` will only be used for data parallel. I feel it would be overkill to change all existing module implementations because of a data parallel requirement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20910

Differential Revision: D15556426

Pulled By: mrshenli

fbshipit-source-id: aa836290ec657b32742e2bea80bd0ac2404ef3b0
2019-06-06 11:57:31 -07:00
da4f3629c5 Add missing shebangs to Python files with executable permissions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21305

Differential Revision: D15613078

Pulled By: ezyang

fbshipit-source-id: 1fedf4368d65db406b617a51402ee8a20968aff7
2019-06-06 10:53:40 -07:00
52596164d4 Fix 32-bit env. model load issue (#20900)
Summary:
Fixed an issue where models could not be loaded in a 32-bit environment like Raspbian.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20900

Differential Revision: D15696709

Pulled By: ezyang

fbshipit-source-id: 37a81f05f235d3b9fc6244e12d3320ced3d1465e
2019-06-06 10:30:29 -07:00
f891b4338a Test the exceptions raised by isfinite and isinf (#21168)
Summary:
Following up on ef1fdc27a3779586efad631d698cec2d6d19503f.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21168

Differential Revision: D15696615

Pulled By: ezyang

fbshipit-source-id: 46904974ef3c4cb87c7a1d06871bf01543e61ef2
2019-06-06 10:30:26 -07:00
dffff3218b Improves NVRTC Error messages (#21174)
Summary:
Current versions of NVRTC incorrectly map error code 7 to the error string "NVRTC unknown error." This update maps error code 7 to the correct string explicitly in PyTorch. See the documentation at: https://docs.nvidia.com/cuda/nvrtc/index.html#group__error.

This may give us a better idea of the source of NVRTC errors that some community members, like Uber, have reported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21174

Differential Revision: D15696593

Pulled By: ezyang

fbshipit-source-id: f5c7b5876c07b311ab5f2d7c8e375e93273912c6
2019-06-06 10:30:22 -07:00
6615797837 Add derivative for QR decomposition (#21274)
Summary:
Changelog:
- Implement the derivative for QR decomposition for tall and square matrices, i.e., num rows >= num cols
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21274

Differential Revision: D15696506

Pulled By: ezyang

fbshipit-source-id: 1c77bb8369818112c84139360f6e2650f92bf2fd
2019-06-06 10:11:21 -07:00
26bcadcc61 Gumbel-Softmax Arxiv Docs Link Fix (#21376)
Summary:
Links separated #20297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21376

Differential Revision: D15696413

Pulled By: ezyang

fbshipit-source-id: 513bd430e41c109aa2d0fbaa9a242acb2a12059b
2019-06-06 10:11:18 -07:00
ee15ad1bd6 "CharTensor" numpy conversion is supported now (#21458)
Summary:
Fixed #21269 by removing the expected `ValueError` when converting a tensor to a NumPy `int8` array in the Numba interoperability test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21458

Differential Revision: D15696363

Pulled By: ezyang

fbshipit-source-id: f4ee9910173aab0b90a757e75c35925b026d1cc4
2019-06-06 10:06:41 -07:00
c8083e0292 Include named_any.h in modules.h (#21437)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21437

Differential Revision: D15684880

Pulled By: yf225

fbshipit-source-id: db23c7e4e0f62d22b0b6c18f15420c3bb66af366
2019-06-06 09:57:33 -07:00
856e3518c5 Parallelize eye() on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21077

Differential Revision: D15695329

Pulled By: ezyang

fbshipit-source-id: 9841777238dac7c08cde2db3cd9401853f633af3
2019-06-06 09:52:13 -07:00
ae18f8e761 Fix latex formular error about *normal (#21000)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/20903
The LaTeX for the normal distribution should be `\mathcal{N}(\text{mean}, \text{std}^2)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21000

Differential Revision: D15695335

Pulled By: ezyang

fbshipit-source-id: 34dcca0acb20c297f876287e081cd44d11a3e516
2019-06-06 08:47:42 -07:00
4e02d3c0a1 insert default parameters in binary cross entropy with logits (#21336)
Summary:
I inserted default `weight` and `reduction` params into the `binary_cross_entropy_with_logits` function. These default params already exist in Python and in the C++ `binary_cross_entropy` function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21336

Differential Revision: D15628917

Pulled By: ezyang

fbshipit-source-id: 38e5f53851125238842df1bd71cb6149c8603be1
2019-06-06 08:47:39 -07:00
d50dca4075 fix two typos: "a the" => "the"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20437

Differential Revision: D15321243

Pulled By: zou3519

fbshipit-source-id: 6e1690132769b8ef2fd679cb5898c378efac2112
2019-06-06 08:42:57 -07:00
63a55d4932 Support gather export with OneHot + Mul (#21235)
Summary:
This could serve as an alternative solution to export ```torch.gather``` before something similar goes into the ONNX spec. The exported model is verified to be correct against the onnxruntime backend. We weren't able to test against the Caffe2 backend because it doesn't seem to support OneHot in opset 9.
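
A hedged sketch of the decomposition (illustrative only): gather along a dimension equals a one-hot of the indices multiplied against the input and reduced.
```
import torch
import torch.nn.functional as F

x = torch.randn(4, 5)
idx = torch.tensor([1, 3, 0, 2])                    # gather along dim 1
onehot = F.one_hot(idx, num_classes=5).to(x.dtype)  # shape (4, 5)
via_onehot = (x * onehot).sum(dim=1)
direct = x.gather(1, idx.unsqueeze(1)).squeeze(1)
assert torch.allclose(via_onehot, direct)
```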
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21235

Differential Revision: D15613039

Pulled By: houseroad

fbshipit-source-id: 7fc097f85235c071474730233ede7d83074c347f
2019-06-06 08:35:28 -07:00
240d62fbaa Move redundant code that checks NumPy during build to a helper module and add an option to disable building with NumPy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21417

Reviewed By: ezyang

Differential Revision: D15694357

Pulled By: fmassa

fbshipit-source-id: bc1bda23349ba4531f19619fa4adecb846225c20
2019-06-06 08:15:19 -07:00
a68d2e817b Kill apt-get even harder, and before we purge. (#21464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21464
ghimport-source-id: 81beb6ef39e3d412e755f0ae06c9186d8e11a8bc

Differential Revision: D15694828

Pulled By: ezyang

fbshipit-source-id: 0791fe017a1318425528795f576fb96e54b14dae
2019-06-06 07:49:39 -07:00
12528990f8 change output of ai_pep_format (#21440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440

This diff modifies the output format when ai_pep_format is enabled.

Reviewed By: hl475

Differential Revision: D15681042

fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
2019-06-05 21:54:24 -07:00
4e679e30a8 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 060bf204b6400515bbc8f1b9b3ef34bef9d32560
2019-06-05 20:06:53 -07:00
7e300fbb21 Added degrees, radians, ldexp (#21131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21131
ghimport-source-id: 62b9cb71a17f9c9a7999a6e33c2d8b840ce097ff

Differential Revision: D15563184

Pulled By: Chillee

fbshipit-source-id: e2c47fb9f9c0fe9f039cfd001c5e6d5b455e034c
2019-06-05 19:17:02 -07:00
bd2d318e23 Modify quant-dequant node api to take module object and method name (#21407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21407

Modify api takes module object and method whose graph is
instrumented to insert the quant dequant nodes

Differential Revision: D15651624

fbshipit-source-id: 1ff1ae446c986184c724504c8fdd0dcd43864016
2019-06-05 19:08:56 -07:00
505ae5f51d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 9ab609a16522eb233f128df024903eb880742224
2019-06-05 19:03:34 -07:00
f8202d85a0 Added frexp, isinf, isnan, isfinite (#21130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21130
ghimport-source-id: fa771086da13deed232e142db6f940439bcc67bc

Differential Revision: D15563186

Pulled By: Chillee

fbshipit-source-id: fe33dbc454af2a9626ad810a5304300eb17d7530
2019-06-05 18:46:39 -07:00
26db46b324 change the epilogue of SLS to match the simd section (#21439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21439

This bug was exposed after testing accuracy on shapes that are not multiples of 8.

Reviewed By: jspark1105

Differential Revision: D15684759

fbshipit-source-id: 2950f2bd87ee1d8e539148285a14c755f606b3a7
2019-06-05 18:41:55 -07:00
7e6d932208 Make strtod_c compatible with different gcc abi (#21293)
Summary:
We encountered a `std::bad_cast` error when running a PyTorch binary built with the cxx11 ABI on CentOS 7; stack trace:
```
#0  0x00007fec10160207 in raise () from /lib64/libc.so.6
#1  0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2  0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
#7  0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) ()
   from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
#8  0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```

We suspect this line gets compiled to a GCC-ABI-dependent symbol:
```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21293

Differential Revision: D15609910

Pulled By: bddppq

fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
2019-06-05 18:10:09 -07:00
e07d94558d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 6608ac4be8c338ff5a8116b275bbad487d317972
2019-06-05 16:28:40 -07:00
991c557270 Fix an incorrect implementation of celu (#21213)
Summary:
Fixing an incorrect implementation of the CELU activation function. The existing implementation works by a chance combination of errors that seem to cancel each other out. This change makes the code more readable, aligns the parameter names correctly, and is consistent with the cuda implementation.
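For reference, a minimal sketch of the CELU definition the fixed code should agree with (reference math only, not the actual patched kernel):
```
import torch

def celu_reference(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    return x.clamp(min=0) + (alpha * (torch.exp(x / alpha) - 1)).clamp(max=0)
```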

I came across this issue while working on version counters... I attempted to specify a gradient in derivatives.yaml for CELU due to a failed test, but the derivative couldn't be specified correctly without fixing the celu implementation.
https://github.com/pytorch/pytorch/pull/20612
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21213

Differential Revision: D15678823

Pulled By: nairbv

fbshipit-source-id: 29fa76b173a66c2c44ed2e0b7959e77f95d19c43
2019-06-05 15:45:50 -07:00
335869e833 Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21425

Differential Revision: D15678631

fbshipit-source-id: 3da2c694de13ad03019e2b3ff451e762199265bb
2019-06-05 15:40:29 -07:00
6b9f46b2d0 Fix "warning: missing return statement at end of non-void function" (#21424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21424

Fixes #21418

Differential Revision: D15676140

fbshipit-source-id: cfadce164c6cfefb16f8bf7bc09529ba8b910769
2019-06-05 15:30:54 -07:00
0e3c4a054b Remove curandStateMTGP32 usage (#21301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21301
ghimport-source-id: d4516237a8fb46d1f74c47532e849e5926fc6a79

Differential Revision: D15632929

Pulled By: ezyang

fbshipit-source-id: b5147edb95dc3d71f87581aa2ab002e48c3fef30
2019-06-05 14:06:25 -07:00
793b302653 ensure version_counter gets incremented for non-differentiable outputs (#20612)
Summary:
issue:
https://github.com/pytorch/pytorch/issues/14571

To reproduce I:
1) added these lines to derivatives.yaml:
```
- name: add_(Tensor self, Scalar other, Scalar alpha)
  output_differentiability: [False, False, False]
- name: add_(Tensor self, Tensor other, Scalar alpha)
  output_differentiability: [False, False, False]
```

2) Ran this code:
```
import torch

scalar = torch.tensor(5)
var1 = torch.randn(4,2,requires_grad=True)
var2 = var1.detach().requires_grad_()
output1 = var1 * scalar
output2 = var2 * scalar
output1.sum().backward()
scalar.add_(5, 1)
output2.sum().backward()
print(var1.grad)
print(var2.grad)
```
Observed modified var2.grad in the output:
```
tensor([[5., 5.],
        [5., 5.],
        [5., 5.],
        [5., 5.]])
tensor([[10., 10.],
        [10., 10.],
        [10., 10.],
        [10., 10.]])
```

After making this change, re-running the above code produces the expected error:
```
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    output2.sum().backward()
  File "/home/bvaughan/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/bvaughan/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.LongTensor []] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20612

Differential Revision: D15661958

Pulled By: nairbv

fbshipit-source-id: af3373135a1a589a635b49e0ff62622a210258e6
2019-06-05 13:36:05 -07:00
8215f44405 Revert D15660575: [pytorch][PR] Fix Caffe2 CI job for new Windows AMI
Differential Revision:
D15660575

Original commit changeset: cfc0f325b0fb

fbshipit-source-id: cb7d87605c9019b9e563bf5ce4325a919263938e
2019-06-05 12:15:34 -07:00
98e3aaeb78 Adding support for exporting models with variable length input/output to ONNX (#20034)
Summary:
Proposal: https://gist.github.com/pk-g/cc45ff8c5891b5699bffd883a87f13ae?fbclid=IwAR17bRA7Fks4APoZRYiNa93UkLdoFCpRDuIYEx0lNVyPTyaDAShbEnytiQo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20034

Reviewed By: zrphercule

Differential Revision: D15606731

Pulled By: houseroad

fbshipit-source-id: 247251e07b4893cb3f7a1287948b1f57aadb7851
2019-06-05 12:02:23 -07:00
ba2bdf8d0e Added factorial (#21129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21129
ghimport-source-id: a676dd33c4d0b2b60c3e9ce725bda0abeb22375f

Differential Revision: D15563183

Pulled By: Chillee

fbshipit-source-id: 641cae34c181a16c772665f5f7ed01c96a67ea9c
2019-06-05 11:51:03 -07:00
7a1c9076ac Revert D15632268: [pytorch][PR] Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen
Differential Revision:
D15632268

Original commit changeset: 8e337e8dc17a

fbshipit-source-id: de98b1af51a53105c97fb076b09efb6fa8eb08a7
2019-06-05 11:41:50 -07:00
f172fadd80 Make warnings be UserWarnings with source file info (#21231)
Summary:
Redo of #15201; this makes `warnings.warn` calls match their Python
behavior.
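A sketch of the behavior in TorchScript (illustrative only):
```
import warnings
import torch

@torch.jit.script
def check(x: torch.Tensor) -> torch.Tensor:
    if bool(x.sum() < 0):
        # Surfaces as a UserWarning with source file info, matching eager Python
        warnings.warn("input sums to a negative value")
    return x
```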
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21231

Pulled By: driazati

Differential Revision: D15605266

fbshipit-source-id: 5931fd720b0c40d52dd492fbd1f5a76abefaab5c
2019-06-05 11:09:11 -07:00
3068a945ce Retry awscli install. (#21383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21383
ghimport-source-id: e518a4f8bf498694b6d504b8a695c5f11e7c681f

Differential Revision: D15664738

Pulled By: ezyang

fbshipit-source-id: d645db505de906e65c057f0d6964b5ce0fb6ff52
2019-06-05 10:38:01 -07:00
bf0e3b62ae Minor preparational JIT changes (#21096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21096
ghimport-source-id: 169f8b4b70cd77b0f9b07cca81d2b4cde2c46456

Reviewed By: ezyang

Differential Revision: D15546176

Pulled By: izdeby

fbshipit-source-id: cdd0a1c87263955eef9d3174ec2f36d1d2935f48
2019-06-05 10:30:01 -07:00
c15254d4ab Expunge some more deprecated uses of AT_CHECK.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21194

Differential Revision: D15576898

fbshipit-source-id: f030195f5bffe0027d4081aece57e2852aaf9ecb
2019-06-05 10:25:25 -07:00
ec7dc52e60 Fix a bug in qconv (#21294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21294

The returned output tensor wasn't of the correct shape.

Reviewed By: zafartahirov

Differential Revision: D15605081

fbshipit-source-id: f79a9d5b93b8b97e79c09411b9dc681987a61e44
2019-06-05 10:19:31 -07:00
03617574d3 Change type of a tensor with bools (#19097)
Summary:
**This is a BC-breaking change.**
Change the dtype of a tensor created from bool data.
Old behavior: torch.tensor([True, False]) -> uint8 tensor
New behavior: torch.tensor([True, False]) -> bool tensor

Tested via tests.
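A quick check of the new behavior described above (sketch):
```
import torch

t = torch.tensor([True, False])
print(t.dtype)  # torch.bool (previously torch.uint8)
```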
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19097

Reviewed By: ezyang

Differential Revision: D15632553

Pulled By: izdeby

fbshipit-source-id: b019150844c561a6845710a3c62b12f06b68bbe3
2019-06-05 10:19:27 -07:00
22ddddfb80 Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen (#19465)
Summary:
This PR is a continuation of #15310, which itself is a continuation of #14845, #14941, & #15293. It should be synced up with the pytorch/master branch as of yesterday.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19465

Differential Revision: D15632268

Pulled By: ezyang

fbshipit-source-id: 8e337e8dc17ac31439935ccb530a7caf77f960e6
2019-06-05 10:13:58 -07:00
6874c4058d Add type annotation to stft (#21302)
Summary:
We want to be able to call stft from TorchScript, which requires that stft have a type annotation.
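A minimal sketch of what the annotation enables (illustrative; the argument values are arbitrary):
```
import torch

@torch.jit.script
def spectrogram(x: torch.Tensor) -> torch.Tensor:
    # stft is callable from TorchScript once it carries a type annotation
    return torch.stft(x, n_fft=256)
```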
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21302

Differential Revision: D15607973

Pulled By: cpuhrsch

fbshipit-source-id: c4a5c09cdaafe7e81cf487a3ad216d1b03464a21
2019-06-05 10:06:48 -07:00
7c6f2836d4 Fix Caffe2 CI job for new Windows AMI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21410

Differential Revision: D15660575

Pulled By: ezyang

fbshipit-source-id: cfc0f325b0fbc22282686a4d12c7a53236d973d4
2019-06-05 06:35:39 -07:00
6251c563eb Add CUDA support for _dirichlet_grad (#21191)
Summary:
Changelog:
- Migrate _dirichlet_grad implementation from TH to ATen
- Add CUDA support for _dirichlet_grad

Closes #11030.
Closes #15773.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21191

Differential Revision: D15660330

Pulled By: ezyang

fbshipit-source-id: c8ad5b80366e5348139ce9be10400f22fc430344
2019-06-05 06:35:35 -07:00
b460a1987e Per discussion at https://github.com/pytorch/pytorch/pull/21244, fix bugs in (#21392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21392

as discussed at https://github.com/pytorch/pytorch/pull/21244, we
found that some values in log_beta were not properly initialized. This diff will 1)
initialize all log_beta entries to -inf; 2) fix a tricky comparison condition; 3) zero out
all gradient elements corresponding to padding.

Offline experiments show that this diff fixes the previously seen NaN loss.
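A repro-style sketch of the setting where the NaNs appeared (assuming log_beta refers to the CTC loss backward, per the linked PR; shapes are illustrative):
```
import torch
import torch.nn.functional as F

# A batch with padded targets of varying lengths
log_probs = torch.randn(50, 4, 20).log_softmax(2).requires_grad_()
targets = torch.randint(1, 20, (4, 30), dtype=torch.long)
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.randint(10, 30, (4,), dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients at padding positions should be zero, not NaN
```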

Differential Revision: D15637977

fbshipit-source-id: 477008a5e11aae946bd2aa401ab7e0c513421af0
2019-06-05 00:28:45 -07:00
42b2f56124 Fixing race condition at Module::forward method (#21398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21398

The Module::forward method calls the find_method() function, potentially from multiple threads.
Internally, find_method() calls the find_offset() method, which reads the dict_ object.
If the corresponding name is not in the dictionary, the thread calls the insert() method and modifies the dict_ object.
While the first thread is modifying the dict_ object, another thread can enter the forward()->find_method()->find_offset() path
and access dict_ for reading while it is being modified -> crash.
Moved the mutex protection up to protect both the find_offset() and insert() calls.
Consider using the C++17 shared_mutex locking object instead of the recursive_mutex object.
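A language-agnostic sketch of the fixed locking pattern (illustrative Python, not the actual C++):
```
import threading

class MethodTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._dict = {}

    def find_method(self, name):
        # Hold the lock across both the lookup and the insert, so a reader
        # can never observe the dict while another thread is modifying it.
        with self._lock:
            if name not in self._dict:
                self._dict[name] = object()  # placeholder for the real method
            return self._dict[name]
```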

Reviewed By: bddppq

Differential Revision: D15638942

fbshipit-source-id: ca6a453448302a0b3666c87724755fa4e9ce242f
2019-06-04 23:03:25 -07:00
95eb9339c1 Adds CUDA C++11 and Profiling Notes (#21386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21386
ghimport-source-id: 9430c7640b90d9add38d9bf2f1bd0c8f62b7f239

Differential Revision: D15640102

Pulled By: ezyang

fbshipit-source-id: 98a5efdea9b1de05207ebd3624cb20acda9fe96b
2019-06-04 19:18:55 -07:00
eadac840f7 Speedup bernoulli_scalar_cuda_kernel with grid-stride loop (#21300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21300
ghimport-source-id: c314c28cb693b554d6f24de235c11ba24ed6bf61

Reviewed By: jerryzh168

Differential Revision: D15632935

Pulled By: ezyang

fbshipit-source-id: 9bb24f17d78151bf50942905c967bdcfe1ff00cb
2019-06-04 19:13:57 -07:00
c82bf8ef10 Move THCTensor_(lognormal) to ATen (#21299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21299
ghimport-source-id: 2c63f289f02087f023feda8bff6b90ed49737889

Reviewed By: jerryzh168

Differential Revision: D15632930

Pulled By: ezyang

fbshipit-source-id: 85c17cdca486b46942c5b500e4fd4d95bb5657f9
2019-06-04 19:13:53 -07:00
4671bed0f3 Move THCTensor_(geometric) to ATen (#21298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21298
ghimport-source-id: c0e2604aa25cc5da2b67293cafd88c2e77e476f9

Reviewed By: jerryzh168

Differential Revision: D15632932

Pulled By: ezyang

fbshipit-source-id: 248ca4b56967116f27174cda44893ecfe4ca9a99
2019-06-04 19:13:50 -07:00
d341bcb3dc Move THCTensor_(exponential) to ATen (#21297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21297
ghimport-source-id: 5f45154e714ab44dec961dabf1c64e54aaa063a2

Reviewed By: jerryzh168

Differential Revision: D15632931

Pulled By: ezyang

fbshipit-source-id: 0367eec0a9ef6812b1b3ab7597817ee40a011bb8
2019-06-04 19:13:46 -07:00
92b76df8f6 Finished trigonometric functions (#21128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21128
ghimport-source-id: d566de103f2aefc59e6423181de325d8f42620f4

Differential Revision: D15563190

Pulled By: Chillee

fbshipit-source-id: ad2e09cac5c7dae9978a7bd61098c2828620cdc4
2019-06-04 17:59:09 -07:00
7309cb60fd Finished the high-priority functions (#21127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21127
ghimport-source-id: 609021958e76ea01299f62b9491038005e6b4f27

Differential Revision: D15563189

Pulled By: Chillee

fbshipit-source-id: 5c6155a69fff7447689ef012ea303dc358d50486
2019-06-04 17:59:05 -07:00
622588d8fd Added remainder of high-priority trigonometric math ops (#21126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21126
ghimport-source-id: e310f3cfb28436b99ad038691887ca82068ca2c9

Differential Revision: D15563191

Pulled By: Chillee

fbshipit-source-id: 7135ddd5bc9eebc818694fa8b67eaade907fa8a1
2019-06-04 17:59:02 -07:00
e268fc97c3 Re-add Tensor.T (#21175)
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.

With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.

Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...

I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.

**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
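For context, a usage sketch of the re-added attribute:
```
import torch

x = torch.arange(6).reshape(2, 3)
print(x.T.shape)  # torch.Size([3, 2]); .T transposes the matrix
```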
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175

Differential Revision: D15566970

Pulled By: umanwizard

fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
2019-06-04 17:38:25 -07:00
ba08cf336d Reorganize cmake related functions to tools/setup_helpers/cmake.py (#21367)
Summary:
Currently tools/build_pytorch_libs.py looks quite convoluted. This commit reorganizes the cmake-related functions into a separate file to make the code clearer.

 ---

This is hopefully helpful for further contribution for better integration with cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21367

Differential Revision: D15636991

Pulled By: soumith

fbshipit-source-id: 44d76e4e77aec0ce33cb32962b6a79a7f82785da
2019-06-04 17:01:38 -07:00
6ee9e87ff5 Back out "[pytorch][PR] don't materialize constants" (#21374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21374

Original commit changeset: d5609b0a5697

Not materializing constants slows compilation significantly.

Differential Revision: D15630632

fbshipit-source-id: c6b5026ee6eae2ef290628f350f49a657495bd5d
2019-06-04 16:32:09 -07:00
45d2305732 fix incorrect default on Graph::toString (#21370)
Summary:
This default was incorrect and caused printing from Python to omit the file:line:col info.

This wasn't caught because FileCheck internally uses operator<< to print the graph, which has `true` hardcoded as the value. I've added more comprehensive tests to catch this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21370

Differential Revision: D15631135

Pulled By: jamesr66a

fbshipit-source-id: c809e06fff4f0174eefeb89062024384b4944ef7
2019-06-04 16:15:38 -07:00
0dc7286e15 Better error message when trying to instantiate NamedTuple
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21309

Differential Revision: D15630564

Pulled By: jamesr66a

fbshipit-source-id: 82753feee65bbe6c8b2f827cc2664628f3b9f4a3
2019-06-04 16:11:05 -07:00
d348d6405c cdist: pairwise distances between two sets of tensors with batch mode (#20934)
Summary:
Batch implementation for cdist function
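A usage sketch of the batched call (shapes are illustrative):
```
import torch

x1 = torch.randn(4, 10, 3)  # batch of 4 sets of 10 points in R^3
x2 = torch.randn(4, 20, 3)  # batch of 4 sets of 20 points in R^3
d = torch.cdist(x1, x2)     # shape (4, 10, 20): per-batch pairwise L2 distances
```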
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20934

Differential Revision: D15609458

Pulled By: ifedan

fbshipit-source-id: 31c12e120d168baec6a6af913f599838a44034d7
2019-06-04 15:52:52 -07:00
6a3ebdbbc5 Remove all conda 3.5 nightly configs, remove libtorch smoketests (#21380)
Summary:
| | Before | After
------------ | ------------ | -------------
Binary builds | ![binarybuilds-config-dimensions](https://user-images.githubusercontent.com/261693/58915716-77a5f900-86d6-11e9-8a39-7ef587e56281.png) | ![binarybuilds-config-dimensions](https://user-images.githubusercontent.com/261693/58915620-4a594b00-86d6-11e9-9e5f-95cf085e6fc8.png) |
Smoke tests | ![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/58915728-812f6100-86d6-11e9-80c1-182242fdfd0e.png) | ![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/58915686-68bf4680-86d6-11e9-8cd2-e65a47384b4f.png) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21380

Differential Revision: D15634729

Pulled By: kostmo

fbshipit-source-id: aef44b0e5b9997be55d93969ab85effca68c5c88
2019-06-04 15:48:47 -07:00
ca32563999 add suggestion to use lld to CONTRIBUTING.md (#21334)
Summary:
I found this significantly speeds up incremental builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21334

Differential Revision: D15632994

Pulled By: suo

fbshipit-source-id: bb4af90f4400bffa90d168d82ff30fece5e3835c
2019-06-04 15:40:49 -07:00
4940e41d16 Fix mkl-dnn tautological compare error (#21371)
Summary:
```
../third_party/ideep/mkl-dnn/src/cpu/jit_avx512_common_convolution.hpp:144:821: error: self-comparison always evaluates to true [-Werror,-Wtautological-compare]
        virtual pd_t *clone() const override { return new pd_t(*this); } virtual status_t create_primitive(primitive_t **primitive, const primitive_at_t *inputs, const primitive_t **outputs) const override { double ms = get_msec(); primitive_t::input_vector ins(inputs, inputs + this->n_inputs()); primitive_t::outpu
t_vector outs(outputs, outputs + this->n_outputs()); auto ret = safe_ptr_assign<primitive_t>(*primitive, new (jit_avx512_common_convolution_bwd_data_t)(this, ins, outs)); ms = get_msec() - ms; if (mkldnn_verbose()->level >= 2) { printf("mkldnn_verbose,create,%s,%g\n", this->info(), ms); fflush(0); } return ret; } v
irtual const char *name() const override { return (avx512_common == sse42 ? "jit:" "sse42" : (avx512_common == avx ? "jit:" "avx" : (avx512_common == avx2 ? "jit:" "avx2" : (avx512_common == avx512_common ? "jit:" "avx512_common" : (avx512_common == avx512_core ? "jit:" "avx512_core" : (avx512_common == avx512_mic
? "jit:" "avx512_mic" : (avx512_common == avx512_mic_4ops ? "jit:" "avx512_mic_4ops" : "jit:" ""))))))); };
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21371

Differential Revision: D15631392

Pulled By: bddppq

fbshipit-source-id: 3b0008acab8ae53ce61327686bd8367e7fb5d298
2019-06-04 15:27:07 -07:00
403ca41142 make analyzeConservative more conservative (#21227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21227
ghimport-source-id: cac97ba20cb020f3edc4e83e7641201f0826f40a

Reviewed By: jamesr66a

Differential Revision: D15592316

Pulled By: suo

fbshipit-source-id: b311f73a5d81d6d0b0331678b6a625e446588ebd
2019-06-04 15:09:46 -07:00
0dbae7eddb cleanup templated implementation of mayAlias (#21224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21224
ghimport-source-id: 6ec4ea015043bbddd031f92c5149e8313f21977d

Reviewed By: jamesr66a

Differential Revision: D15592318

Pulled By: suo

fbshipit-source-id: 47c52342f2a1360752306908e2f394ef52e47504
2019-06-04 15:09:43 -07:00
adf6f6c442 use memory locations instead of values for working set (#21223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21223
ghimport-source-id: 82800a465a4273e185bfffe2f67835b2f7f3a519

Reviewed By: jamesr66a

Differential Revision: D15592317

Pulled By: suo

fbshipit-source-id: 5e87c803a928b61c923200888a3ff1ac7b2523e0
2019-06-04 15:09:39 -07:00
f330168570 remove multisets from work set (#21222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21222
ghimport-source-id: 0eb6daa92bef68a35bef918c3f3a791b401812aa

Reviewed By: jamesr66a

Differential Revision: D15592319

Pulled By: suo

fbshipit-source-id: 895d26538ba1edcd73b83147a68b7e4069084230
2019-06-04 15:09:36 -07:00
df0b83654a cleanups to alias analysis (#21221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21221
ghimport-source-id: 778e7317bbe874d35a903d89af5e0bc9721c8680

Reviewed By: jamesr66a

Differential Revision: D15592313

Pulled By: suo

fbshipit-source-id: d6f6d2be8cd80b40dd26d0bb3be30f074e356105
2019-06-04 15:09:33 -07:00
77c2f5dd75 fix copyright notice in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21372

Differential Revision: D15631889

Pulled By: umanwizard

fbshipit-source-id: cf764432c27cb1b01d8137ed60ec7de361450d0e
2019-06-04 14:53:45 -07:00
57f932a638 Enable 'empty' function for mkldnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21184

Differential Revision: D15625296

Pulled By: bddppq

fbshipit-source-id: 47d26798bcf48e227ffd813f299959a7b8993641
2019-06-04 14:16:13 -07:00
b869a3b4ac add new ops to benchmark_all_test (#21365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21365

This diff adds new operators to benchmark_all_test so that all the supported ops can be built as one binary.

Reviewed By: hl475

Differential Revision: D15627328

fbshipit-source-id: b7ca550a279f485102a6a6bd47e4032c7beb9940
2019-06-04 13:54:26 -07:00
2ed6f017ed Added better tests for math ops and unified them (#21125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21125
ghimport-source-id: 2a576b563208ce3d83e6771643e20d24bc72af86

Differential Revision: D15563188

Pulled By: Chillee

fbshipit-source-id: 0e77471729f715063d6bee075d2fc65f8db8b6c3
2019-06-04 13:15:54 -07:00
6938de8851 made floor/ceil return ints (#21124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21124
ghimport-source-id: e3e45bd50c9af1ee03fd58f2f4d631ce23d9612e

Differential Revision: D15563187

Pulled By: Chillee

fbshipit-source-id: 6504a41da883a8287d64db20d40cf958edb7404c
2019-06-04 10:32:16 -07:00
87690d2b77 Move THCTensor_(cauchy) to ATen (#21289)
Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
cauchy, size, elements 65536 forward 4.980564117431641e-06 bandwidth (GB/s) 52.63339529803734
cauchy, size, elements 131072 forward 6.232261657714844e-06 bandwidth (GB/s) 84.12483762631982
cauchy, size, elements 262144 forward 9.548664093017577e-06 bandwidth (GB/s) 109.81389540833959
cauchy, size, elements 524288 forward 1.59454345703125e-05 bandwidth (GB/s) 131.52052963827754
cauchy, size, elements 1048576 forward 2.86865234375e-05 bandwidth (GB/s) 146.21165262978724
cauchy, size, elements 2097152 forward 5.4748058319091796e-05 bandwidth (GB/s) 153.2220184158516
cauchy, size, elements 4194304 forward 0.00010075807571411133 bandwidth (GB/s) 166.50988897012377
cauchy, size, elements 8388608 forward 0.0001935744285583496 bandwidth (GB/s) 173.34124269355965
cauchy, size, elements 16777216 forward 0.00038077831268310545 bandwidth (GB/s) 176.24129779641603
cauchy, size, elements 33554432 forward 0.0006851387023925781 bandwidth (GB/s) 195.8986224705994
```
#### After:
```
cauchy, size, elements 65536 forward 6.077289581298828e-06 bandwidth (GB/s) 43.13501874366419
cauchy, size, elements 131072 forward 6.2131881713867184e-06 bandwidth (GB/s) 84.38308731972373
cauchy, size, elements 262144 forward 6.46829605102539e-06 bandwidth (GB/s) 162.11008150033175
cauchy, size, elements 524288 forward 6.8783760070800785e-06 bandwidth (GB/s) 304.8905726935182
cauchy, size, elements 1048576 forward 9.505748748779296e-06 bandwidth (GB/s) 441.23867681003264
cauchy, size, elements 2097152 forward 1.5070438385009766e-05 bandwidth (GB/s) 556.6266744001266
cauchy, size, elements 4194304 forward 2.4406909942626954e-05 bandwidth (GB/s) 687.396152951685
cauchy, size, elements 8388608 forward 4.6243667602539064e-05 bandwidth (GB/s) 725.6005792706125
cauchy, size, elements 16777216 forward 9.100198745727539e-05 bandwidth (GB/s) 737.4439380404413
cauchy, size, elements 33554432 forward 0.00017449140548706055 bandwidth (GB/s) 769.1939188944922
```
### Double Type
#### Before:
```
cauchy, size, elements 65536 forward 4.885196685791015e-06 bandwidth (GB/s) 53.660889593753055
cauchy, size, elements 131072 forward 6.229877471923828e-06 bandwidth (GB/s) 84.15703235943361
cauchy, size, elements 262144 forward 9.605884552001953e-06 bandwidth (GB/s) 109.15975455706132
cauchy, size, elements 524288 forward 1.5976428985595704e-05 bandwidth (GB/s) 131.26537863315923
cauchy, size, elements 1048576 forward 2.9621124267578124e-05 bandwidth (GB/s) 141.59840666786866
cauchy, size, elements 2097152 forward 5.5103302001953126e-05 bandwidth (GB/s) 152.23421637604707
cauchy, size, elements 4194304 forward 0.00010124444961547851 bandwidth (GB/s) 165.70998275677383
cauchy, size, elements 8388608 forward 0.0001944279670715332 bandwidth (GB/s) 172.58027487195184
cauchy, size, elements 16777216 forward 0.00034950494766235353 bandwidth (GB/s) 192.01119883668116
cauchy, size, elements 33554432 forward 0.0007002186775207519 bandwidth (GB/s) 191.67973135938277
```
#### After:
```
cauchy, size, elements 65536 forward 5.91278076171875e-06 bandwidth (GB/s) 44.33514628129032
cauchy, size, elements 131072 forward 6.234645843505859e-06 bandwidth (GB/s) 84.09266751632889
cauchy, size, elements 262144 forward 7.433891296386719e-06 bandwidth (GB/s) 141.05344807902503
cauchy, size, elements 524288 forward 1.1401176452636719e-05 bandwidth (GB/s) 183.94171941045587
cauchy, size, elements 1048576 forward 1.960039138793945e-05 bandwidth (GB/s) 213.99082890665372
cauchy, size, elements 2097152 forward 3.434181213378906e-05 bandwidth (GB/s) 244.26806504326578
cauchy, size, elements 4194304 forward 6.517410278320313e-05 bandwidth (GB/s) 257.4215107465028
cauchy, size, elements 8388608 forward 0.0001229524612426758 bandwidth (GB/s) 272.9057365819818
cauchy, size, elements 16777216 forward 0.00023239374160766602 bandwidth (GB/s) 288.77225150621814
cauchy, size, elements 33554432 forward 0.00046050310134887696 bandwidth (GB/s) 291.4589013773367
```
Resubmit of https://github.com/pytorch/pytorch/pull/20622
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21289

Differential Revision: D15622713

Pulled By: ezyang

fbshipit-source-id: abe8bd57794bd1c3a0b92395367a9653c5d0f2db
2019-06-04 08:24:42 -07:00
f9e746e9c8 Use "important" node to toggle whether or not to build on PR (#21308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21308
ghimport-source-id: 75fba872a658d8257a3f6ff9d9e33a320c6e523e

Differential Revision: D15621909

Pulled By: ezyang

fbshipit-source-id: 6d016d9ffdeb6414d70a1b48ed4766b5dc626353
2019-06-04 08:05:56 -07:00
1291d95e82 Revert "Fix the caffe2_gpu linkage with torch on Windows" (#21335)
Summary:
The original PR (#16071) no longer works after `caffe2` and `torch` were unified. What's more, it makes the binary large, since the optimization flag is disabled on a very big project (the `torch` library used to be small, but the flag now applies to the whole `caffe2` and `caffe2_gpu` libraries). We need to revert it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21335

Differential Revision: D15622163

Pulled By: soumith

fbshipit-source-id: 900bd400106d27a1512eed1e9f2288114f5f41bb
2019-06-04 07:49:49 -07:00
38d68ad803 Update randomness.rst (#21337)
Summary:
Following [this question on the forums](https://discuss.pytorch.org/t/reproducibility-and-performance/46504), I propose the following doc change. It clarifies that 'performance reduction' concerns the processing speed (and not the training accuracy).

Related website commit: https://github.com/pytorch/pytorch.github.io/pull/211
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21337

Differential Revision: D15622151

Pulled By: soumith

fbshipit-source-id: f0edeb20049f2ee715c400e7c57abb966864d621
2019-06-04 07:38:00 -07:00
ae42a11ab2 Make .circleci Conf class uses dataclasses; use types. (#21284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21284
ghimport-source-id: a628af85b7a085e15903168e957bed1e273d6636

Differential Revision: D15621908

Pulled By: ezyang

fbshipit-source-id: 64e2da8b96cdc1b53c0b314771d225eebf3d4b2d
2019-06-04 07:28:53 -07:00
25a6ff10f0 Add gtest for TensorIterator (#21253)
Summary:
This adds a regression test for the bug fix in #21236. Operations
involving CUDA tensors and CPU scalars should not copy the CPU scalar to
the device (because that is slow). They should instead "lift" the scalar
to a kernel parameter.
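A sketch of the pattern the test guards (assuming a CUDA device is available):
```
import torch

x = torch.randn(1024, device="cuda")
# The Python scalar should be lifted into the kernel as a parameter,
# not copied to the device as a tensor first.
y = x * 2.0
```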
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21253

Reviewed By: bddppq

Differential Revision: D15604080

Pulled By: colesbury

fbshipit-source-id: c14ded5d584499eaa5ea83337ffc50278205f3d6
2019-06-04 07:23:42 -07:00
fecd5fa171 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 1a30f65182aead9145cb02fb544e2b7a25043f44
2019-06-04 07:23:39 -07:00
2ee2d78a29 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0256f2f4afaeaa2c16074dbca3b9a03ca434c7de
2019-06-03 23:36:35 -07:00
af4c24153f Honor OMP/MKL environment variables in AT_PARALLEL_NATIVE case (#21189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21189
ghimport-source-id: 4dcfaf04880346ff5ca79ca4dd11c94dcb645ce5

Differential Revision: D15574578

Pulled By: ilia-cher

fbshipit-source-id: 919fccb58b997f9a7add5486a79f9cd4cabaa1ee
2019-06-03 23:22:58 -07:00
f251416d70 Update fbgemm submodule (#21328)
Summary:
Fix master breakage

cc jianyuh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21328

Differential Revision: D15618649

Pulled By: bddppq

fbshipit-source-id: bce279705520dbd9c6df5fb794cdaeaed48a1a5a
2019-06-03 22:17:04 -07:00
113a27ee45 bake constants into the traced graph, get rid of getNestedValueTrace (#21046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21046
ghimport-source-id: 5cb3efb1896fbe42336e24c14fbf0bb5e646528e

Differential Revision: D15530991

Pulled By: wanchaol

fbshipit-source-id: b096ca5a1cdce496742b7f7e1de3ef8d21e9a8b0
2019-06-03 21:48:11 -07:00
cf356a342b Fix a bug in loop unrolling (#21239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21239
ghimport-source-id: 68256b752be795b32ab3f426848ed1d64fc5ea3e

Reviewed By: suo

Differential Revision: D15590901

Pulled By: zdevito

fbshipit-source-id: 8700aab723d4486fd20d3414df8160b36a3cc5da
2019-06-03 21:35:14 -07:00
6e657c5586 Add CallMethod, inline eagerly (#21116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21116
ghimport-source-id: 3c47e335dd80f52216e50e0a215cedc1862a9e78

Reviewed By: eellison

Differential Revision: D15552816

Pulled By: zdevito

fbshipit-source-id: 708fe87439d94117dca0a26c98f0917f497f718f
2019-06-03 21:35:11 -07:00
0f58d20fe4 Add quantized::fbgemm_linear_unpack operator for serialization (#97)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/97

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20721

- FBGEMM: Add an unpack function for the PackBMatrix class: unpack the pmat buffer into origin_buf (used during serialization to recover the weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
2019-06-03 20:36:30 -07:00
4b576e5184 Do not hardcode build_dir in build_caffe2. Use the build_dir parameter.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21296

Differential Revision: D15613035

Pulled By: bddppq

fbshipit-source-id: 19313cbe0135581990d489f489d366d00962a3c3
2019-06-03 20:31:30 -07:00
702ba3d2fb build torch for libtorch mobile build (#21234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21234
ghimport-source-id: 8d401691a811991c79acf5e09e60389910910365

Differential Revision: D15616540

Pulled By: ljk53

fbshipit-source-id: 150e706630911bf14c55f47f4058eaada1edf1cc
2019-06-03 19:51:05 -07:00
82ceeaeca2 Add options to jit's operator constructor (#21315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21315
ghimport-source-id: 168ddecb333a8cb309e7b859683de9b077123205

Differential Revision: D15614506

Pulled By: bwasti

fbshipit-source-id: ae013a88e2069c38845b5b8ff805db96ab2c29e9
2019-06-03 19:30:22 -07:00
457c0f164e insert missing #pragma once in VariableTypeUtils.h (#21134)
Summary:
Insert the missing #pragma once directive to prevent redefinition errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21134

Differential Revision: D15607673

Pulled By: li-roy

fbshipit-source-id: 0000fa18e3c55e5d36a64b171d6e85eb4bc211a1
2019-06-03 17:50:56 -07:00
1c5bd1fa65 Automatic update of fbcode/onnx to 5160f3ac3380302224998f1c95e111cd961c4bc5 (#21311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21311

Previous import was 9005291283e943f1a91da5f0acf218bc4e8eb2ca

Included changes:
- **[5160f3ac](https://github.com/onnx/onnx/commit/5160f3ac)**: Fix typo (#2069) <Takeshi Watanabe>
- **[ac218ac6](https://github.com/onnx/onnx/commit/ac218ac6)**: Add a missing step when upgrading an operator (#2071) <daquexian>
- **[5972eed9](https://github.com/onnx/onnx/commit/5972eed9)**: Clarify the axis/size in pads, strides, dilations (#2048) <daquexian>

Reviewed By: bddppq

Differential Revision: D15612734

fbshipit-source-id: 235dc3d49e4a6ccd4f43e6c2f648e87611d52697
2019-06-03 17:35:53 -07:00
02fd1878e3 Cast dropout to float in RNN (#21304)
Summary:
This solves the situation where, for example, someone instantiates LSTM with `dropout=0`, a Python integer. This works fine in Python, but JIT throws a type error because it expected a float but got an int.

Resolves https://github.com/pytorch/lockdown/issues/65
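A sketch of the pattern that now scripts cleanly (illustrative sizes):
```
import torch

# dropout=0 is a Python int; the JIT expects a float, so it is now cast.
lstm = torch.jit.script(torch.nn.LSTM(8, 16, num_layers=2, dropout=0))
```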
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21304

Differential Revision: D15613153

Pulled By: jamesr66a

fbshipit-source-id: eabff76e3af3de0612583b37dbc5f7eab7e248a4
2019-06-03 16:59:04 -07:00
45de3ef6a7 Export feature length information for onnxifi operator (#21303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21303

Export feature length information for onnxifi operator
Recommit of D15548138:
- disable caffe2_extract_feature_length_for_shape_inference by default
- change LOG(INFO) to VLOG(4)
- change LOG(WARNING) to LOG_EVERY_N(WARNING, 1000)

Reviewed By: yinghai, ipiszy

Differential Revision: D15608620

fbshipit-source-id: f96410366fe6bae954fea9d6b50ee72f4969d024
2019-06-03 16:53:06 -07:00
7c823312d3 hub doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21307

Differential Revision: D15610441

Pulled By: ailzhang

fbshipit-source-id: 2b2a28ed808936cf7c93db31afc6b5ea888ab1b1
2019-06-03 16:29:39 -07:00
22865d4ce1 Add ONNX export support for torch.rand. (#20559)
Summary:
This PR adds support for torch.rand export in the PyTorch ONNX exporter. There are other generator ops that need to be supported for export, and they will be added in subsequent PRs. This op is needed with priority for a model on our end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20559

Differential Revision: D15379653

Pulled By: houseroad

fbshipit-source-id: d590db04a4cbb256c966f4010a9361ab8eb3ade3
2019-06-03 16:09:01 -07:00
7d84ca6e06 clean code to unify the logic to use fp16 by the optimizer engine (#20915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20915

Clean up the unary processor code. Some questions are added in the comments to seek suggestions.

Reviewed By: pjh5

Differential Revision: D15448502

fbshipit-source-id: ef0c45718c1a06187e3fe2e4e59b7f20c641d9c5
2019-06-03 15:03:35 -07:00
3004b397f0 change test_name to be globally unique value across tests (#21206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206

This diff changes the default test_name to be a globally unique value across tests. With that, users can list all the tests and choose to run a specific one.

Reviewed By: zheng-xq

Differential Revision: D15543508

fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
2019-06-03 14:55:11 -07:00
ca80ec7c97 introduce a new interface to add ops [PT changes] (#21149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149

The diff modifies the interface for PyTorch operators in the benchmark suite

Reviewed By: zheng-xq

Differential Revision: D15433897

fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58
2019-06-03 14:55:08 -07:00
88d033f842 don't materialize constants (#21229)
Summary:
This doesn't affect anything because we run constant pooling, and in the case of Closures and Forks, materializing creates unnecessary closures over constants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21229

Differential Revision: D15587764

Pulled By: eellison

fbshipit-source-id: d5609b0a5697071fab5050eb9e03876ab9ebb27a
2019-06-03 13:36:57 -07:00
9a41f44732 Improve ONNX Loop export (#20445)
Summary:
~~This is work in progress due to its dependency on multiple pending PRs.~~

- [x] ONNX: Relax constraint on subgraph input/output type & shape check. https://github.com/onnx/onnx/pull/2009
- [x] PyTorch: Add infra to test_pytorch_onnx_caffe2.py to test ScriptModule models. https://github.com/pytorch/pytorch/pull/20256

This PR should partially resolve https://github.com/pytorch/pytorch/issues/17531. However, ideally we shouldn't need to put cast(and reshape) node to help the conversion for loop condition.

- Added a cast node for condition values before entering the Loop node. The ONNX spec only accepts the Bool type, while in PyTorch, if the condition value is an output from another node, it could potentially have any integral type.
- Tidied up the exported ONNX loop subgraph input type & shape. According to the ONNX spec, input "M" is exported as a 0-d scalar tensor of type int64, and input "Cond" is exported as an incomplete tensor of type Bool without shape information. This is because throughout the iteration, the rank of the condition value is dynamic, either 0-d or 1-d, as long as it holds a single value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20445

Differential Revision: D15534188

Pulled By: houseroad

fbshipit-source-id: d174e778529def05ee666afeee4b8fb27786e320
2019-06-03 13:00:00 -07:00
4980b8b95c Renaming member variables in engine.cpp/h (#21283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21283
ghimport-source-id: 360a138e420ace3cd4ca6ccbc761c8e68319440d

Differential Revision: D15607428

fbshipit-source-id: f8df6b42796a49c4d68fa8366b6a68d5715f6421
2019-06-03 12:54:50 -07:00
37fed9b24a Rename FC to Linear for the function name (#21268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21268

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15599232

fbshipit-source-id: 0046f933657f60807fdca7009676bfb052748d91
2019-06-03 11:55:35 -07:00
63b3c5a66a Replace AT_ASSERTM with TORCH_CHECK (#21267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21267

Replace AT_ASSERTM with TORCH_CHECK: AT_ASSERTM is deprecated.

Not sure whether ```AT_ASSERT``` is also deprecated in favor of some new TORCH assert macro.

Reviewed By: zafartahirov

Differential Revision: D15599242

fbshipit-source-id: 23f21a9a23dc3c147dc817e6d278066d0832e08d
2019-06-03 11:47:14 -07:00
ad971a37d0 Improve performance of advanced indexing backward (#20557)
Summary:
This PR improves performance of advanced indexing backward, partially solving #15245 (performance is still worse than gather, but not by such outrageous margins). Before, using benchmarking harness from #15245, cuda 10/V100:
```
Indexing is faster by at most -270.61607820767887 us on N: 16 D: 256 K: 1
Indexing is slower by at most 11127.466280784833 us on N: 16 D: 4096 K: 4096
```
after:
```
Indexing is faster by at most 23.524456737696028 us on N: 512 D: 4096 K: 4096
Indexing is slower by at most 186.24056029472553 us on N: 16 D: 1024 K: 4096
```
The strategy is to reuse the embedding backward kernel, adapting it to handle unindexed dimensions in the beginning by launching additional threadblocks, and also allowing it to handle slices bigger than `65K*128`, which is hardly ever a problem for embedding. Still, integer indexing is baked into the kernel and is important for performance, so tensors larger than 2G elements are not supported for now.
The main savings come from not having to expand index to all unindexed dimensions, and not sorting expanded index with incoming gradient values, but rather only sorting unexpanded index.
There are ways to make sorting overhead smaller (thanks mcarilli for suggestions) but I'll get to it when it becomes a real problem, or rather, when cuda graphs will force us to get rid of thrust::sort calls.
I've also added tests for indexing backward, before tests for index_put_ and indexing backward were non-existent.
This PR also fixes #20457 by casting indices to `self`'s backend.
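A sketch of the exercised path (assuming a CUDA device; shapes mirror the benchmark's N/D/K):
```
import torch

x = torch.randn(16, 4096, device="cuda", requires_grad=True)
idx = torch.randint(0, 16, (4096,))  # CPU indices; per #20457 they are cast to x's backend
y = x[idx]                           # advanced indexing
y.sum().backward()                   # runs the improved indexing backward
```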
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20557

Differential Revision: D15582434

Pulled By: ezyang

fbshipit-source-id: 91e8f2769580588ec7d18823d99a26f1c0da8e2a
2019-06-03 11:38:53 -07:00
4ac732ed7a file:line for tracing (#21247)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/21217

This adds support for recording file and line information during tracing, by extracting the top Python interpreter frame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21247

Reviewed By: suo, driazati

Differential Revision: D15594553

Pulled By: jamesr66a

fbshipit-source-id: 72e1b3a46f1dabe3e83a608ec1a7d083bd1720f9
2019-06-03 11:13:49 -07:00
27d1daab45 Export ONNX Dropout for opset 10 (#20710)
Summary:
Remove Dropout from the opset 10 blacklist.
ONNX Dropout was modified in opset 10, but only the "mask" output was modified, and that output is not exported in PyTorch's opset 9. So we can still fall back on the opset 9 op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20710

Differential Revision: D15571248

Pulled By: houseroad

fbshipit-source-id: 15267eb63308a29a435261034b2f07324db1dea6
2019-06-03 10:59:56 -07:00
770089c2b8 math module support: isnan, asinh, atanh, cosh, sinh, and tanh (#19337)
Summary:
driazati and eellison, please review. This PR is for #19026; specifically, it adds isnan, asinh, atanh, cosh, sinh, and tanh.
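A minimal sketch of what this enables in TorchScript (illustrative):
```
import math
import torch

@torch.jit.script
def f(x: float) -> float:
    if math.isnan(x):
        return 0.0
    return math.tanh(x) + math.asinh(x)
```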
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19337

Differential Revision: D15580932

Pulled By: driazati

fbshipit-source-id: 38513fa59088e038264f9f6f0d6374a13a165589
2019-06-03 10:54:42 -07:00
fb72625267 Remove onnx export expects (#21238)
Summary:
We're not getting much from checking the export strings, and they are noisy and slow development. Didn't realize they existed until now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21238

Differential Revision: D15604256

Pulled By: eellison

fbshipit-source-id: 488e9401231228cffe132dab99d519563fa63afc
2019-06-03 10:30:12 -07:00
2e59a0a646 add contiguous function type hint for tensor (#21285)
Summary:
Fixes #21261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21285

Differential Revision: D15604270

Pulled By: soumith

fbshipit-source-id: c1c02348e338477a507052de0a1065cf42a99387
2019-06-03 10:17:03 -07:00
96667dfe41 Write add_scalars data in the same file (#21100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21100

Added a multifile flag to write scalar data into separate files; writing separate files can slow down dashboard loading.

Reviewed By: orionr

Differential Revision: D15548913

fbshipit-source-id: dd39a7f76f93025d28f14babbf933e39860e6910
2019-06-03 09:53:27 -07:00
5b33698776 Fix build error in c10 on Windows (#21005)
Summary:
Targets https://github.com/pytorch/pytorch/issues/20635#issuecomment-496265510
Reference:
1. https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=vs-2015#microsoft-specific-predefined-macros
2. https://docs.microsoft.com/en-us/cpp/cpp/deprecated-cpp?view=vs-2019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21005

Differential Revision: D15543134

Pulled By: ezyang

fbshipit-source-id: f32709b018a7de651cb31575fc6117bfc4dd3bd1
2019-06-03 09:53:24 -07:00
155f767382 Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen (#21287)
Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
normal, size, elements 65536 forward 4.956722259521484e-06 bandwidth (GB/s) 52.88656218258779
normal, size, elements 131072 forward 5.285739898681641e-06 bandwidth (GB/s) 99.18914098114568
normal, size, elements 262144 forward 7.548332214355469e-06 bandwidth (GB/s) 138.91492454529376
normal, size, elements 524288 forward 1.1980533599853516e-05 bandwidth (GB/s) 175.0466273076219
normal, size, elements 1048576 forward 2.091646194458008e-05 bandwidth (GB/s) 200.52645667862762
normal, size, elements 2097152 forward 3.9961338043212894e-05 bandwidth (GB/s) 209.91809610901498
normal, size, elements 4194304 forward 7.39765167236328e-05 bandwidth (GB/s) 226.79110538115253
normal, size, elements 8388608 forward 0.0001377725601196289 bandwidth (GB/s) 243.5494555001696
normal, size, elements 16777216 forward 0.0002710080146789551 bandwidth (GB/s) 247.62686107087774
normal, size, elements 33554432 forward 0.0005375170707702637 bandwidth (GB/s) 249.69947058177252
```
#### After:
```
normal, size, elements 65536 forward 6.198883056640625e-06 bandwidth (GB/s) 42.288908760615385
normal, size, elements 131072 forward 6.756782531738281e-06 bandwidth (GB/s) 77.59432800112916
normal, size, elements 262144 forward 7.560253143310547e-06 bandwidth (GB/s) 138.6958849291706
normal, size, elements 524288 forward 7.550716400146485e-06 bandwidth (GB/s) 277.7421225831386
normal, size, elements 1048576 forward 1.1034011840820313e-05 bandwidth (GB/s) 380.1250225673293
normal, size, elements 2097152 forward 1.802682876586914e-05 bandwidth (GB/s) 465.34019427102237
normal, size, elements 4194304 forward 2.8417110443115234e-05 bandwidth (GB/s) 590.3913430460946
normal, size, elements 8388608 forward 4.8711299896240235e-05 bandwidth (GB/s) 688.8428777608927
normal, size, elements 16777216 forward 9.685993194580078e-05 bandwidth (GB/s) 692.8444265018856
normal, size, elements 33554432 forward 0.00018213510513305663 bandwidth (GB/s) 736.9130069787966
```
### Double Type
#### Before:
```
normal, size, elements 65536 forward 5.8841705322265624e-06 bandwidth (GB/s) 44.55071425348461
normal, size, elements 131072 forward 8.018016815185547e-06 bandwidth (GB/s) 65.38873789925661
normal, size, elements 262144 forward 1.2989044189453124e-05 bandwidth (GB/s) 80.72772597474304
normal, size, elements 524288 forward 2.2075176239013673e-05 bandwidth (GB/s) 95.00046465285668
normal, size, elements 1048576 forward 4.1041374206542965e-05 bandwidth (GB/s) 102.19696784254678
normal, size, elements 2097152 forward 7.57598876953125e-05 bandwidth (GB/s) 110.72624650312186
normal, size, elements 4194304 forward 0.00013725996017456056 bandwidth (GB/s) 122.22949779865557
normal, size, elements 8388608 forward 0.0002614736557006836 bandwidth (GB/s) 128.32815569921402
normal, size, elements 16777216 forward 0.0005080199241638184 bandwidth (GB/s) 132.0988819689674
normal, size, elements 33554432 forward 0.0009479570388793945 bandwidth (GB/s) 141.58629821311564
```
#### After:
```
normal, size, elements 65536 forward 5.991458892822265e-06 bandwidth (GB/s) 43.75294977222444
normal, size, elements 131072 forward 7.293224334716797e-06 bandwidth (GB/s) 71.88699756626349
normal, size, elements 262144 forward 8.094310760498048e-06 bandwidth (GB/s) 129.54481623281296
normal, size, elements 524288 forward 1.2805461883544922e-05 bandwidth (GB/s) 163.7701177100726
normal, size, elements 1048576 forward 2.2592544555664064e-05 bandwidth (GB/s) 185.64991604491345
normal, size, elements 2097152 forward 3.801822662353516e-05 bandwidth (GB/s) 220.6470092112881
normal, size, elements 4194304 forward 6.761550903320313e-05 bandwidth (GB/s) 248.1267425164457
normal, size, elements 8388608 forward 0.00013209104537963867 bandwidth (GB/s) 254.02503177684966
normal, size, elements 16777216 forward 0.0002667689323425293 bandwidth (GB/s) 251.56176699703818
normal, size, elements 33554432 forward 0.0004705166816711426 bandwidth (GB/s) 285.25604559501795
```

Resubmit of #20621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21287

Differential Revision: D15603695

Pulled By: ezyang

fbshipit-source-id: f8c5032678d503d45ac99fb1475a929df7c2b361
2019-06-03 09:45:02 -07:00
21113c2d36 EliminateGuards
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21070

Differential Revision: D15603561

Pulled By: Krovatkin

fbshipit-source-id: 03056688e8b99eddcb30d80cc20ab37ad3f13af2
2019-06-03 09:39:45 -07:00
c8539be962 Make is_contiguous checks generic in number of arguments (#21106)
Summary:
Loops.h contains specializations for cases where all the inputs are
contiguous as well as cases where one input is a scalar and all other
inputs are contiguous.

Previously, there were separate checks for each functions that take
zero, one, or two input arguments. This is getting unwieldy, especially
once we add support for functions that take three inputs (#21025).

This requires the use of recursive templates (which have their own
downsides), but this seems better than the alternative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21106

Differential Revision: D15562430

Pulled By: colesbury

fbshipit-source-id: 5f19ab2212e16e29552887f4585c2b4a70309772
2019-06-03 09:19:19 -07:00
b159e0ce08 Significantly simplify the spawning of pytorch libs building process. (#21105)
Summary:
Instead of attempting to hardcode calls to "ninja" or "make", we should always let cmake do it. This better integrates build configurations (DEBUG or REL_WITH_DEB_INFO) and better handles the case in which the native build tool is not in PATH (cmake has some capacity to find them and has options for users to specify their locations).
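A sketch of the generic invocation this enables (hypothetical snippet; the flag spellings are standard CMake, not quoted from the patch):
```
import subprocess

# Let CMake drive whichever native tool the build was configured with
# (ninja, make, msbuild, ...) instead of invoking that tool directly.
subprocess.check_call(["cmake", "--build", ".", "--config", "Release"])
```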
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21105

Differential Revision: D15602883

Pulled By: soumith

fbshipit-source-id: 32ac46d438af00e791defde6ae5ac21c437d0bb0
2019-06-03 08:28:19 -07:00
f62a006097 Retry Fix Python DataParallel RNN in no_grad mode (#21262)
Summary:
Retry #21197

The previous one failed because it used some Python-3-only syntax.

ezyang Do we still have multi-GPU py2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21262

Differential Revision: D15598941

Pulled By: mrshenli

fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
2019-06-03 08:04:35 -07:00
0c6efbd410 Fix gelu documents (#21265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21265

Fix gelu documents

Reviewed By: hl475

Differential Revision: D15598958

fbshipit-source-id: 483040069102daada705401c36c8990598142d3d
2019-06-02 20:17:56 -07:00
eaa3ba6587 Add autograd for layer_norm on CPU (#20883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20883

Add autograd for layer_norm on CPU. After this diff, both PyTorch and JIT models can automatically benefit from the performance improvement of nn.functional.layer_norm.

Reviewed By: zheng-xq

Differential Revision: D15483790

fbshipit-source-id: 94ed3b16ab6d83ca6c254dbcfb224ff7d88837f3
2019-06-02 16:55:32 -07:00
31c79b71ff Add gelu gradient for pytorch (#21237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21237

Add gelu gradient for pytorch

Reviewed By: zheng-xq

Differential Revision: D15589816

fbshipit-source-id: 76fda7c413afed5b6cc3abe3a26c258d393a53ce
2019-06-02 09:42:42 -07:00
93ae040ff0 Add gelu activation in pytorch (#20665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20665

Add gelu activation forward on CPU in pytorch

Compared to the current Python-implemented version of gelu used in BERT-like models, e.g.

  def gelu(self, x):
      return x * 0.5 * (1.0 + torch.erf(x / self.sqrt_two))

the torch.nn.functional.gelu function reduces the forward time from 333ms to 109ms (with MKL) / 112ms (without MKL) for input size = [64, 128, 56, 56] on a devvm.
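Usage sketch of the new native op:
```
import torch
import torch.nn.functional as F

x = torch.randn(64, 128, 56, 56)
y = F.gelu(x)  # native CPU forward added by this diff
```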

Reviewed By: zheng-xq

Differential Revision: D15400974

fbshipit-source-id: f606b43d1dd64e3c42a12c4991411d47551a8121
2019-06-02 09:08:47 -07:00
aac424a6c4 Revert D15577342: [pytorch][PR] Fix Python DataParallel RNN in no_grad mode
Differential Revision:
D15577342

Original commit changeset: 1a024c572171

fbshipit-source-id: 9a3ddc14ebb2d75d9dc3ee1fe69df9ffba3529de
2019-06-01 22:17:19 -07:00
360e6d1b0b Fixes a bug in the test (#21146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21146

The error was reported by https://our.intern.facebook.com/intern/test/562949965807317?ref_report_id=1837062

The API changed from `a.quantize_linear(...)` to `torch.quantize_linear(a, ...)`

Reviewed By: dskhudia

Differential Revision: D15557418

fbshipit-source-id: 88463e09fdf1f574f1b8128f6a00c2810091cd03
2019-06-01 18:00:33 -07:00
62ae348d1a Exclude file:line from graphs used for fuser kernel cache (#21252)
Summary:
cc ezyang this is meant to fix the fuser failures on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21252

Differential Revision: D15594283

Pulled By: jamesr66a

fbshipit-source-id: 85f37e78b2de051c92ade3fe4c44c7530b4542e5
2019-06-01 16:18:55 -07:00
7c40576c61 Save the weight shape info the first time we have chance to extract it (#21233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21233

It is possible that OnnxifiOp is created in a thread where the weights have already been cleaned from the workspace, which is a legitimate use case since we can create the backend once and lower all the weights. So we need to extract the weight shape info the first time we create the backend and save it.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D15587237

fbshipit-source-id: 1f264dc32c0398c42b618e9c41c119eb13e1c9f1
2019-06-01 12:55:29 -07:00
0efc527dd1 Revert D15548138: Export feature length information for onnxifi operator
Differential Revision:
D15548138

Original commit changeset: 460118648bb4

fbshipit-source-id: 1a25ca2942d804f6c88e96c436f09f68c260b9be
2019-06-01 12:41:47 -07:00
51ebbe970a Fix Python DataParallel RNN in no_grad mode (#21197)
Summary:
Fixes #21108

When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](8cde4c4d22/torch/csrc/autograd/python_function.cpp (L395-L399)), which prevents calling `Tensor.set_()` on them after recent changes in Tensors and Variables. This will hit a problem when users would like to call `rnn.flatten_parameters()` in the forward pass, as the function [calls `set_()`](9d09f5df6c/aten/src/ATen/native/cudnn/RNN.cpp (L669)).

The proposed solution is to avoid using an autograd Broadcast if in no_grad mode.
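
A minimal sketch of the scenario being fixed (assumes a CUDA machine; module and shapes are illustrative):

```python
import torch
import torch.nn as nn

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(10, 20, batch_first=True)

    def forward(self, x):
        # calls set_() internally; used to fail on DataParallel's
        # detached replica outputs when grad is disabled
        self.rnn.flatten_parameters()
        return self.rnn(x)[0]

model = nn.DataParallel(Wrapper().cuda())
with torch.no_grad():
    out = model(torch.randn(4, 5, 10, device="cuda"))
```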

apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197

Differential Revision: D15577342

Pulled By: mrshenli

fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c
2019-06-01 10:37:57 -07:00
f051fbd4a8 Fix typo in test_dataloader
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21226

Differential Revision: D15592797

Pulled By: soumith

fbshipit-source-id: b9a83e574c7b10fb0d661332ab68e376409a4724
2019-06-01 10:30:14 -07:00
d168a8533f compare scalar device with common device (#21236)
Summary:
I think there was a typo in #20690 here https://github.com/pytorch/pytorch/pull/20690/files#diff-b47a50873394e38a005b4c1acd151957R130.
Original conditional was `common_backend == Backend::CUDA && op.tensor.type().backend() == Backend::CPU`; now it is `op.device.is_cuda() && op.tensor.device().is_cpu()`. It seems that `op.device` and `op.tensor.device()` should be the same, so this conditional is never true. This leads to spurious h2d copies for operations between CUDA tensors and CPU scalars, because CPU scalars are now sent to the GPU instead of being passed to the lambdas directly.
Unfortunately, I don't know how to test this change, because functionally everything was fine after #20690; it was just a performance regression.
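
A sketch of the kind of expression affected (purely illustrative; the win is avoiding a spurious host-to-device copy, which no functional test catches):

```python
import torch

if torch.cuda.is_available():
    t = torch.randn(1 << 20, device="cuda")
    # A CPU Python scalar combined with a CUDA tensor: with the typo, the
    # scalar took an extra h2d copy instead of being passed into the
    # kernel lambda directly.
    y = t * 2.5
```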

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21236

Differential Revision: D15592754

Pulled By: soumith

fbshipit-source-id: 105bfecc61c222cfdb7294a03c9ecae3cc7f5817
2019-06-01 10:24:31 -07:00
41b17e2458 Fix wrong type hints for Tensor.is_cuda, is_leaf (#21192)
Summary:
`Tensor.is_cuda` and `is_leaf` is not a predicate function but a `bool` attribute. This patch fixes the type hints in `torch/__init__.pyi` for those attributes.

```diff
- def is_cuda(self) -> bool: ...
+ is_cuda: bool
- def is_leaf(self) -> bool: ...
+ is_leaf: bool
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21192

Differential Revision: D15592766

Pulled By: soumith

fbshipit-source-id: 8c4ecd6939df8b8a8a19e1c9db6d40193bca7e4a
2019-06-01 10:04:52 -07:00
be7fc40621 Fix sccache not being used on Windows (#21248)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21167.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21248

Differential Revision: D15592742

Pulled By: soumith

fbshipit-source-id: 4add002698c13301f142526cd783c866d345bf5e
2019-06-01 09:47:39 -07:00
619261d7a7 Add file-line info for jit.load and string frontend (#21217)
Summary:
This makes file-line reporting also work for things loaded using `torch.jit.load()` as well as the string frontend (via `CompilationUnit`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21217

Differential Revision: D15590838

Pulled By: jamesr66a

fbshipit-source-id: 6b6a12574bf9eca0b83f24f0b50535fda5863243
2019-05-31 23:43:15 -07:00
b663eec119 Lazily build error strings in schema matching using replay. (#21241)
Summary:
Saves ~20% (5.3s -> 4.3s) loading DenseNet on my laptop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21241

Differential Revision: D15590338

fbshipit-source-id: 2c8aebc829d4ea46f358d74d396cc44f5f57fcf5
2019-05-31 23:34:20 -07:00
5bc7c1f83d fix contribution and governance links (#21243)
Summary:
Updated web links on contribution_guide and governance documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21243

Differential Revision: D15591065

Pulled By: soumith

fbshipit-source-id: fdcfc518605a08a2ac35a10c146122d7d0a3f609
2019-05-31 21:02:13 -07:00
85786bea7d Export feature length information for onnxifi operator (#21110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21110

Export feature length information for onnxifi operator

Reviewed By: ipiszy

Differential Revision: D15548138

fbshipit-source-id: 460118648bb4467c096f79dea524060c9524f23d
2019-05-31 20:25:34 -07:00
516ea33f6a add PT maxpool and avgpool ops to the benchmark suite (#21200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21200

This diff adds MaxPool1d/2d/3d and AvgPool1d/2d/3d to the benchmark suite.

Reviewed By: hl475

Differential Revision: D15541980

fbshipit-source-id: 394d136ee94a16ee24285939323ca5fe317e99d3
2019-05-31 19:35:29 -07:00
dceea73460 add PT conv and convtranspose ops to the benchmark suite (#21199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21199

This diff adds Conv1d, ConvTranspose1d, Conv2d, ConvTranspose2d, Conv3d, and ConvTranspose3d operators to the benchmark suite.

Reviewed By: hl475

Differential Revision: D15520817

fbshipit-source-id: 5512afec2be8a1036fbcd170f70265c7e455fcde
2019-05-31 19:35:25 -07:00
2d75d31398 add PT linear op to the benchmark suite (#21204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21204

as title

Reviewed By: hl475

Differential Revision: D15484743

fbshipit-source-id: 7094a983e370e1c3952021146b58b844874b7d5e
2019-05-31 19:35:22 -07:00
00b3e69211 add PT batchnorm op to the benchmark suite (#21201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21201

as title

Reviewed By: hl475

Differential Revision: D15482581

fbshipit-source-id: d93713a35be41e76d077df419cb24585f69d72eb
2019-05-31 19:35:18 -07:00
ed1078bde3 migrate matmul operator to the new interface (#21198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21198

as title

Reviewed By: hl475

Differential Revision: D15325768

fbshipit-source-id: a5d7c6837cd09445e75846660d12807dd26af6cc
2019-05-31 19:35:15 -07:00
c8dc707fee avoid multiple writes to files on export (#21186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21186
ghimport-source-id: 2f62fed50e0d74f4162b74b6a2f44b8baa376316

Differential Revision: D15581527

Pulled By: suo

fbshipit-source-id: b1150cfa47d8df6f217f048c742a5ba9fa7f7935
2019-05-31 19:14:46 -07:00
4c19421f16 Register gradient op with engine (#21205)
Summary:
cc dreiss
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21205

Differential Revision: D15578948

Pulled By: bddppq

fbshipit-source-id: ef285174e8637daef624c8088ebd903a70582345
2019-05-31 18:48:47 -07:00
daa1e2de1a Add file:line:graph to graph printout (#21180)
Summary:
Example:

```
import torch

@torch.jit.script
def foo(x):
    y = torch.neg(x)
    return x - y

print(foo.graph.debug_str())
```

```
graph(%x : Tensor):
  %2 : int = prim::Constant[value=1]()
  %y : Tensor = aten::neg(%x) # demo.py:5:9
  %3 : Tensor = aten::sub(%x, %y, %2) # demo.py:6:12
  return (%3)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21180

Differential Revision: D15583548

Pulled By: jamesr66a

fbshipit-source-id: 0c6dc2fb7555c01dde9c563b78422ef234b2681b
2019-05-31 18:14:18 -07:00
678dc44d4c use _sparse_coo_tensor_unsafe in coalesce for speedup (#21214)
Summary:
Studied why sparse tensor coalesce was slow:  issue #10757.

Using nvprof and writing a simple benchmark, I determined that the bulk of the time was spent in `kernelTransformReduceInnermostDimIndex`, which is called when a sparse tensor is constructed with sparse_coo_tensor and does a sanity check on the minimum and maximum indices. However, we do not need this sanity check, because after coalescing the tensor these min/max values won't change.

On my benchmark with 1 million non-zeros, the runtime of coalesce dropped from 0.52 s to 0.005 s (roughly a 100x speedup).
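
A rough sketch of the benchmark shape (sizes are assumptions, not the original script):

```python
import torch

n, nnz = 1_000_000, 1_000_000
indices = torch.randint(0, n, (2, nnz))
values = torch.randn(nnz)
s = torch.sparse_coo_tensor(indices, values, (n, n))
# coalesce() can now build its result via _sparse_coo_tensor_unsafe,
# skipping the redundant min/max index validation.
s = s.coalesce()
```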
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21214

Reviewed By: bddppq

Differential Revision: D15584338

Pulled By: akyrola

fbshipit-source-id: a08378baa018dbd0b45d7aba661fc9aefd3791e0
2019-05-31 17:10:05 -07:00
9e5f1db66b Reuse common options between ONNXIFI and TVM transformations (#21163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21163

These two backend transformations share some common traits. Therefore we want to reuse the data structures/code as much as possible.

Reviewed By: hlu1

Differential Revision: D15561177

fbshipit-source-id: 35f5d63b2b5b3657f4ba099634fd27c3af545f1b
2019-05-31 17:01:36 -07:00
b12a5f6155 schema_matching.cpp: mark internal functions as static. (#21140)
Summary:
Some of the functions are only used in this file - mark them `static`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21140

Differential Revision: D15578076

Pulled By: Krovatkin

fbshipit-source-id: 71ae67baabebd40c38ecb9292b5b8202ad2b9fc1
2019-05-31 16:40:16 -07:00
668dbcc41b migrate intraop benchmarks to the new interface (#21202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21202

Migrate Ilia's op benchmarks to the new interface

Reviewed By: hl475

Differential Revision: D15322577

fbshipit-source-id: 8e75d51e7ddacbd56896c55f2996a9358491d83e
2019-05-31 16:19:04 -07:00
c62d476206 migrate add operator to the new interface (#21152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21152

Migrate existing add benchmark to use the new op front-end

Reviewed By: zheng-xq

Differential Revision: D15325524

fbshipit-source-id: 34e969e1bd289913d881c476711bce9f8ac18a29
2019-05-31 16:19:00 -07:00
fd19d06db4 remaining use of t.quantize_linear (#21219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21219

att

Differential Revision: D15583802

fbshipit-source-id: 742e8b799d67485b2d48b1458839f3f3b000f200
2019-05-31 16:05:44 -07:00
4dbeb87e52 PyTorch Dockerfile should update submodules recursively.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21216

Differential Revision: D15584114

Pulled By: bddppq

fbshipit-source-id: dbe0c3a54024a90fcd2c6689f8b9689ed0cd639b
2019-05-31 14:56:57 -07:00
0aeb971622 conditionally defined var better error message (#20911)
Summary:
I will do loops in a follow-up after some other changes I am working on have landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20911

Differential Revision: D15497205

Pulled By: eellison

fbshipit-source-id: 8cac197c6a6045b27b552cbb39e6fc86ca747b18
2019-05-31 14:32:03 -07:00
2f4824b2fb Add support for recursive compilation on Modules (#20708)
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.

Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708

Pulled By: driazati

Differential Revision: D15560490

fbshipit-source-id: cc7ef3a1c2772eff9beba5f3e66546d2b7d7198a
2019-05-31 14:27:16 -07:00
834d678eb8 Remove old custom op implementation (#21085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21085

Now that torch::jit::RegisterOperators() always passes through to torch::RegisterOperators() (see diffs stacked below this), we can remove the old custom op implementation.

Reviewed By: dzhulgakov

Differential Revision: D15542261

fbshipit-source-id: ef437e6c71950e58fdd237d6abd035826753c2e4
2019-05-31 13:51:14 -07:00
384d828ea5 Add aliasAnalysis to torch::RegisterOperators() (#21084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21084

- Now AliasAnalysisKind can be set using the torch::RegisterOperators() API
- This also allows us to remove the last place in torch::jit::RegisterOperators that didn't use c10 yet.

Reviewed By: dzhulgakov

Differential Revision: D15542097

fbshipit-source-id: ea127ecf051a5c1e567e035692deed44e04faa9e
2019-05-31 13:51:07 -07:00
80556761c8 c10::OperatorOptions (#21181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21181

Implement c10::OperatorOptions as a class to store metadata about operators.
This is meant to replace torch::jit::OperatorOptions.

Reviewed By: dzhulgakov

Differential Revision: D15569897

fbshipit-source-id: 95bf0bf917c1ef2bdf32702405844e1a116d9a64
2019-05-31 13:51:00 -07:00
b91e0d14a7 registration options should only be callable on rvalues (#21079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21079

They're invalidating *this, so they shouldn't be callable on non-rvalues.

Reviewed By: dzhulgakov

Differential Revision: D15541583

fbshipit-source-id: a2a9dafb29af03477486ea2ce9029399f557c728
2019-05-31 13:50:54 -07:00
181792176d Implement various AliasAnalysis operations directly on top of MemoryLocations. (#21203)
Summary:
This reduces DenseNet load time by about 25% (down to 5.3s on my laptop) and gets AliasAnalysis out of the profile top hits entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21203

Differential Revision: D15578155

fbshipit-source-id: ddbb1ad25c9540b5214702830084aa51cc6fd3cb
2019-05-31 13:38:32 -07:00
e098878d75 Cuda persistent softmax (#20827)
Summary:
Adds persistent CUDA kernels that speed up SoftMax applied over the fast dimension, i.e. torch.nn.Softmax(dim=-1) and torch.nn.LogSoftmax(dim=-1). When the size is <= 1024, this code is 2-10x faster than the current code; the speedup is higher for smaller sizes. This code works for half, float and double tensors with 1024 or fewer elements in the fast dimension. Numerical accuracy is on par with the current code, i.e. relative error is ~1e-8 for float tensors and ~1e-17 for double tensors. Relative error was computed against the CPU code.

The attached image shows kernel time in us for torch.nn.Softmax(dim=-1) applied to a half precision tensor of shape [16384,n], n is plotted along the horizontal axis. Similar uplifts can be seen for the backward pass and for LogSoftmax.

![image](https://user-images.githubusercontent.com/41591019/58212822-b63ebb00-7cb5-11e9-910d-1fc7d8585d58.png)
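
A sketch of the covered case (requires CUDA; sizes are illustrative):

```python
import torch

x = torch.randn(16384, 512, dtype=torch.half, device="cuda")
# dim=-1 with <= 1024 elements per row hits the new persistent kernels.
y = torch.nn.Softmax(dim=-1)(x)
log_y = torch.nn.LogSoftmax(dim=-1)(x)
```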
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20827

Differential Revision: D15582509

Pulled By: ezyang

fbshipit-source-id: 65805db37487cebbc4ceefb1a1bd486d24745f80
2019-05-31 13:20:15 -07:00
052bab7069 Move legacy TH functions(sinh,cosh) to TensorIterator + Vec256 (#21115)
Summary:
This is a follow up on Jame's PR: https://github.com/pytorch/pytorch/pull/19041. The idea is to replace the legacy `sinh` / `cosh` ops that are being dispatched to TH with the operations defined in `Vec256` for better performance.

benchmark(from Jame's script):

```python
import torch, time
ops = ['sinh', 'cosh']
x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
code on master:

```
op	time per iter (ms)	gops/s	GB/s
sinh	3.37614369392395	0.3105839369002935	2.484671495202348
cosh	3.480502033233643	0.3012714803748572	2.4101718429988574
```
after change (on Macbook pro 2018):

```
op	time per iter (ms)	gops/s	GB/s
sinh	0.8956503868103027	1.1707425301677301	9.365940241341841
cosh	0.9392147302627564	1.1164390487217428	8.931512389773943
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21115

Reviewed By: ljk53

Differential Revision: D15574580

Pulled By: xta0

fbshipit-source-id: 392546a0df11ed4f0945f2bc84bf5dea2750b60e
2019-05-31 12:06:26 -07:00
7f960a9c01 remove quantize_linear from Tensor method (#21196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21196

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15577123

fbshipit-source-id: d0abeea488418fa9ab212f84b0b97ee237124240
2019-05-31 12:01:10 -07:00
c185145d8c remove dependency to caffe2::math and eigen (#21169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21169

We should minimize dependencies in perfkernels (we were including Eigen header files only in .cc files not compiled with avx or avx2 options, but it's better to be very strict, because it's easy to introduce illegal-instruction errors in perfkernels)

Reviewed By: salexspb

Differential Revision: D15563839

fbshipit-source-id: d4b1bca22d7f2e6f20f23664d4b99498e5984586
2019-05-31 11:55:16 -07:00
8c927b208c improve test_docs_coverage error messages (#21029)
Summary:
Most important fix: Correct "tensor.rst" to "tensors.rst"

Secondary fix: some minor English spelling/grammar fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21029

Differential Revision: D15523230

Pulled By: umanwizard

fbshipit-source-id: 6052d8609c86efa41a4289cd3a099b2f1037c810
2019-05-31 11:13:39 -07:00
e13b483f58 Fix weak module cuda() _flat_weights bug (#21107)
Summary:
Dynamically creating a type at runtime was messing up the MRO and has been causing many other problems. I think it's best to delete it; this causes a regression since
```python
self.linear = nn.Linear(10, 10)
isinstance(self.linear, nn.Linear)
```
will now be `False` again, but this will be fixed once recursive script mode is the default (#20939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21107

Pulled By: driazati

Differential Revision: D15560549

fbshipit-source-id: 7bd6b958acb4f353d427d66196bb4ee577ecb1a6
2019-05-31 10:35:30 -07:00
0223d3744a introduce a new intrace to add op [C2 changes] (#21148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21148

The diff modifies the interface for Caffe2 operators in the benchmark suite

Reviewed By: zheng-xq

Differential Revision: D15433888

fbshipit-source-id: c264a95906422d7a26c10b1f9836ba8b35e36b53
2019-05-31 09:21:07 -07:00
31089b02ce introduce a new interface to add op [core changes] (#21147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147

This diff introduces a new interface to add PT/C2 operators to the benchmark suite.

The following steps are needed to add a new operator:
1. Specify the input shapes, args to an operator in configs
2. Create a PT/C2 benchmark class which includes `init` (create tensors), `forward` (specify the operator to be tested), and `backward` (gradient of an op) methods
3. Call generate_pt_test/generate_c2_test to create test cases based on configs

Reviewed By: zheng-xq

Differential Revision: D15250380

fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
2019-05-31 09:21:04 -07:00
012069ca8f Revert D15454048: Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen
Differential Revision:
D15454048

Original commit changeset: 8bfc57bf015b

fbshipit-source-id: 98c562ab4cf7a00e9041b2aa50eb7fb0f0c48f69
2019-05-31 07:49:22 -07:00
dc8f306b8e Revert D15454052: Move THCTensor_(cauchy) to ATen
Differential Revision:
D15454052

Original commit changeset: 4f4d33ec11cf

fbshipit-source-id: 832a738796e6b6bdf969a44bb2cdcf171cbd5f77
2019-05-31 07:49:18 -07:00
be9ce6318e remove import torchvision when testing torch.hub (#21132)
Summary:
This should pass once https://github.com/pytorch/vision/pull/971 is merged.
To remove torchvision as a baseline, we just compare against the sum of all `param.sum()` values in the pretrained resnet18 model, which means we only need to manually update the number when the pretrained weights change, which is rare.
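
A minimal sketch of the baseline idea (the expected checksum lives hard-coded in the test and is not shown here):

```python
import torch

# Checksum over the pretrained weights replaces the torchvision import;
# the expected constant is only updated when the weights change.
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
checksum = sum(p.sum().item() for p in model.parameters())
print(checksum)  # compare against the hard-coded expected value
```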
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21132

Differential Revision: D15563078

Pulled By: ailzhang

fbshipit-source-id: f28c6874149a1e6bd9894402f6847fd18f38b2b7
2019-05-31 07:38:30 -07:00
e161360b62 Revert D15558784: [reland][pt1][quant] remove quantize_linear from Tensor method
Differential Revision:
D15558784

Original commit changeset: 0b194750c423

fbshipit-source-id: d180a7f76bb05ad7470f17bc3d2bd614fab16529
2019-05-31 06:20:05 -07:00
5fcd37bd8f List (#21164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21164

Write a List type to be used in operator kernels. This abstracts away from the concrete list type used (e.g. std::vector vs SmallVector)
and allows us to change these implementation details without breaking the kernel API.
Also, this class allows for handling List<bool>, which would not work with ArrayRef because vector<bool> is a bitset and can't be converted to ArrayRef<bool>.

Reviewed By: ezyang

Differential Revision: D15476434

fbshipit-source-id: 5855ae36b45b70437f996c81580f34a4c91ed18c
2019-05-31 04:15:39 -07:00
f91f24764e remove quantize_linear from Tensor method (#21156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21156

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15558784

fbshipit-source-id: 0b194750c423f51ad1ad5e9387a12b4d58d969a9
2019-05-30 22:02:12 -07:00
0a0ff83124 replace num_bits with quant_min and quant_max (#21097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21097

att

Differential Revision: D15547166

fbshipit-source-id: 60bc7f7d82c424558b67881627fb74f1eff515af
2019-05-30 20:57:57 -07:00
277bf69fa0 Add torch.load/torch.save for QTensor (#20830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20830

att

Reviewed By: dzhulgakov

Differential Revision: D15340701

fbshipit-source-id: 677038c8101f66dec4856c2eccf9f9e394012226
2019-05-30 20:52:19 -07:00
eb4d43df3b Make CUDA triu / tril support batches of size > 65535 (#21067)
Summary:
In the previous implementation of triu / tril, we passed the batch size in the 2nd dimension of a grid. This is limited to 65535, which means that performing triu / tril on a tensor with batch size > 65535 will throw an error. This PR removes the dependence on the 2nd dimension and the corresponding contiguity constraints.

Changelog:
- Compute offset, row and col in the kernel
- Use 1st dimension of grid alone
- Remove unnecessary contiguity checks on tensors as a result of this change.
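
The previously failing shape, as a minimal sketch (assumes a CUDA device with enough free memory):

```python
import torch

# A batch dimension beyond the old 65535 grid limit now works.
x = torch.randn(70000, 16, 16, device="cuda")
upper = torch.triu(x)
lower = torch.tril(x)
```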
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21067

Differential Revision: D15572501

Pulled By: ezyang

fbshipit-source-id: 93851cb661918ce794d43eeb12c8a38762e1358c
2019-05-30 20:16:11 -07:00
057ddab766 on import, register class before defining it (#21182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21182
ghimport-source-id: 2457a4306c0a72888bb8359a267fcd12b43f103a

Differential Revision: D15571334

Pulled By: suo

fbshipit-source-id: 26ca9dddb25df1b1eac2e17c70f682e20e08cb6d
2019-05-30 20:09:01 -07:00
d6438c956b Move THCTensor_(cauchy) to ATen (#20622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20622
ghimport-source-id: b100d6cededf6f2c2020c3d7961271f16497bbdc

Differential Revision: D15454052

Pulled By: ezyang

fbshipit-source-id: 4f4d33ec11cf36b91c67759bd27252d1e457cff1
2019-05-30 18:13:16 -07:00
26d16ae515 Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen (#20621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20621
ghimport-source-id: f461d7f1eb6b5a8306dd8175cbb0a7fcc9f64c76

Differential Revision: D15454048

Pulled By: ezyang

fbshipit-source-id: 8bfc57bf015b85f57ed99a54176926386aab4e34
2019-05-30 18:01:31 -07:00
07ac00d21a Automatic update of fbcode/onnx to 9005291283e943f1a91da5f0acf218bc4e8eb2ca (#21057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21057

Previous import was cc2333a3f929caca7223b98699237f19388dd585

Included changes:
- **[90052912](https://github.com/onnx/onnx/commit/90052912)**: Fix wrong condition and add --user in update_doc.sh (#2050) <daquexian>
- **[a4f44a20](https://github.com/onnx/onnx/commit/a4f44a20)**: Add bit-shift operators for supporting hashing (#1931) <Wei-Sheng Chin>
- **[0098752c](https://github.com/onnx/onnx/commit/0098752c)**: Add shape inference logic for Expand op (#2041) <Hariharan Seshadri>
- **[fbe8addb](https://github.com/onnx/onnx/commit/fbe8addb)**: update qops tests (#2040) <Ashwini Khade>
- **[874fb37c](https://github.com/onnx/onnx/commit/874fb37c)**: Fix torchvision installation (#2054) <bddppq>
- **[1f5f6582](https://github.com/onnx/onnx/commit/1f5f6582)**: Fix bug that kernel_shape rather than effective_kernel_shape is used in dilated conv (#2043) <daquexian>
- **[38b6c44e](https://github.com/onnx/onnx/commit/38b6c44e)**: Changes done internally at Facebook (#2035) <Lu Fang>
- **[5c51f0db](https://github.com/onnx/onnx/commit/5c51f0db)**: Explicitly specify type of integers in the input tensor. (#2034) <Dmitri Smirnov>

Reviewed By: benoitsteiner

Differential Revision: D15534241

fbshipit-source-id: 8d2b78a986e5b7fbeb248f2d7b80c1a07230654e
2019-05-30 17:33:18 -07:00
ff0d00f921 Updated scalar type to onnx mapping (#21095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21095
ghimport-source-id: 32a79eace02216de9170f163027b1aa93756b821

Differential Revision: D15546175

Pulled By: izdeby

fbshipit-source-id: 4e47c8538aaf30b4af198baac7279133e4d74b36
2019-05-30 17:11:12 -07:00
726caeace3 Use QTensor for bias (#21038)
Summary:
Use QTensor for the bias tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21038

Differential Revision: D15524980

Pulled By: dskhudia

fbshipit-source-id: c7bf2efc8fe3f4b5574c721c2f64ff073045ecc4
2019-05-30 16:16:03 -07:00
64f06d4964 Enable all and any for bool tensors (#21033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21033
ghimport-source-id: 35fdcf27b0bde8ec3e5b3051cf0d730f20f94783

Differential Revision: D15530497

Pulled By: izdeby

fbshipit-source-id: 9c15cc960055f59a05ce0276f9d51c567626d966
2019-05-30 16:16:00 -07:00
9a22cb9f49 Enabled add, sum and mul for bool tensor (#21032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21032
ghimport-source-id: 6ab21752b4af451e8b10a0e02cd5d726aa7472f0

Differential Revision: D15530496

Pulled By: izdeby

fbshipit-source-id: f4f83aa80eafbb4f307aadc1a13d8cdcf3055c24
2019-05-30 16:11:43 -07:00
fe39602451 Support for rudimentary f-strings (#21037)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/51

This adds support for converting simple f-string literals to calls to `string.format()`. It does not support conversion specifiers or format specs.

This also does not support the string parser frontend, since that implementation would be more involved and likely would require modifying our TorchScript AST
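
A sketch of what now compiles (simple literals only; the function itself is illustrative):

```python
import torch

@torch.jit.script
def greet(name: str) -> str:
    # Lowered to a string.format() call; no conversion or format specs.
    return f"hello, {name}!"

print(greet("world"))
```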
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21037

Reviewed By: zdevito

Differential Revision: D15541183

Pulled By: jamesr66a

fbshipit-source-id: ae9df85e73f646d7219c1349f5b7683becbcef20
2019-05-30 15:50:45 -07:00
76deb450c6 Record source/line info in SourceRange and report in highlight (#21157)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/20898 with flake8 fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21157

Reviewed By: zdevito

Differential Revision: D15560324

Pulled By: jamesr66a

fbshipit-source-id: fc4e429eac03d2768f758b19c9d43e0bb614c2b8
2019-05-30 15:45:30 -07:00
416357648c Optimize alias analysis (#20899)
Summary:
# Overall Improvements
1. Switched from using `unordered_set` to sparse bitset.
1. Prevent some excessive memory allocations (thanks to resistor )
1. Take advantage of the sparse bitset operations
1. Switch to `flat_hash_map` instead of `unordered_map` in some places.

# Benchmarks (somewhat approximate, best of a couple runs)
1. InceptionNet (load + one forward pass): 19.8->13.3
1. GoogleNet(load + one forward pass): 10.0 -> 7.24
1. DenseNet (only load): 7.3 -> 5.3

I use the `sparse bitset` taken from https://llvm.org/doxygen/SparseBitVector_8h_source.html. I had to make some modifications to use `__builtin_popcountl` and instructions like that instead of other transitive clang dependencies.

## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k `mayAlias` queries, and the sum of magnitudes of sets at each node is 500k (ie: every node, on average, reaches 2.5 leaves).

PS: Holy crap macbooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899

Differential Revision: D15564612

Pulled By: Chillee

fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0
2019-05-30 15:37:50 -07:00
31aefd9b09 Adding models to jenkins benchmark script (#21010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21010

Adding and editing the Jenkins benchmark script to accommodate both ResNeXt and ShuffleNet models.

Reviewed By: bddppq

Differential Revision: D15515354

fbshipit-source-id: 2a92c272b0b74ed3ecc78af6544a06337c7753cf
2019-05-30 15:17:40 -07:00
f6e5846a67 add handle to run all jit tests (#21161)
Summary:
Now you can run `python test/run_tests --jit` to run all jit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21161

Differential Revision: D15563912

Pulled By: eellison

fbshipit-source-id: 4bb0285cda4168b72a3dc4bba471485566a59873
2019-05-30 14:12:21 -07:00
7f308b88b9 Only populate net_pos in ssaRewrite if the op doesn't already have a net_pos argument (#21051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21051

In net transforms, we perform an SSARewrite where we update the 'net_pos' for all the ops in the net. The transform function also takes an unordered set of net positions for blacklisting. It's possible that SSARewrite will change the indexes of the ops, so the blacklist would be applied to the wrong ops. We fix this issue by having SSARewrite only assign a new net_pos if the op doesn't already have one.

Reviewed By: yinghai

Differential Revision: D15532795

fbshipit-source-id: e020492a7b5196a91cdc39d0eda761b1ca612cdb
2019-05-30 13:37:35 -07:00
80020306ef Added base parameter to math.log (#21151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21151
ghimport-source-id: 76dc0852022a87a000888a787de1391f71923074

Differential Revision: D15563185

Pulled By: Chillee

fbshipit-source-id: 6ed7cc32ed7c103f360022b97f6df47ccd0403e7
2019-05-30 13:32:52 -07:00
4e3e4d7ff5 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 80a731a1b8b04df01cb0d68ec39d4af10e0b61b7
2019-05-30 13:07:20 -07:00
4aee92833c Update libtorch docs (#21150)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21150

Differential Revision: D15559590

Pulled By: pjh5

fbshipit-source-id: 4063bf91464425e8efe4765dc17bb7e9b7bfccc7
2019-05-30 12:49:56 -07:00
313ef4f5d5 Make data_ptr a method on Tensor (#20878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20878
ghimport-source-id: f19993d97ecb8cfcd60b371d9ed49e3ad2e051c7

Differential Revision: D15482061

Pulled By: li-roy

fbshipit-source-id: c0563ce849fc3277e86a1a58bd384e38365786b2
2019-05-30 11:47:59 -07:00
d17aa72373 Added more regression test for groupconv w/o bias. (#18519)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/18218, which was fixed by https://github.com/pytorch/pytorch/pull/18463 with mkl-dnn upgraded to v0.18.1.
Covering the special case when group > 1, input-channel / group < 16, and output-channel is a multiple of 16.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18519

Differential Revision: D14643071

Pulled By: soumith

fbshipit-source-id: d0ebed59326c67089e042b50583b87ed2c3ccc2f
2019-05-30 11:36:07 -07:00
6dc445e1a8 Conservative alias analysis rules for CallFunction/CallMethod (#21087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21087
ghimport-source-id: 4fa6763ffecc7d2974b902dd9bd2bd9ac467bab7

Differential Revision: D15542512

Pulled By: zdevito

fbshipit-source-id: 2dcd673cd4c200d7a854347429d4f33a11793cbc
2019-05-30 11:01:56 -07:00
b6d1a72f48 improve error message on inferred type (#21058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21058
ghimport-source-id: 7fad3a0567022dd417f4bd079a50a22e3c1dc020

Differential Revision: D15547218

Pulled By: suo

fbshipit-source-id: 5dbd567c79e6d01e9af4b8552777f7f0043df5b2
2019-05-30 10:50:34 -07:00
ec76976a7a Remove all devtoolset7 jobs (#21153)
Summary:
These do not work. We'll save time and cpu until someone has the time to fix these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21153

Differential Revision: D15558601

Pulled By: pjh5

fbshipit-source-id: f9bfe580aa7962a88506f9af0032647f553637a4
2019-05-30 10:39:26 -07:00
fffffde2f8 Delete more tabs, fix lint. (#21142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21142
ghimport-source-id: 4666c0731d9c08e9990ffafd0ae88fa1e7896348

Differential Revision: D15555285

Pulled By: ezyang

fbshipit-source-id: 9e5bfacf202ceba37bd29cfd5dcb651b7f79068d
2019-05-30 06:36:47 -07:00
e9df9e7960 Revert D15552424: [pytorch][PR] [JIT] Record source/line info in SourceRange and report in highlight
Differential Revision:
D15552424

Original commit changeset: 78d0f0de03f7

fbshipit-source-id: cc24f62189b7bbcdc1406912cfb3d4ca52b8e67e
2019-05-30 05:17:15 -07:00
c4a90ca18e Revert D15477933: [pt1][quant] remove quantize_linear and dequantize from Tensor method
Differential Revision:
D15477933

Original commit changeset: c8aa81f681e0

fbshipit-source-id: ec494fbbab72e20da262bdd8657887e1fdd173cb
2019-05-30 05:04:12 -07:00
3805490d6a Typo fix (#21122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21122

fix a typo

Reviewed By: dzhulgakov

Differential Revision: D15553921

fbshipit-source-id: 260b0be5975d49bb6d70e45d83505efcecf02875
2019-05-30 00:16:01 -07:00
52ded63128 Revert D15546045: [jit] Add support for recursive compilation on Modules
Differential Revision:
D15546045

Original commit changeset: c2c8fe179088

fbshipit-source-id: c921fb92cf9f5c6c94c77fa5070f9c5775c91b77
2019-05-29 23:42:50 -07:00
3083c71cde First class functions in IR, inlined eagerly (#21052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21052
ghimport-source-id: cc476b9cc301967dde5de6212ca144cdb252e84c

Differential Revision: D15533353

Pulled By: zdevito

fbshipit-source-id: 4d25461969cfcc9e5f641d585584cc100c7b34ae
2019-05-29 23:04:18 -07:00
6b099edb53 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21118

Differential Revision: D15553121

Pulled By: jamesr66a

fbshipit-source-id: 14ebf0e4cb33f8155ac86a9538beb8570bdfe8c8
2019-05-29 21:50:12 -07:00
7cea6d9b71 Redesign the output shape adjustment of OnnxifiOp (#21027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21027

Previously, we were only able to adjust batch size when the output shape had batch size conditioned at its first dim. Although not common, there are cases where we want to slice back an output whose batch size is conditioned on a non-first dim, or whose output shape doesn't really have batch size in it but rather is an expression of it. Examples are shapes at the output of `Transpose` or `Tile`. This diff redesigns how we handle the output size. The key is that when we run OnnxifiOp, the input shapes are given, so we can actually do shape inference to derive the real output shapes, no matter how they got transformed. We then compare the real output shape with the max-batch-sized output shape, dim by dim, and use a `Slice` op to cut the max output back to the real output shape.

Notice that the general `Slice` op is slow, and in most cases we still prefer adjusting batch size by shrinking the first dim, which is just an operation on meta info without data allocation/manipulation. Therefore, we add a flag `fast_path` to detect this situation and operate accordingly.

Reviewed By: tracelogfb

Differential Revision: D15515189

fbshipit-source-id: 9c1fff161f82d0bc20eeac07ca4a2756e964e9fd
2019-05-29 21:39:00 -07:00
6875018793 Record source/line info in SourceRange and report in highlight (#20898)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/29

Examples:

```
import torch

@torch.jit.script
def foo(x):
    return torch.blargh(x)

==

RuntimeError:
object has no attribute blargh:
at compile.py:5:12
torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```

It also gets the correct column number in the case where the original source file has common leading whitespace in front of the callable:

```
import torch

with torch.no_grad():
            @torch.jit.script
            def foo(x):
                return torch.blargh(x)

==
RuntimeError:
object has no attribute blargh:
at compile_leading.py:6:24
torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20898

Differential Revision: D15552424

Pulled By: jamesr66a

fbshipit-source-id: 78d0f0de03f7ccbf3e7ea193a1b4eced57ea5d69
2019-05-29 21:32:33 -07:00
57f4f98c40 Fix borked SourceRanges
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21109

Reviewed By: zdevito

Differential Revision: D15551392

Pulled By: jamesr66a

fbshipit-source-id: 4f29214049b8feced0e740f84007b5751703ee20
2019-05-29 20:13:14 -07:00
67291ba74f remove quantize_linear and dequantize from Tensor method (#20874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20874

A criterion for whether something should be a Tensor method is whether NumPy has it; for this one it does not,
so we are removing it as a Tensor method. We can still call it as a function.
Python
```
torch.quantize_linear(t, ...), torch.dequantize(t)
```
C++
```
at::quantize_linear(t, ...), at::dequantize(t)
```

Reviewed By: dzhulgakov

Differential Revision: D15477933

fbshipit-source-id: c8aa81f681e02f038d72e44f0c700632f1af8437
2019-05-29 19:17:16 -07:00
8d3388aef2 Add support for recursive compilation on Modules (#20708)
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.

Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708

Pulled By: driazati

Differential Revision: D15546045

fbshipit-source-id: c2c8fe179088ffbdad47198e799a456560655b86
2019-05-29 18:52:36 -07:00
33d35f5f93 Fixed isinstance typos
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21102

Differential Revision: D15549564

Pulled By: Chillee

fbshipit-source-id: 6746dc9e01b5a30d55d544beb70b7005f0cfd8ae
2019-05-29 17:51:27 -07:00
990e63f587 Remove unnecessary sources from base CircleCI AMI (#21103)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21103

Differential Revision: D15550213

Pulled By: kostmo

fbshipit-source-id: b4a2c38d168f722b30c96494079ccdd468b9ece8
2019-05-29 17:46:08 -07:00
12b0dede39 Support exporting tensor factories from scripting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20255

Differential Revision: D15534186

Pulled By: houseroad

fbshipit-source-id: 182e117a35fa31445fcad8cb492160500f71599a
2019-05-29 16:53:49 -07:00
9be72ce44f Convert Tree to use intrusive_ptr instead of shared_ptr.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20815

Differential Revision: D15453817

fbshipit-source-id: 569ab807d32fb3dcebfe201a049c770b1600e5c7
2019-05-29 16:33:02 -07:00
4900edebcf QTensor permute, transpose and contiguous (#20869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20869

Adding support for the functions listed in the title, by implementing the copy kernel.

Differential Revision: D15474060

fbshipit-source-id: 9264df6e442cca1cc5d952e3e5dcc9f4a426f317
2019-05-29 16:05:53 -07:00
99b057d89c Failing assertions is unlikely (#20876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20876

Tell the compiler that assertions are likely to succeed.
This allows the compiler to generate better code and optimize for the success case.

Differential Revision: D15480066

fbshipit-source-id: 4485154d66b2ee0ef8a401718712dbd61d811aee
2019-05-29 15:59:33 -07:00
9daf48525e Quantized Max Pool op (#20474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20474

Parallel implementation of MaxPool (no ReLU).

Reviewed By: dskhudia

Differential Revision: D15327923

fbshipit-source-id: ca6475e7fe1434b55d4b7730a074bb7ff50355fd
2019-05-29 15:01:01 -07:00
154029a6ff Revert D15534670: [jit] improve error message on inferred type
Differential Revision:
D15534670

Original commit changeset: 8bbfd6e9c1af

fbshipit-source-id: fe62cf954292e8ef1d00a3cc569206f73cedcd31
2019-05-29 14:56:08 -07:00
5dacf6b048 improve error message on inferred type (#21058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21058
ghimport-source-id: e7d6e082b0faf4f3d3e683f2c98863ee269439f0

Differential Revision: D15534670

Pulled By: suo

fbshipit-source-id: 8bbfd6e9c1afbc3006d7d55ed633e18618e05021
2019-05-29 14:47:00 -07:00
6ea9044d3c add 'all' builtin (#20521)
Summary:
[jit] add 'all' builtin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20521

Differential Revision: D15527657

Pulled By: driazati

fbshipit-source-id: eaa3c1c560810581150646858339369e4305fdf2
2019-05-29 14:46:56 -07:00
8fcd80af20 Fix "cuda: unknown error" on Windows (#21062)
Summary:
Thanks Jonas1312 for validating this workaround.
Fixes #20635.
However, I don't know exactly why this one is needed.
The following are my guesses:
1. It is a CUDA bug. Static linking against `cudart` is the default now, so they didn't run enough tests for dynamic ones.
2. It is related to UCRT. But (1) according to MSDN, shared DLLs should share the same CRT. (2) The CUDA-related objects like `CUDevice` passed to `cudart` are stored on the stack, not the heap. (3) If this were the case, it should always fail, not sometimes. https://docs.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries?view=vs-2019
3. It is a bug on our side. However, I was unable to find it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21062

Differential Revision: D15543557

Pulled By: ezyang

fbshipit-source-id: c23af45ebf582fad93ce5f029af6e1f06cf1d49d
2019-05-29 14:34:02 -07:00
157fcfc07d Add quantize_linear_per_channel (#20765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20765

att

Reviewed By: dskhudia

Differential Revision: D15435455

fbshipit-source-id: 77770044411ce8ee02d26d63eb7e79cd10db103e
2019-05-29 14:29:16 -07:00
53ccba004f New torch assertion macros (#20887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20887

Switch AT_xxx assertion macros to the TORCH_ variants and make sure the separation between TORCH_CHECK and TORCH_INTERNAL_ASSERT makes sense.

Differential Revision: D15484658

fbshipit-source-id: 490ae64cc36946756c30971f1b685048bc5f77da
2019-05-29 14:15:04 -07:00
449a2c3555 Fixes #20124 (#20203)
Summary:
Fixes #20124

Description:
The code wraps the `optimizer.step()` method to detect whether the user is following the new pattern or the old one. If the old pattern is detected, a UserWarning is raised. Documentation is also updated to reflect the change:

![Screen Shot 2019-05-07 at 11 05 17](https://user-images.githubusercontent.com/2459423/57287527-04e63580-70b8-11e9-9ddd-5d159ef0ed2f.png)
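
The recommended ordering, as a minimal sketch (model/optimizer choices are illustrative):

```python
import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)

for epoch in range(100):
    # ... forward/backward pass ...
    optimizer.step()
    scheduler.step()  # new pattern: scheduler.step() after optimizer.step()
```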

cc SsnL, bado-lee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20203

Differential Revision: D15543060

Pulled By: ezyang

fbshipit-source-id: 3605e1afdb6ffc2dfd5e75e92e01b967c4d065b5
2019-05-29 14:15:01 -07:00
74375299e0 add torch.nn._intrinsic and torch.nn._intrinsic.quantized namespace (#20940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20940

- `torch.nn._intrinsic` will contain normal (unquantized) fused modules like Conv2DRelu, Conv2DBnRelu, FakeQuantize ops, etc.
- `torch.nn._intrinsic.quantized` will contain fused and quantized modules like Quantized Conv2DRelu, Quantized LinearRelu, etc.
Right now I only added the FakeQuantize op in the `torch.nn._intrinsic` namespace; we'll have more later

Differential Revision: D15505228

fbshipit-source-id: d380929e38af7a5bcfbea27474d5b80f95d43b03
2019-05-29 14:09:37 -07:00
736bf7b46c Fix __constants__ for some nn modules (#21071)
Summary:
A bunch of modules were missing entries for `__constants__` which was making their `__repr__`s not work. Others had `__constants__` that were not necessary since it was provided by some parent class instead.

Fixes #20978
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21071

Pulled By: driazati

Differential Revision: D15539518

fbshipit-source-id: 24bdd1ef41ef636eefd5d2bad4ab2d79646ed4f0
2019-05-29 13:55:53 -07:00
1e1f2c85f0 remove constant pooling expect (#21003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21003
ghimport-source-id: c1e0d0555758cab12ce82e0283bab559c7e8e4e2

Differential Revision: D15523443

Pulled By: wanchaol

fbshipit-source-id: 40973c1c0c0ab07fe4b1334e9ae0e4b16b5add8e
2019-05-29 13:55:50 -07:00
0ffd20c268 Fix empty tensor for unique_dim (#19000)
Summary:
Fixes: #18408

cc: zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19000

Reviewed By: ezyang

Differential Revision: D15470136

Pulled By: VitalyFedyunin

fbshipit-source-id: daf71566b4dbdc91927d164f813b5ee8645af1a2
2019-05-29 13:50:32 -07:00
2cd1c78632 Revert D15523444: [jit] move casting ops from prim to aten
Differential Revision:
D15523444

Original commit changeset: 642342bf1cce

fbshipit-source-id: 29de1c7e19cbb3273230c280346e786e61d2d445
2019-05-29 13:42:05 -07:00
7cb1aa67b0 Enabled min, max, minall, maxall, cmin, cmax, cmaxValue, cminValue for bool tensors (#21031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21031
ghimport-source-id: 379b3e9d20872eb5ad14403ed6751cdb0e730bc5

Reviewed By: ezyang

Differential Revision: D15530499

Pulled By: izdeby

fbshipit-source-id: f113d6974ee18ac3dfb5c0bcff66865345d137d2
2019-05-29 13:22:54 -07:00
85777b92b2 Assert against using Operator methods not supported when exporting it to c10, part 2 (#17946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14430749

fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
2019-05-29 13:16:00 -07:00
a0111aaf0d move casting ops from prim to aten (#21002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21002
ghimport-source-id: 4c88a54a3ecb76c5ca3c2c328b749350860a166d

Differential Revision: D15523444

Pulled By: wanchaol

fbshipit-source-id: 642342bf1ccea83c88897bc023979a32ee01addf
2019-05-29 12:36:47 -07:00
dd903eb645 Add start and step parameters for range in torchscript (#20795)
Summary:
Fixes #18440

I calculate a derived index from `start,stop,step` as `start + step*index`. When `start=0` and `step=1` (the defaults/`range(n)`), this is the same behavior as before.

Unfortunately, it seems that we do not optimize out operations like `x*1` or `x+0`. That means we're doing lots of redundant operations when we don't need to. EDIT: More specifically, it seems like we only do this optimization for (tensor, scalar): https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/peephole.cpp#L128

The most annoying part of this code is calculating the number of iterations, given `start, stop, step`. I ended up going with the formula `(abs(stop-start) + abs(step)-1)//abs(step)`. Other intuitively appealing formulas like `(stop-start + step -1)//step` don't work for negative numbers.

I tried using `SymbolicVariable` for the calculations, but it seems that `symbolicvariable` only outputs ops for `tensors`, not the integers we have.
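
A quick sanity check of the trip-count formula (note it assumes the sign of `step` moves `start` toward `stop`; otherwise the loop is empty):

```python
def trip_count(start, stop, step):
    # Iteration i of the derived-index scheme yields start + step * i,
    # for i in [0, trip_count).
    return (abs(stop - start) + abs(step) - 1) // abs(step)

for args in [(0, 10, 1), (10, 0, -3), (5, 5, 2), (3, 11, 4)]:
    assert trip_count(*args) == len(range(*args))
```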
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20795

Differential Revision: D15446869

Pulled By: Chillee

fbshipit-source-id: 6085545ace04e25985c6ac870226f7a651f670d5
2019-05-29 12:31:29 -07:00
fa8c132e24 Revert D15502768: [pytorch][PR] [jit] Make ScriptModule.training an attribute instead of a parameter
Differential Revision:
D15502768

Original commit changeset: 3022f2d57ec6

fbshipit-source-id: 5cd08d3c3a75e38e3aa9b75a0c0059a2c6c85a1e
2019-05-29 12:18:18 -07:00
94b9706017 fix dequantize_linear (#21035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21035

Fix the dtype error in `dequantize_linear`, it should accept the same dtype argument as `quantize_linear`

Differential Revision: D15521931

fbshipit-source-id: 0114c046a3f1046e42fca49c74c85e487fee8616
2019-05-29 12:18:15 -07:00
cbf2a4f5c4 print a warning if a type annotation prefix is invalid according to mypy (#20884)
Summary:
This PR adds a check that prints a warning if a type annotation prefix isn't what mypy expects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20884

Differential Revision: D15511043

Pulled By: Krovatkin

fbshipit-source-id: 9038e074807832931faaa5f4e69628f94f51fd72
2019-05-29 11:56:55 -07:00
a6bb15493d Removed accidental TensorFlow dependency (#21066)
Summary:
I accidentally added a TF dependency in #20413 by using the `from tensorboard.plugins.mesh.summary import _get_json_config` import.

I'm removing it at the cost of code duplication.

orionr, Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21066

Reviewed By: natalialunova

Differential Revision: D15538746

Pulled By: orionr

fbshipit-source-id: 8a822719a4a9f5d67f1badb474e3a73cefce507f
2019-05-29 11:18:10 -07:00
f2199a34eb Hook to store additional metadata about environment (#20863)
Summary:
In a larger system environment, there's usually a need to store some information about how the model was created (e.g. from which process, workflow, by which user, etc.). It's almost like JPEG metadata written by a camera.

This PR adds a low-level C++ hook to allow population of additional files in the zip container based on the environment. The reason to have it as a low-level hook instead of a top-level API wrapper (e.g. `m.save_with_metadata`) is to capture all usages of the saving API transparently for the user.

Let me know if there are concerns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20863

Differential Revision: D15487941

Pulled By: dzhulgakov

fbshipit-source-id: 120c5a4c9758aa82846bb51a1207f923e3da1333
2019-05-29 10:11:58 -07:00
00c1584979 Added possibility to index scalars by bool masks (#21030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21030
ghimport-source-id: 7a66ca096c62d050a38a6fcc9f6b2d61e387eb34

Differential Revision: D15530498

Pulled By: izdeby

fbshipit-source-id: d5d38f9610caa55fb7179d41f568c5ea5fa1f2e2
2019-05-29 09:32:55 -07:00
1d4685c20f Improve test_proper_exit error printing (#20166)
Summary:
This doesn't have `strace` yet, but it still has `faulthandler` to print stack traces on hangs. Also part of an attempt to isolate changes from #19228.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166

Differential Revision: D15536504

Pulled By: ezyang

fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
2019-05-29 07:51:31 -07:00
aa42742df0 ctc_loss: fix backward when 2d target tensor is larger than max_target_length (#20971)
Summary:
Previously, the backward didn't work when 2d target tensors had extra columns at the end. Now we just ignore those.
Also fix the confusion in the doc example regarding the number of classes.
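
The newly handled case, as a minimal sketch (shapes are assumptions):

```python
import torch
import torch.nn.functional as F

T, N, C = 50, 16, 20
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 40))   # wider than any target length
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, 30, (N,), dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # used to break when targets.size(1) > max(target_lengths)
```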

Thank you, ypw-rich, for the report with a reproducing example.

Fixes: #20522
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20971

Differential Revision: D15535481

Pulled By: ezyang

fbshipit-source-id: 397e44e20165fc4fa2547bee9390d4c0b688df93
2019-05-29 05:13:00 -07:00
55f5eb3c47 DilatedMaxPool2d: small cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20936

Differential Revision: D15514542

Pulled By: ezyang

fbshipit-source-id: 6341a4bb8a9ee0b632c32a013ea609d842a21962
2019-05-29 05:06:33 -07:00
f8565121d9 Port dilated_max_pool3d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20933

Differential Revision: D15525485

Pulled By: ezyang

fbshipit-source-id: 6ff44f11d984903cd20d79cfad04963e6443e6ca
2019-05-29 04:58:42 -07:00
0544a491d5 Revert D15499749: [pytorch][PR] Add Tensor.T attribute to reverse dimensions
Differential Revision:
D15499749

Original commit changeset: f3306b496667

fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878
2019-05-29 04:29:48 -07:00
3038cf8eee Remove THSTensor and SparseTensorRef (#20877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20877
ghimport-source-id: a07f53ca158f9a3dce7a25ef5a169871e98ea3ea

Differential Revision: D15480353

Pulled By: li-roy

fbshipit-source-id: 1152dbc4df827ded3be1a57f007a6b7de12f567f
2019-05-29 01:37:03 -07:00
9faa409b56 Fix __irshift__ dispatch (#21047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21047
ghimport-source-id: 8c781c9882eebb07325a1fc7aa6f340bbec18886

Differential Revision: D15529160

Pulled By: li-roy

fbshipit-source-id: d9a444e42df5c509ae10849ba6f8006fbec830c5
2019-05-29 01:03:34 -07:00
8dda19b79f Remove extraneous TensorId checks in as_strided (#21045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21045
ghimport-source-id: e95fbf50bccf6ebc613bb13fb16915254912f22d

Differential Revision: D15528971

Pulled By: li-roy

fbshipit-source-id: c721cc6280dff6e14c5533681d0b35aaa8f98f00
2019-05-29 00:53:53 -07:00
d76546a463 Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (#20932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20932
ghimport-source-id: f0a7f12ffd77aec063a088b18c6b1d108c712df8

Differential Revision: D15501251

Pulled By: zdevito

fbshipit-source-id: 91e6e5944d2663b673afde45fc6eed22f31101c4
2019-05-29 00:14:25 -07:00
5c53aa4869 Make build with makefiles less noisy (#21053)
Summary:
https://github.com/pytorch/pytorch/pull/17783 made ninja and makefile builds print out build commands unconditionally, which made the build log very verbose; e.g., the ROCm CI build log became >13 MB. A large build log makes searching for the real error hard.
https://github.com/pytorch/pytorch/pull/20508 reverted the ninja change, and this one reverts the makefile change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21053

Differential Revision: D15533412

Pulled By: bddppq

fbshipit-source-id: ad89b617d06acc670d75d4cf25111a4081e9c95e
2019-05-29 00:08:45 -07:00
9b147961c4 Fix get_gpu_memory_info in non-cuda builds (#21054)
Summary:
#21024 broke master
cc akyrola
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21054

Reviewed By: akyrola

Differential Revision: D15533406

Pulled By: bddppq

fbshipit-source-id: 0dcfa0ce865e109b46280ef1786dbc7a8af30739
2019-05-28 23:05:15 -07:00
ffdce79078 Deprecate variadic inputs of checkpoint_sequential (#21006)
Summary:
I've reported an inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should provide the same input signature, but they don't. I think the consistency is important, and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`'s.

I hope `checkpoint_sequential` will raise a `TypeError` on variadic arguments starting with PyTorch 1.2.0. But for now, it's okay to just warn with a `DeprecationWarning`. I've talked about this approach with soumith.
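
The signature being converged on, as a minimal sketch (model and shapes are illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10))
x = torch.randn(2, 10, requires_grad=True)
# Same input shape as nn.Sequential itself: one tensor, not variadic *inputs.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```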

Please review this pull request. Any comment will be my pleasure.
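
A minimal sketch of the convention this keeps (the toy model and shapes here are illustrative, not from the PR):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(2, 16, requires_grad=True)

# Consistent with nn.Sequential: a single input argument.
out = checkpoint_sequential(model, 2, x)

# Deprecated: passing variadic inputs (checkpoint_sequential(model, 2, x, y))
# now emits a DeprecationWarning and may raise a TypeError in a future release.
```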
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006

Differential Revision: D15530801

Pulled By: soumith

fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183
2019-05-28 21:33:45 -07:00
d23d04f17f Allow nondet_tol for nondeterminism in gradcheck and gradgradcheck (#20980)
Summary:
gradcheck currently includes a determinism check (although it only runs the function twice and checks whether the results match).
This can lead to flaky tests, e.g. in #20971, but also #13818.
This adds nondet_tol for both gradcheck and gradgradcheck. It does not change / re-enable any tests yet.
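
A sketch of the new knob on a deterministic toy function (for genuinely nondeterministic ops, `nondet_tol` bounds the allowed run-to-run difference instead of requiring exact equality):

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

def f(x):
    return (x ** 3).sum()

x = torch.randn(4, dtype=torch.double, requires_grad=True)

assert gradcheck(f, (x,), nondet_tol=1e-10)
assert gradgradcheck(f, (x,), nondet_tol=1e-10)
```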
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20980

Differential Revision: D15530129

Pulled By: soumith

fbshipit-source-id: 04d7f85b5b59cd62867820c74b064ba14f4fa7f8
2019-05-28 21:26:13 -07:00
d190450a35 Fix typo in CyclicLR docs (#21021)
Summary:
Fixes a typo in the CyclicLR docs by adding the `lr_scheduler` module path and putting in the other required arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021

Differential Revision: D15530109

Pulled By: soumith

fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
2019-05-28 21:18:50 -07:00
f1fe4b1114 add simple memory analyzer and log warning if GPU underutilized (#21024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21024

Add a new pybinded call to CUDAGetMemoryInfo.

Reviewed By: wesolwsk

Differential Revision: D15520607

fbshipit-source-id: f6d04e48f7d7cb089fc52fa8835cfee3f452d2f1
2019-05-28 19:58:54 -07:00
1bed5f39f4 Fix warning in register_c10_ops by making index unsigned (#20964)
Summary:
Just an annoying warning that's been popping up a lot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20964

Differential Revision: D15531064

Pulled By: Chillee

fbshipit-source-id: 9580115676c5e246481054bbfc749a551a3cca5e
2019-05-28 18:02:09 -07:00
f6ec464890 Enable batched QR decomposition and add a `some` option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition (see the sketch below)
- Modify doc strings
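
A sketch of the resulting behavior (shapes follow the standard reduced/complete QR definitions):

```python
import torch

a = torch.randn(3, 5, 4, dtype=torch.double)  # a batch of 3 matrices

# some=True (the default) gives the reduced decomposition.
q, r = torch.qr(a, some=True)
print(q.shape, r.shape)  # torch.Size([3, 5, 4]) torch.Size([3, 4, 4])

# some=False gives the complete decomposition.
q, r = torch.qr(a, some=False)
print(q.shape, r.shape)  # torch.Size([3, 5, 5]) torch.Size([3, 5, 4])

print(torch.allclose(q.matmul(r), a))  # True: q @ r reconstructs the input
```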
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
c1048182be Use constants from math.h for gelu op (#20974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20974

Use constants from math.h for gelu op

Reviewed By: hl475, houseroad

Differential Revision: D15511736

fbshipit-source-id: 7d069888fb5c7c310774d056f18711365b39b8e4
2019-05-28 17:52:34 -07:00
0290897bca tracing for intra_op_parallel (#20603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603

When we use intra_op_parallel operators, Caffe2 tracing generated a trace only for the master task, giving the false impression that many threads were underutilized.
This diff also traces child tasks.

Reviewed By: ilia-cher

Differential Revision: D14820008

fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
2019-05-28 17:39:23 -07:00
9a989ec469 Add an option to stop the build process once cmake terminates. (#21034)
Summary:
Add an option to setup.py to stop the build process once cmake terminates. This gives users a chance to fine-tune build options. Also update the README accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21034

Differential Revision: D15530096

Pulled By: soumith

fbshipit-source-id: 71ac6ff8483c3ee77c38d88f0d059db53a7d3901
2019-05-28 17:11:00 -07:00
9294de8c9f Add Tensor.T attribute to reverse dimensions (#20598)
Summary:
For compatibility with numpy
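
For example, mirroring `numpy.ndarray.T`, which reverses all dimensions:

```python
import torch

x = torch.randn(2, 3, 4)
print(x.T.shape)                             # torch.Size([4, 3, 2])
print(torch.equal(x.T, x.permute(2, 1, 0)))  # True
```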
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598

Differential Revision: D15499749

Pulled By: umanwizard

fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc
2019-05-28 16:59:06 -07:00
2791a44948 Renaming the relu kernel and adding hypothesis tests (#20647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20647

The initial assumption was that `qint8` would be unsigned. After the introduction of `quint8` and `qint8`, some tests broke.

Reviewed By: jerryzh168

Differential Revision: D15332106

fbshipit-source-id: 6ed18da428915aea918a363c5f38754a3c75d06b
2019-05-28 16:46:44 -07:00
d6d192e0af Added engine information to the profiling result. (#20493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493

This helps distinguish whether the op was a quantized op or not.

Reviewed By: salexspb

Differential Revision: D15337854

fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
2019-05-28 16:41:12 -07:00
7afa75006e Enable operator profiling via command line (#20173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173

Enabled op profiling even when the net type is not dag or prof_dag. Also added
engine type info to the summary.

Reviewed By: salexspb, ilia-cher

Differential Revision: D15177813

fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
2019-05-28 16:41:08 -07:00
2ba608b4a0 Fixed gcd to use 64 bit integers (#21041)
Summary:
Not much to say. Fixes the implementation introduced here: https://github.com/pytorch/pytorch/pull/19115
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21041

Differential Revision: D15528801

Pulled By: Chillee

fbshipit-source-id: bacd709eb711ca00156bd70480d6051b437517ed
2019-05-28 16:20:55 -07:00
28079c3906 Make ScriptModule.training an attribute instead of a parameter (#19587)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19587 [jit] Make ScriptModule.training an attribute instead of a parameter**

Remove the hack we had previously where `training` was a buffer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19587

Differential Revision: D15502768

Pulled By: driazati

fbshipit-source-id: 3022f2d57ec6849868f9225d9bc2bfb7828cb318
2019-05-28 16:06:46 -07:00
18809f7b0b Better error message in __get_state__ to let a user know that ScriptModules can't be deep-copied atm (#20885)
Summary:
Before we look into supporting `deepcopy`, we can at least improve the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20885

Differential Revision: D15511023

Pulled By: Krovatkin

fbshipit-source-id: 93b8730a2cc663eee0147f14d3341d0606748eaf
2019-05-28 15:09:07 -07:00
07c4e45ca6 Some minor fixes for the changes in #20945 (#21008)
Summary:
Fixes after #20945
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21008

Differential Revision: D15526193

Pulled By: ezyang

fbshipit-source-id: 4cfabc482c149e0aeb92ae7fff04098771fe33ed
2019-05-28 14:48:50 -07:00
0885dd28c8 refactor register_prim_ops (#21001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21001
ghimport-source-id: f1b8e3999bf18fb0f3b857a13c3e3f609e1e4b4e

Differential Revision: D15523445

Pulled By: wanchaol

fbshipit-source-id: c1e29b0985bde580703a1fca9df46da773826df6
2019-05-28 14:11:04 -07:00
b85c52923b Re-land "Fix advanced indexing on "huge" Tensors" (#21019)
Summary:
This is #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc bddppq

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider asserting that sizes fit in a signed
32-bit integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
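
A rough repro sketch, assuming roughly 8 GB of free GPU memory (the advanced-indexing output here is just over 2^31 bytes):

```python
import torch

src = torch.zeros(2**29 + 1, dtype=torch.float32, device='cuda')
idx = torch.arange(src.numel(), device='cuda')   # int64 index tensor
out = src[idx]  # output is > 2^31 - 1 bytes; previously produced wrong results
```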
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21019

Differential Revision: D15518477

Pulled By: colesbury

fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
2019-05-28 12:45:56 -07:00
52d27890dc Improve error message for missing attribute (#20779)
Summary:
Fixes #20495 .

Now for
```python
        class A(torch.jit.ScriptModule):
            def __init__(self):
                super(A, self).__init__()

            torch.jit.script_method
            def forward(self, x):
                return x + self.whatisgoingon

        class B(A):
            def __init__(self):
                super(B, self).__init__()
            torch.jit.script_method
            def bar(self, x):
                return x * x

        A()
```
it does
```
RuntimeError:
attribute 'whatisgoingon' does not exist:
torch.jit.script_method
def forward(self, x):
    return x + self.whatisgoingon
               ~~~~~~~~~~~~~~~~~~ <--- HERE

```

I added a test in `test_jit.py` as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20779

Differential Revision: D15441138

Pulled By: Chillee

fbshipit-source-id: 88f458c36b5e32a1ffc467b27bbc28a3c5c07321
2019-05-28 12:27:52 -07:00
bc10677fcb Some name and variable cleanup (#20861)
Summary:
As a part of https://github.com/pytorch/pytorch/pull/20580 I noticed that we had some unusual variable naming in `summary.py`. This cleans it up and also removes some variables that weren't being used.

I'll wait until we have an `add_custom_scalars` test to land this.

cc lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20861

Differential Revision: D15503420

Pulled By: orionr

fbshipit-source-id: 86d105a346198a1ca543d1c5d297804402ab5a0c
2019-05-28 12:22:47 -07:00
99674eb86f Re-enable test_dag_net_forking on ROCm (#21013)
Summary:
Fixes #16229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21013

Differential Revision: D15515824

Pulled By: bddppq

fbshipit-source-id: 23a6c7eaad6129328c6b9dfcc55ac2d31a6d2dc0
2019-05-28 12:12:53 -07:00
082936f033 Clarify cycliclr param docs (#20880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880

This clarifies how the momentum parameters should be used.
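
For illustration, a sketch of how the momentum parameters pair with a momentum-based optimizer (the values are placeholders):

```python
import torch
from torch import optim

model = torch.nn.Linear(4, 2)
# cycle_momentum=True (the default) requires an optimizer with momentum.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1,
    base_momentum=0.8, max_momentum=0.9)

for _ in range(10):   # one scheduler step per batch
    optimizer.step()
    scheduler.step()
```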

Reviewed By: soumith

Differential Revision: D15482450

fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
2019-05-28 12:07:47 -07:00
68c3ef72b5 Change bound shape inference for LengthsRangeFill & GatherRanges, add more tests (#20610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20610

Change InferLengthsRangeFill
Add InferGatherRanges
add tests from ClipRangesGatherSigridHash all the way to SparseLengthsWeightedSum
add tests from SigridTransforms all the way to SparseLengthsWeightedSum

e2e test will be added in the following diff

Reviewed By: ipiszy

Differential Revision: D15382730

fbshipit-source-id: a611cd129007a273dfc43955cd99af1c4ed04efd
2019-05-28 11:33:51 -07:00
bbe3411846 Refactor schema_matching.cpp (#20549)
Summary:
It was kind of hard to read through this code, so this adds a bunch of comments; no behavior should change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20549

Pulled By: driazati

Differential Revision: D15499974

fbshipit-source-id: 95bf660c3b2bab1c90a2250696cece68bd1925cc
2019-05-28 10:55:09 -07:00
ff6cda0da6 Generate TH functions outside of Type (#20309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20309
ghimport-source-id: d0a0195be53f991f20eb0fbb03edf3814f18b831

Differential Revision: D15509848

Pulled By: li-roy

fbshipit-source-id: 35aafdcb9bb868a41f75cf422c48d357f8655d67
2019-05-28 02:55:51 -07:00
eacb311810 Move 1d tensor checks to TH (#20859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20859
ghimport-source-id: 675cea2c31b48bb5e3b4676640021ace783ea3a8

Differential Revision: D15509850

Pulled By: li-roy

fbshipit-source-id: 468b3b1249d58dd8104643d61d263d1f9b0308bf
2019-05-28 02:55:48 -07:00
d2f14db6cb Change view dispatch to abstract (#20308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20308
ghimport-source-id: cac8d130d45cc36e51d1661c15ad98c10353ea54

Differential Revision: D15509849

Pulled By: li-roy

fbshipit-source-id: 9576028b7075f58c431dc8c12a38c4c5a34c9340
2019-05-28 02:55:41 -07:00
580eab6562 Restore TBB module (#20454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454
ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384

Differential Revision: D15326062

Pulled By: ilia-cher

fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f
2019-05-28 02:49:36 -07:00
82aecfad6a Native ATen/Parallel backend (#20087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20087
ghimport-source-id: bcfc8a86abe0893e4a380fe6f6123e2082ba4317

Differential Revision: D15248663

Pulled By: ilia-cher

fbshipit-source-id: fdb7a8860c85d8202026b629cb7fa344782bd2c4
2019-05-28 01:40:54 -07:00
f4b434a6a5 Fix incorrect torch version in CMake (#21007)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20525
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21007

Differential Revision: D15515260

Pulled By: soumith

fbshipit-source-id: 149084cce276c5e76ca0c5c0872c5e990c47bdfd
2019-05-27 23:46:49 -07:00
0556141339 fix small typo muliprocessing -> multiprocessing (#20998)
Summary:
Minor typo fix in docstring.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20998

Differential Revision: D15514698

Pulled By: soumith

fbshipit-source-id: a9ceb557251ff5868e810331195243b6a8717851
2019-05-27 21:36:13 -07:00
5ddbfc97e9 Revert D15501945: [pytorch][PR] Fix advanced indexing on "huge" Tensors
Differential Revision:
D15501945

Original commit changeset: e876e678e866

fbshipit-source-id: 2833eb118a62e301571a983529f6e4fc91442581
2019-05-27 20:26:37 -07:00
3b0d431bf5 Check for incompatible versions between CUDA and MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20945

Differential Revision: D15514576

Pulled By: ezyang

fbshipit-source-id: 3c0b8b64edce236a84a7195605d437a00a67b7f4
2019-05-27 19:22:21 -07:00
0d35f14565 Update cuSPARSE namespace collision w/ CUDA 10.1 Update 1 (#20889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20889
ghimport-source-id: 8f5f500fa542d4992cd9213923e1af8de115ee58

Differential Revision: D15495545

Pulled By: ezyang

fbshipit-source-id: 60057cf13694158299a8124b1a787cb4e3c21d21
2019-05-27 18:43:32 -07:00
9d9751f634 Convert dequantize_linear to an internal function _dequantize_linear (#20938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20938

Dequantize_linear need not be exposed to the front end users.
It will only be used for the jit passes for q-dq insertion and op
substitution.

Differential Revision: D15446097

fbshipit-source-id: a5fbcf2bb72115122c9653e5089d014e2a2e891d
2019-05-27 15:40:21 -07:00
8e3311c5e2 Remove functionality unsupported by the JIT from multi_head_attention_forward. (#20653)
Summary:
Remove the internal functions in multi_head_attention_forward. Those internal functions cause a 10-15% performance regression, and there is possibly a JIT issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20653

Differential Revision: D15398888

Pulled By: cpuhrsch

fbshipit-source-id: 0a3f053a4ade5009e73d3974fa6733c2bff9d929
2019-05-27 15:12:58 -07:00
6e76813a39 fix SyncBatchNorm doc (#20991)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20991

Differential Revision: D15513518

Pulled By: soumith

fbshipit-source-id: 9618c0b2442e013e4d37793cdb04cb4f4b1b141c
2019-05-27 14:46:58 -07:00
ebc8d7170e fix the bug for mkldnn clone (#20943)
Summary:
This PR fixes a bug when cloning an MKLDNN tensor; please see https://github.com/pytorch/pytorch/issues/20895.
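
A minimal repro sketch based on my reading of the linked issue (assumes an MKL-DNN-enabled build; `to_mkldnn`/`to_dense` are the layout-conversion entry points):

```python
import torch

x = torch.randn(2, 3).to_mkldnn()  # dense -> MKLDNN layout
y = x.clone()                      # previously hit the bug in #20895
print(y.to_dense())
```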
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20943

Differential Revision: D15511516

Pulled By: mrshenli

fbshipit-source-id: 05b41d6c7eaf8703521f4c768b8f26ec8501dc5e
2019-05-27 12:09:52 -07:00
6480d3f140 Revert D15511921: [pytorch][PR] BatchSampler now uses list.clear() instead of creating new objects
Differential Revision:
D15511921

Original commit changeset: e943d21e75e1

fbshipit-source-id: 933b7ef74c7a530f0a2cc087c8ee6f0455cf9239
2019-05-27 10:51:24 -07:00
482ae8e6b2 BatchSampler now uses list.clear() instead of creating new objects
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20976

Differential Revision: D15511921

Pulled By: soumith

fbshipit-source-id: e943d21e75e19f9154a0570f3188cc3ce174083e
2019-05-26 23:45:26 -07:00
ecf012213b Update submodule URL based on redirection. (#20973)
Summary:
Changes:
  - protobuf was moved to protocolbuffers/protobuf a while ago.
  - cpuinfo has been moved to pytorch/cpuinfo and updated in FBGEMM recently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20973

Differential Revision: D15511926

Pulled By: soumith

fbshipit-source-id: 2c50373c9b245524f839bd1059870dd2b84e3b81
2019-05-26 22:29:21 -07:00
bb89827e1d Update cuda pinned memory note to include tensor.to (#20977)
Summary:
separate bits of changes from #19228
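
Roughly, the pattern the note now covers in both spellings:

```python
import torch

x = torch.randn(1024, 1024).pin_memory()
# With a pinned-memory source, both calls can copy to the GPU asynchronously:
a = x.cuda(non_blocking=True)
b = x.to('cuda', non_blocking=True)
```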
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20977

Differential Revision: D15511919

Pulled By: soumith

fbshipit-source-id: 5015a29cdac6d6e160388c493182c330f0da63ec
2019-05-26 22:22:06 -07:00
1e8f129a05 In setup.py, also check some submodules of submodules. (#20937)
Summary:
Sometimes users forget to use the `--recursive` option when they update submodules. The added check should help expose this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20937

Differential Revision: D15502846

Pulled By: mrshenli

fbshipit-source-id: 34c28a2c71ee6442d16b8b741ea44a18733b1536
2019-05-26 18:43:24 -07:00
8dbdd00f87 tweak tqdm to have download speed in kB/MB/etc (#20908)
Summary:
This changes the progress bars in `_download_url_to_file` from saying things like `49773343.40it/s` to `47.5MB/s`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20908

Differential Revision: D15511223

Pulled By: soumith

fbshipit-source-id: 2422eb5fb486f9ef4bd69c556c4ed1775b8b2860
2019-05-26 15:34:47 -07:00
5ab6e07180 .view(...) now suggests .reshape(...) instead of .contiguous().view(...)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20968
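
Roughly, the case the new message targets (a sketch, not taken from the commit):

```python
import torch

x = torch.randn(3, 4).t()  # transposed, so not contiguous
# x.view(12)               # RuntimeError; the message now suggests .reshape(...)
y = x.reshape(12)          # works whether or not x is contiguous
```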

Differential Revision: D15511236

Pulled By: soumith

fbshipit-source-id: 673fc2982ad6ea287fdd0cff2684bdc2317a6709
2019-05-26 15:34:44 -07:00
c611630b9d Fix subscripts in RNN documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20949

Differential Revision: D15510760

Pulled By: soumith

fbshipit-source-id: 51e9dbea7d8c8194e46e12311e397deff32dbe2f
2019-05-26 14:57:40 -07:00
a3a458ed30 Fix align corner docs (#20961)
Summary:
I believe the `True` and `False` in the doc are reversed :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20961

Differential Revision: D15510806

Pulled By: soumith

fbshipit-source-id: 62566bb595e187506b23dedc24892e48f35b1147
2019-05-26 14:57:37 -07:00
5e69e76aba Remove padding_mode from torch.nn.functional.conv{1,2,3}d's docstr (#20891)
Summary:
Fixes #20694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20891

Differential Revision: D15510790

Pulled By: soumith

fbshipit-source-id: aa3630693c7446bf18a390cb49c4df9bc9c59eea
2019-05-26 14:52:51 -07:00
4c5b1e3460 Update nccl submodule to v2.4.6 (#20882)
Summary:
Fixes #20630

Haven't tested it yet. Let's see if it passes all CI tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20882

Reviewed By: pietern

Differential Revision: D15483561

Pulled By: mrshenli

fbshipit-source-id: 5f0730a04d92906af077b2fe2170b674ca371e6c
2019-05-26 13:00:26 -07:00
9310e600f6 Use a simpler way to delete recursive function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20913

Differential Revision: D15508071

Pulled By: mrshenli

fbshipit-source-id: ad9a0ab4295bb0f1063d43682a10c124d8384635
2019-05-26 12:17:25 -07:00
66e6571eb8 fixed issue #20921 (#20922)
Summary:
For tensor creation ops like `torch.zeros` and `torch.ones`, the docs [0], [1] use `sizes` as the first argument to the function call, while the correct argument name is `size`. This was tested on PyTorch 1.1 installed via pip on Ubuntu 19.04.

An example

```
>>> torch.zeros(2, 3)
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.zeros(sizes = (2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: zeros() missing 1 required positional arguments: "size"
>>> torch.zeros(size = (2, 3))
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.ones(sizes = (2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ones() missing 1 required positional arguments: "size"
>>> torch.ones(size = (2, 3))
tensor([[1., 1., 1.],
        [1., 1., 1.]])
```

[0]: https://pytorch.org/docs/master/torch.html#torch.zeros
[1]: https://pytorch.org/docs/master/torch.html#torch.ones
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20922

Differential Revision: D15498741

Pulled By: mrshenli

fbshipit-source-id: 963324ffa004d62ca77ce30ed6f0c3932b5b79b7
2019-05-25 22:22:18 -07:00
83fe92870d Update multiprocessing note now that shared CUDA tensors are refcounted (#19904)
Summary:
The multiprocessing notes were not updated after https://github.com/pytorch/pytorch/pull/16854 (the torch.multiprocessing page was).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19904

Differential Revision: D15509661

Pulled By: soumith

fbshipit-source-id: 7c11e14a6c804498dda3adbf19710e63e6a564a0
2019-05-25 17:40:42 -07:00
bdce5533fe Fix pytorch_macos_10_13_py3_test (#20944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20944
ghimport-source-id: da2dcbfaff4e0e75f6b4e836fc1af1d8aee11c56

Differential Revision: D15508912

Pulled By: ezyang

fbshipit-source-id: 6758a8a516d0a875a5f6bbbb12e43d899bcf2161
2019-05-25 08:17:34 -07:00
81e70ffa19 fix bug of not using get_score_cls_index in BoxWithNMSLimitOp (#20868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20868

When `input_boxes_include_bg_cls` is false (which means `input_scores_fg_cls_starting_id` is 0), it doesn't map the score's class index correctly when sorting and limiting the detections over all classes after NMS.

Reviewed By: newstzpz

Differential Revision: D15472706

fbshipit-source-id: dc1e808b63ad09fb4bd95acf866771bb3fa92d69
2019-05-24 22:31:01 -07:00
2fb665a9df Add warning about memory overhead when using multiple tiny tensors (#20801)
Summary:
Added a note in the docs regarding #19408.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20801

Differential Revision: D15503351

Pulled By: mrshenli

fbshipit-source-id: 7ab371a7992233fb867aadd4bb6b74fccd232c33
2019-05-24 21:45:51 -07:00
c7e0722814 allow pass ordered dict for nn sequential (#20796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20796
ghimport-source-id: 9f895a2a6ebc71984196b868dc3ea6a12286bc81
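
For context, the OrderedDict construction `nn.Sequential` supports in eager mode (a sketch; the layer names are illustrative):

```python
from collections import OrderedDict
import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(8, 8)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(8, 2)),
]))
print(model.fc1)  # submodules are addressable by the given names
```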

Differential Revision: D15505330

Pulled By: wanchaol

fbshipit-source-id: 2922c56606b477a34f4e6433fa790d5b2de9d77a
2019-05-24 20:31:05 -07:00
b93bdf6989 Fix advanced indexing on "huge" Tensors (#20919)
Summary:
This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider asserting that sizes fit in a signed
32-bit integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e

Fixes #20888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20919

Differential Revision: D15501945

Pulled By: colesbury

fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0
2019-05-24 16:25:04 -07:00
430d1a2761 Attempt to fix flaky test_structseq_repr (#20931)
Summary:
Previously, this used `crepr` after the decref of `repr`. This is not
allowed because `repr` owns the cached copy of `crepr`.

Let's see if this fixes the contbuild.

See #20926
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20931

Differential Revision: D15501929

Pulled By: colesbury

fbshipit-source-id: 24141ba62df8758d2a3998cf7c2054be09088b6a
2019-05-24 15:55:44 -07:00
b1df8bfe8a Reduce set of build/tests which run on PRs. (#20930)
Summary:
Resubmit of #20775

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20930

Differential Revision: D15503173

Pulled By: ezyang

fbshipit-source-id: a5de8eacf6b29ee26f07ac53c915fff3f4d32569
2019-05-24 15:25:37 -07:00
c46c6a4fe6 Zero slice bug (#20914)
Summary:
Bug reported internally at FB:

```python
>>> t=torch.from_numpy(np.empty((0,4)))
>>> t[:,1::2]*=1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Trying to resize storage that is not resizable at ../aten/src/TH/THStorageFunctions.cpp:76
```

This happens because the storage offset of `t[:, 1::2]` is 1, and it has 0 elements. We can fix this by avoiding resizing the storage for no-element arrays.

(We could *also* have avoided it by not modifying the storage index in this case, but I felt this way was more semantically correct -- in general, we should not be assuming it's okay to do anything to the storage when it has zero elements).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20914

Differential Revision: D15497860

Pulled By: umanwizard

fbshipit-source-id: 6af61d73a05edfc5c07ce8be9e530f15bf72e6a9
2019-05-24 15:10:59 -07:00
3858e1684b Don't print backtrace for interpreter errors (#20925)
Summary:
Eager Python errors don't include a backtrace so script shouldn't either

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20925

Pulled By: driazati

Differential Revision: D15499952

fbshipit-source-id: 1169f13ba5578cd52948725eda73de8229146bb1
2019-05-24 14:58:48 -07:00
371bd043d6 register ResizeNearestOp to C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20928

Reviewed By: smessmer

Differential Revision: D15499661

fbshipit-source-id: 5af24d5c9d7ff739b8355e19dfe66b496bc026a5
2019-05-24 14:39:11 -07:00
b5a5e296aa Support 3D mesh/point cloud (#20413)
Summary:
I started adding support for the new **[mesh/point cloud](https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/g3doc/tensorboard.md)** data type introduced to TensorBoard recently.

I created the functions to add the data, created the appropriate summaries.
This new data type however requires a **Merged** summary containing the data for the vertices, colors and faces.

I got stuck at this stage. Maybe someone can help. lanpa?

I converted the example code by Google to PyTorch:
```python
import numpy as np
import trimesh

import torch
from torch.utils.tensorboard import SummaryWriter

sample_mesh = 'https://storage.googleapis.com/tensorflow-graphics/tensorboard/test_data/ShortDance07_a175_00001.ply'
log_dir = 'runs/torch'
batch_size = 1

# Camera and scene configuration.
config_dict = {
    'camera': {'cls': 'PerspectiveCamera', 'fov': 75},
    'lights': [
        {
            'cls': 'AmbientLight',
            'color': '#ffffff',
            'intensity': 0.75,
        }, {
            'cls': 'DirectionalLight',
            'color': '#ffffff',
            'intensity': 0.75,
            'position': [0, -1, 2],
        }],
    'material': {
        'cls': 'MeshStandardMaterial',
        'roughness': 1,
        'metalness': 0
    }
}

# Read all sample PLY files.
mesh = trimesh.load_remote(sample_mesh)
vertices = np.array(mesh.vertices)
# Currently only supports RGB colors.
colors = np.array(mesh.visual.vertex_colors[:, :3])
faces = np.array(mesh.faces)

# Add batch dimension, so our data will be of shape BxNxC.
vertices = np.expand_dims(vertices, 0)
colors = np.expand_dims(colors, 0)
faces = np.expand_dims(faces, 0)

# Create data placeholders of the same shape as data itself.
vertices_tensor = torch.as_tensor(vertices)
faces_tensor = torch.as_tensor(faces)
colors_tensor = torch.as_tensor(colors)

writer = SummaryWriter(log_dir)

writer.add_mesh('mesh_color_tensor', vertices=vertices_tensor, faces=faces_tensor,
                colors=colors_tensor, config_dict=config_dict)

writer.close()
```

I tried adding only the vertex summary, since the others are supposed to be optional.
I got the following error from TensorBoard and it also didn't display the points:
```
Traceback (most recent call last):
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 302, in run_wsgi
    execute(self.server.app)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 290, in execute
    application_iter = app(environ, start_response)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 309, in __call__
    return self.data_applications[clean_path](environ, start_response)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/wrappers/base_request.py", line 235, in application
    resp = f(*args[:-2] + (request,))
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 252, in _serve_mesh_metadata
    tensor_events = self._collect_tensor_events(request)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 188, in _collect_tensor_events
    tensors = self._multiplexer.Tensors(run, instance_tag)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_multiplexer.py", line 400, in Tensors
    return accumulator.Tensors(tag)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_accumulator.py", line 437, in Tensors
    return self.tensors_by_tag[tag].Items(_TENSOR_RESERVOIR_KEY)
KeyError: 'mesh_color_tensor_COLOR'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20413

Differential Revision: D15500737

Pulled By: orionr

fbshipit-source-id: 426e8b966037d08c065bce5198fd485fd80a2b67
2019-05-24 14:30:58 -07:00
6063ffd055 Specify dispatch key with kernel (#20821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821

Change registration API. Instead of

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>()
        .dispatchKey(CPUTensorId()));

it is now

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(CPUTensorId()));

This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.

The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter*, while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. And while previously the different kinds of config parameters were mixed, this diff separates them.

Before this change, it wasn't well defined what happened if you specified a dispatchKey together with an autogradWrapper or aliasInfo, for example.

    // what is this supposed to do?
    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .aliasInfo(DEFAULT)
        .dispatchKey(CPUTensorId()));

If we get more kernel config parameters in the future, we could introduce something like this

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
            .dispatchKey(CPUTensorId())
            .otherConfig());

but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.

A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel1>(CPUTensorId())
        .kernel<Kernel2>(CUDATensorId()));

Reviewed By: dzhulgakov

Differential Revision: D15455790

fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
2019-05-24 14:23:35 -07:00
a2328a27e9 Improve torch.cdist performance (#20605)
Summary:
Fix based on https://github.com/pytorch/pytorch/issues/15253
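
For reference, a minimal usage sketch of the op being optimized:

```python
import torch

x1 = torch.randn(100, 3)
x2 = torch.randn(50, 3)
d = torch.cdist(x1, x2, p=2)  # pairwise p-norm distances
print(d.shape)                # torch.Size([100, 50])
```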
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20605

Differential Revision: D15396123

Pulled By: ifedan

fbshipit-source-id: 3ed373e68339a35360f083d4aad1b655abcaf97e
2019-05-24 14:06:55 -07:00
4501dc305d Assert against using Operator methods not supported when exporting it to c10 (#17818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14392459

fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
2019-05-24 13:45:01 -07:00
c8f404a68e Revert D15499918: Reduce set of build/tests which run on PRs.
Differential Revision:
D15499918

Original commit changeset: 992e3f91f95d

fbshipit-source-id: 86fc43d3da7ea3e3a32e95fc4f4f3de6cbd5d49b
2019-05-24 12:55:04 -07:00
d03265b44f Reduce set of build/tests which run on PRs. (#20775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20775
ghimport-source-id: 8d05ed03b8a841233d578a38b7c84bd1152c08e5

Differential Revision: D15499918

Pulled By: ezyang

fbshipit-source-id: 992e3f91f95dd9c0564e5ed6793dd1b286ddba00
2019-05-24 12:29:52 -07:00
dee11a92c1 Use Device instead of Backend in TensorIterator (#20690)
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.

Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.

Note that zero-dim Tensors (scalars) are NOT required to be on the
same device as other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.

Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
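
A sketch of the scalar-Tensor behavior described above (assumes two visible GPUs):

```python
import torch

a = torch.randn(3, device='cuda:0')
s = torch.tensor(2.0, device='cuda:1')  # zero-dim "scalar" Tensor

print(a * s)  # OK: the zero-dim operand is copied to cuda:0
b = torch.randn(3, device='cuda:1')
# a * b       # RuntimeError: non-scalar inputs must be on the same device
```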
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690

Differential Revision: D15414826

Pulled By: colesbury

fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
2019-05-24 12:14:08 -07:00
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) allows us to then eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script uses of `_grad_sum_to_size` to the new logic, but it might be better to do this incrementally anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
2968 changed files with 293768 additions and 109054 deletions


@@ -1,3 +1,23 @@
Structure of CI
===============
setup job:
1. Does a git checkout
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
CircleCI configuration generator
================================
@@ -35,4 +55,450 @@ Future direction
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
* releases are stable, nightlies are beta and built every night
* python version
* linux: 2.7m, 2.7mu, 3.5m, 3.6m, 3.7m (mu is wide unicode or something like that. It usually doesn't matter, but you should know that it exists)
* macos: 2.7, 3.5, 3.6, 3.7
* windows: 3.5, 3.6, 3.7
* cpu version
* cpu, cuda 9.0, cuda 10.0
* The supported cuda versions occasionally change
* operating system
* Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
* MacOS
* Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
* This only matters on Linux because only Linux uses gcc. The tldr is that gcc made a backwards-incompatible ABI change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f <a s3 url>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies (the only supported option for Windows)
* static with dependencies
* shared without dependencies
* static without dependencies
All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. Ctrl-F 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments; environment variables declared in one script DO NOT persist to following steps.*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binarybuilds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. binary_linux_conda_3.7_cpu_build
1. Builds the package. On linux jobs this uses the 'docker executor'.
2. Persists the package to the workspace
2. binary_linux_conda_3.7_cpu_test
1. Loads the package from the workspace
2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
3. Runs some smoke tests in the docker
4. (Actually, for macos this is a step rather than a separate job)
3. binary_linux_conda_3.7_cpu_upload
1. Logs in to aws/conda
2. Uploads the package
2. update_s3_htmls
1. every day 5am EST
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
3. See below for what these are for and why they're needed
4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
1. every day
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
2. Runs the smoke tests
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
* binary_linux_test.sh
* binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* binary_macos_build.sh
* binary_macos_test.sh
* binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/run_tests.sh
* https://github.com/pytorch/builder/blob/master/smoke_test.sh
* https://github.com/pytorch/builder/blob/master/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
* binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's because we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run a nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime nvidia' argument. CircleCI doesn't support this, so we have to do it ourselves.
* This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** There is nothing we can do about it but wait for a fix on CircleCI's side. Right now, we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later CircleCI changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to move a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to check out the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from the pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there are two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Azure Pipelines structure of the binaries
TODO: fill in stuff
## How are the workflows structured?
TODO: fill in stuff
## How are the jobs structured?
TODO: fill in stuff
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/ # Folder that holds all circleci related stuff
- config.yml # GENERATED file that actually controls all circleci behavior
- verbatim-sources # Used to generate job/workflow sections in ^
- scripts/ # Code needed to prepare circleci environments for binary build scripts
- setup.py # Builds pytorch. This is wrapped in pytorch/builder
- cmake files # used in normal building of pytorch
# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder
# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh
# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh
# Folders that govern how packages are built. See paragraphs below
- conda/
- build_pytorch.sh # Entrypoint. Delegates to proper conda build folder
- switch_cuda_version.sh # Switches activate CUDA installation in Docker
- pytorch-nightly/ # Build-folder
- manywheel/
- build_cpu.sh # Entrypoint for cpu builds
- build.sh # Entrypoint for CUDA builds
- build_common.sh # Actual build script that ^^ call into
- wheel/
- build_wheel.sh # Entrypoint for wheel builds
- windows/
- build_pytorch.bat # Entrypoint for wheel builds on Windows
```
Every type of package has an entrypoint build script that handles all the important logic.
## Conda
Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in and what dependencies the resulting package should have, and the build script gets called in that env to build the thing.
The tldr on conda-build is:
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around ```python setup.py build```, but it also manually copies some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_conda.sh` is complicated because
* It works for Linux, MacOS and Windows
* The mac builds used to create their own environments, since they all used to be on the same machine. There's now a lot of extra logic to handle conda envs. This extra machinery could be removed
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there are still unnecessary folders and movements here and there.
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didn't have autoscaling mac machines until CircleCI). So this script handled siloing itself by setting up and tearing down its build env and siloing itself into its own build directory.
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## Windows Wheels (Windows pip and libtorch packages)
The entrypoint file `builder/windows/build_pytorch.bat` is complicated because
* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests or the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn't mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* It's confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* soumith/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* soumith/manylinux-cuda90
* soumith/manylinux-cuda92
* soumith/manylinux-cuda100
* Also used for cpu builds
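The CUDA switching mentioned above boils down to something like the following (a sketch of the effect, not the script's actual contents):
```
# Re-point /usr/local/cuda at the requested toolkit version
rm -f /usr/local/cuda
ln -s /usr/local/cuda-10.0 /usr/local/cuda
# Builds that reference /usr/local/cuda now pick up CUDA 10.0
/usr/local/cuda/bin/nvcc --version  # sanity check
```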
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of Python installations: https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tl;dr: make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually trigger a workflow from the CircleCI UI; you can manually re-run a workflow, but it will use exactly the same git commits as the first run and will not include any new changes. So we have to make a PR and then force CircleCI to run the binary workflow instead of the normal tests. The PR above is an example of how to do this: essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo at a different commit, change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to check out what you want.
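For reference, the checkout in binary_checkout.sh is roughly the clone shown below, so pinning the builder repo is a one-line change (a sketch; BUILDER_COMMIT is a hypothetical variable, not one the script defines):
```
# What binary_checkout.sh does today (roughly):
git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
# To test against a specific builder commit, check it out here instead
# of staying on the default branch:
git checkout "$BUILDER_COMMIT"
popd
```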
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default CircleCI jobs that run on PRs are not the jobs that you want to run. Changes to the binaries will likely touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all CircleCI behavior and is generated by running ```.circleci/regenerate.sh``` under Python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits: one for your real changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
# Regenerate the yaml, has to be in python 3.7
.circleci/regenerate.sh
# Make a commit
git add .circleci *
git commit -m "My real changes"
git push origin my_branch
# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
# Also eliminate ensure-consistency and should_run_job checks
# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
# Make a commit you won't keep
git add .circleci
git commit -m "[DO NOT LAND] testing binaries for above changes"
git push origin my_branch
# Now you need to make some changes to the first commit.
git rebase -i HEAD~2 # mark the first commit as 'edit'
# Make the changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
.circleci/regenerate.sh
# Amend the commit and continue the rebase
git add .circleci
git commit --amend
git rebase --continue
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci config without having to rewrite which binary jobs you want to test. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can easily build Linux binaries locally using Docker.
```
# Run the docker
# Use the correct docker image, soumith/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you're building a CUDA binary then use `nvidia-docker run` instead, see below.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it soumith/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
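Concretely, the invocation from the Linux example above becomes (same volumes and image; only the runtime changes):
```
# Either use the nvidia-docker wrapper...
nvidia-docker run \
    -v your/pytorch/repo:/pytorch \
    -v your/builder/repo:/builder \
    -v where/you/want/packages/to/appear:/final_pkgs \
    -it soumith/conda-cuda /bin/bash
# ...or pass the runtime explicitly to plain docker
docker run --runtime=nvidia \
    -v your/pytorch/repo:/pytorch \
    -v your/builder/repo:/builder \
    -v where/you/want/packages/to/appear:/final_pkgs \
    -it soumith/conda-cuda /bin/bash
```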
You can build CUDA binaries on CPU-only machines, but you can only run CUDA binaries on machines with CUDA GPUs. This means you can build a CUDA binary in Docker on your laptop if you so choose (though it's going to take a long time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
There's no easy way to generate reproducible, hermetic MacOS environments. If you have a Mac laptop you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will interfere with the build. If you're trying to repro an error in a .circleci Mac build and can't seem to repro it locally, my best advice is actually to iterate on .circleci :/
But if you want to try, then I'd recommend
```
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
new_conda="$HOME/my_new_conda"  # note: tildes do not expand inside quotes
conda_sh="/tmp/install_miniconda.sh"  # download outside the install target
curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$new_conda"
rm -f "$conda_sh"
export PATH="$new_conda/bin:$PATH"
# Create a clean python env
# All MacOS builds use conda to manage the python env and dependencies
# that are built with, even the pip packages
conda create -yn binary python=3.6  # match DESIRED_PYTHON below
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. installing a brand-new miniconda is important. This has to do with how conda installations work; see the “General Python” section above, but the tl;dr is that
1. You make the conda command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) calls a Python executable `foo`
    1. If you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
    2. But if you forgot to install `foo` in `new_env` and happened to have previously installed it in your root conda env (called `base`), then your shell will still find `path/to/conda_root/bin/foo`. This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible Python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
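To see the shadowing in action, a toy illustration (paths are placeholders):
```
# After activating new_env, PATH looks like:
#   path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:...
# `which -a` lists every match in PATH order; the first one wins.
which -a python
# If `foo` is missing from new_env but present in base, the base copy
# is silently used:
which foo   # -> path/to/conda_root/bin/foo
```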
### Windows
TODO: fill in

View File

@ -59,11 +59,21 @@ CONFIG_TREE_DATA = OrderedDict(
)),
)
DEVTOOLSET_VERSIONS = [
3,
7,
]
# GCC config variants:
#
# All the nightlies (except libtorch with new gcc ABI) are built with devtoolset7,
# which can only build with old gcc ABI. It is better than devtoolset3
# because it understands avx512, which is needed for good fbgemm performance.
#
# Libtorch with new gcc ABI is built with gcc 5.4 on Ubuntu 16.04.
LINUX_GCC_CONFIG_VARIANTS = OrderedDict(
manywheel=['devtoolset7'],
conda=['devtoolset7'],
libtorch=[
"devtoolset7",
"gcc5.4_cxx11-abi",
],
)
class TopLevelNode(ConfigNode):
@ -97,24 +107,24 @@ class PackageFormatConfigNode(ConfigNode):
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux" and self.find_prop("package_format") != "conda":
return [LinuxGccConfigNode(self, v) for v in DEVTOOLSET_VERSIONS]
if self.find_prop("os_name") == "linux":
return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
class LinuxGccConfigNode(ConfigNode):
def __init__(self, parent, devtoolset_version):
super(LinuxGccConfigNode, self).__init__(parent, "DEVTOOLSET=" + str(devtoolset_version))
def __init__(self, parent, gcc_config_variant):
super(LinuxGccConfigNode, self).__init__(parent, "GCC_CONFIG_VARIANT=" + str(gcc_config_variant))
self.props["devtoolset_version"] = devtoolset_version
self.props["gcc_config_variant"] = gcc_config_variant
def get_children(self):
cuda_versions = self.find_prop("cuda_versions")
# XXX devtoolset7 on CUDA 9.0 is temporarily disabled
# see https://github.com/pytorch/pytorch/issues/20066
if self.find_prop("devtoolset_version") == 7:
if self.find_prop("gcc_config_variant") == 'devtoolset7':
cuda_versions = filter(lambda x: x != "90", cuda_versions)
return [ArchConfigNode(self, v) for v in cuda_versions]
@ -142,7 +152,7 @@ class PyVersionConfigNode(ConfigNode):
package_format = self.find_prop("package_format")
os_name = self.find_prop("os_name")
has_libtorch_variants = smoke and package_format == "libtorch" and os_name == "linux"
has_libtorch_variants = package_format == "libtorch" and os_name == "linux"
linking_variants = LINKING_DIMENSIONS if has_libtorch_variants else []
return [LinkingVariantConfigNode(self, v) for v in linking_variants]

View File

@ -9,7 +9,7 @@ import cimodel.lib.visualization as visualization
class Conf(object):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, devtoolset_version):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant):
self.os = os
self.cuda_version = cuda_version
@ -17,15 +17,17 @@ class Conf(object):
self.parms = parms
self.smoke = smoke
self.libtorch_variant = libtorch_variant
self.devtoolset_version = devtoolset_version
self.gcc_config_variant = gcc_config_variant
def gen_build_env_parms(self):
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.cuda_version)]
if self.devtoolset_version is not None:
elems.append("devtoolset" + str(self.devtoolset_version))
if self.gcc_config_variant is not None:
elems.append(str(self.gcc_config_variant))
return elems
def gen_docker_image(self):
if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
return miniutils.quote("soumith/conda-cuda-cxx11-ubuntu1604:latest")
docker_word_substitution = {
"manywheel": "manylinux",
@ -34,8 +36,8 @@ class Conf(object):
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the soumith/manylinux-cuda80 docker image
alt_docker_suffix = self.cuda_version or "80"
# The cpu nightlies are built on the soumith/manylinux-cuda100 docker image
alt_docker_suffix = self.cuda_version or "100"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
@ -46,44 +48,45 @@ class Conf(object):
parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()
if self.smoke:
if self.libtorch_variant:
parts.append(self.libtorch_variant)
else:
if self.libtorch_variant:
parts.append(self.libtorch_variant)
if not self.smoke:
parts.append(build_or_test)
return "_".join(parts)
def gen_yaml_tree(self, build_or_test):
env_tuples = [("BUILD_ENVIRONMENT", miniutils.quote(" ".join(self.gen_build_env_parms())))]
joined = "_".join(parts)
return joined.replace(".", "_")
def gen_workflow_job(self, phase, upload_phase_dependency=None):
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase)
job_def["build_environment"] = miniutils.quote(" ".join(self.gen_build_env_parms()))
job_def["requires"] = ["setup"]
job_def["filters"] = {"branches": {"only": "nightly"}}
if self.libtorch_variant:
env_tuples.append(("LIBTORCH_VARIANT", miniutils.quote(self.libtorch_variant)))
job_def["libtorch_variant"] = miniutils.quote(self.libtorch_variant)
if phase == "test":
if not self.smoke:
job_def["requires"].append(self.gen_build_name("build"))
if not (self.smoke and self.os == "macos"):
job_def["docker_image"] = self.gen_docker_image()
if self.cuda_version:
job_def["use_cuda_docker_runtime"] = miniutils.quote("1")
else:
if self.os == "linux" and phase != "upload":
job_def["docker_image"] = self.gen_docker_image()
if phase == "test":
if self.cuda_version:
job_def["resource_class"] = "gpu.medium"
if phase == "upload":
job_def["context"] = "org-member"
job_def["requires"] = ["setup", self.gen_build_name(upload_phase_dependency)]
os_name = miniutils.override(self.os, {"macos": "mac"})
d = {"<<": "*" + "_".join([self.get_name_prefix(), os_name, build_or_test])}
if build_or_test == "test":
if not (self.smoke and self.os == "macos"):
env_tuples.append(("DOCKER_IMAGE", self.gen_docker_image()))
if self.cuda_version:
env_tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))
else:
if self.os == "linux" and build_or_test != "upload":
d["docker"] = [{"image": self.gen_docker_image()}]
d["environment"] = OrderedDict(env_tuples)
if build_or_test == "test":
if self.cuda_version:
d["resource_class"] = "gpu.medium"
return d
job_name = "_".join([self.get_name_prefix(), os_name, phase])
return {job_name : job_def}
def get_root(smoke, name):
@ -108,7 +111,7 @@ def gen_build_env_list(smoke):
[c.find_prop("pyver")],
c.find_prop("smoke"),
c.find_prop("libtorch_variant"),
c.find_prop("devtoolset_version"),
c.find_prop("gcc_config_variant"),
)
newlist.append(conf)
@ -116,31 +119,17 @@ def gen_build_env_list(smoke):
def predicate_exclude_nonlinux_and_libtorch(config):
return config.os == "linux" and (config.smoke or config.pydistro != "libtorch")
return config.os == "linux"
def add_build_entries(jobs_dict, phase, smoke, filter_predicate=lambda x: True):
configs = gen_build_env_list(smoke)
for conf_options in filter(filter_predicate, configs):
jobs_dict[conf_options.gen_build_name(phase)] = conf_options.gen_yaml_tree(phase)
def add_binary_build_specs(jobs_dict):
add_build_entries(jobs_dict, "build", False)
def add_binary_build_tests(jobs_dict):
add_build_entries(jobs_dict, "test", False, predicate_exclude_nonlinux_and_libtorch)
def add_binary_build_uploads(jobs_dict):
add_build_entries(jobs_dict, "upload", False)
def add_smoke_test_specs(jobs_dict):
add_build_entries(jobs_dict, "test", True)
def get_nightly_uploads():
configs = gen_build_env_list(False)
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_nonlinux_and_libtorch(conf) else "build"
mylist.append(conf.gen_workflow_job("upload", phase_dependency))
return mylist
def get_nightly_tests():
@ -149,55 +138,22 @@ def get_nightly_tests():
tests = []
for conf_options in filtered_configs:
params = {"requires": ["setup", conf_options.gen_build_name("build")]}
tests.append({conf_options.gen_build_name("test"): params})
yaml_item = conf_options.gen_workflow_job("test")
tests.append(yaml_item)
return tests
def get_nightly_uploads():
configs = gen_build_env_list(False)
def gen_config(conf, phase_dependency):
return {
conf.gen_build_name("upload"): OrderedDict([
("context", "org-member"),
("requires", ["setup", conf.gen_build_name(phase_dependency)]),
]),
}
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_nonlinux_and_libtorch(conf) else "build"
mylist.append(gen_config(conf, phase_dependency))
return mylist
def gen_schedule_tree(cron_timing):
return [{
"schedule": {
"cron": miniutils.quote(cron_timing),
"filters": {
"branches": {
"only": ["master"],
},
},
},
}]
def add_jobs_and_render(jobs_dict, toplevel_key, smoke, cron_schedule):
jobs_list = ["setup"]
configs = gen_build_env_list(smoke)
phase = "build" if toplevel_key == "binarybuilds" else "test"
for build_config in configs:
build_name = build_config.gen_build_name("build")
jobs_list.append({build_name: {"requires": ["setup"]}})
jobs_list.append(build_config.gen_workflow_job(phase))
jobs_dict[toplevel_key] = OrderedDict(
triggers=gen_schedule_tree(cron_schedule),
jobs=jobs_list,
)

View File

@ -1,8 +1,7 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
from cimodel.lib.conf_tree import Ver
import cimodel.data.dimensions as dimensions
CONFIG_TREE_DATA = [
@ -14,16 +13,17 @@ CONFIG_TREE_DATA = [
(Ver("cuda", "9.0"), [
# TODO make explicit that this is a "secret TensorRT build"
# (see https://github.com/pytorch/pytorch/pull/17323#discussion_r259446749)
# TODO Uh oh, were we supposed to make this one important?!
X("py2"),
X("cmake"),
XImportant("cmake"),
]),
(Ver("cuda", "9.1"), [X("py2")]),
(Ver("mkl"), [X("py2")]),
(Ver("gcc", "5"), [X("onnx_py2")]),
(Ver("cuda", "10.1"), [XImportant("py3.5")]), # TensorRT 6 build
(Ver("mkl"), [XImportant("py2")]),
(Ver("gcc", "5"), [XImportant("onnx_py2")]),
(Ver("clang", "3.8"), [X("py2")]),
(Ver("clang", "3.9"), [X("py2")]),
(Ver("clang", "7"), [X("py2")]),
(Ver("android"), [X("py2")]),
(Ver("clang", "7"), [XImportant("py2"), XImportant("onnx_py3.6")]),
(Ver("android"), [XImportant("py2")]),
]),
(Ver("centos", "7"), [
(Ver("cuda", "9.0"), [X("py2")]),
@ -32,7 +32,7 @@ CONFIG_TREE_DATA = [
# TODO ios and system aren't related. system qualifies where the python comes
# from (use the system python instead of homebrew or anaconda)
(Ver("ios"), [X("py2")]),
(Ver("system"), [X("py2")]),
(Ver("system"), [XImportant("py2")]),
]),
]
@ -54,6 +54,8 @@ class TreeConfigNode(ConfigNode):
return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_py3.6":
return False
return str(self.find_prop("compiler_version")) in [
"gcc4.9",
"clang3.8",
@ -95,16 +97,13 @@ class LanguageConfigNode(TreeConfigNode):
self.props["language_version"] = node_name
self.props["build_only"] = self.is_build_only()
def get_children(self):
children = []
for phase in dimensions.PHASES:
if phase == "build" or not self.props["build_only"]:
children.append(PhaseConfigNode(self, phase, []))
return children
def child_constructor(self):
return ImportantConfigNode
class PhaseConfigNode(TreeConfigNode):
class ImportantConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["phase_name"] = node_name
self.props["important"] = True
def get_children(self):
return []

View File

@ -2,30 +2,34 @@
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
from cimodel.lib.conf_tree import Ver
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
from cimodel.data.caffe2_build_data import CONFIG_TREE_DATA, TopLevelNode
from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = 276
DOCKER_IMAGE_VERSION = 324
class Conf(object):
def __init__(self, language, distro, compiler, phase, build_only):
self.language = language
self.distro = distro
self.compiler = compiler
self.phase = phase
self.build_only = build_only
@dataclass
class Conf:
language: str
distro: Ver
compiler: Ver
build_only: bool
is_important: bool
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_py3.6" \
or self.compiler.name in ["android", "mkl", "clang"] \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
@ -44,9 +48,6 @@ class Conf(object):
root_parts = self.get_build_name_root_parts()
return "_".join(root_parts + [phase]).replace(".", "_")
def get_name(self):
return self.construct_phase_name(self.phase)
def get_platform(self):
platform = self.distro.name
if self.distro.name != "macos":
@ -57,6 +58,7 @@ class Conf(object):
lang_substitutions = {
"onnx_py2": "py2",
"onnx_py3.6": "py3.6",
"cmake": "py2",
}
@ -64,12 +66,11 @@ class Conf(object):
parts = [lang] + self.get_build_name_middle_parts()
return miniutils.quote(DOCKER_IMAGE_PATH_BASE + "-".join(parts) + ":" + str(DOCKER_IMAGE_VERSION))
def gen_yaml_tree(self):
tuples = []
def gen_workflow_params(self, phase):
parameters = OrderedDict()
lang_substitutions = {
"onnx_py2": "onnx-py2",
"onnx_py3.6": "onnx-py3.6",
}
lang = miniutils.override(self.language, lang_substitutions)
@ -77,39 +78,42 @@ class Conf(object):
parts = [
"caffe2",
lang,
] + self.get_build_name_middle_parts() + [self.phase]
build_env = "-".join(parts)
if not self.distro.name == "macos":
build_env = miniutils.quote(build_env)
tuples.append(("BUILD_ENVIRONMENT", build_env))
] + self.get_build_name_middle_parts() + [phase]
build_env_name = "-".join(parts)
parameters["build_environment"] = miniutils.quote(build_env_name)
if self.compiler.name == "ios":
tuples.append(("BUILD_IOS", miniutils.quote("1")))
if self.phase == "test":
parameters["build_ios"] = miniutils.quote("1")
if phase == "test":
# TODO cuda should not be considered a compiler
if self.compiler.name == "cuda":
tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if self.distro.name == "macos":
tuples.append(("PYTHON_VERSION", miniutils.quote("2")))
else:
tuples.append(("DOCKER_IMAGE", self.gen_docker_image()))
if self.distro.name != "macos":
parameters["docker_image"] = self.gen_docker_image()
if self.build_only:
tuples.append(("BUILD_ONLY", miniutils.quote("1")))
d = OrderedDict({"environment": OrderedDict(tuples)})
if self.phase == "test":
parameters["build_only"] = miniutils.quote("1")
if phase == "test":
resource_class = "large" if self.compiler.name != "cuda" else "gpu.medium"
d["resource_class"] = resource_class
parameters["resource_class"] = resource_class
d["<<"] = "*" + "_".join(["caffe2", self.get_platform(), self.phase, "defaults"])
return parameters
return d
def gen_workflow_job(self, phase):
job_def = OrderedDict()
job_def["name"] = self.construct_phase_name(phase)
job_def["requires"] = ["setup"]
if phase == "test":
job_def["requires"].append(self.construct_phase_name("build"))
job_name = "caffe2_" + self.get_platform() + "_test"
else:
job_name = "caffe2_" + self.get_platform() + "_build"
if not self.is_important:
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}
def get_root():
@ -125,11 +129,11 @@ def instantiate_configs():
for fc in found_configs:
c = Conf(
fc.find_prop("language_version"),
fc.find_prop("distro_version"),
fc.find_prop("compiler_version"),
fc.find_prop("phase_name"),
fc.find_prop("build_only"),
language=fc.find_prop("language_version"),
distro=fc.find_prop("distro_version"),
compiler=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
is_important=fc.find_prop("important"),
)
config_list.append(c)
@ -137,17 +141,7 @@ def instantiate_configs():
return config_list
def add_caffe2_builds(jobs_dict):
configs = instantiate_configs()
for conf_options in configs:
jobs_dict[conf_options.get_name()] = conf_options.gen_yaml_tree()
graph = visualization.generate_graph(get_root())
graph.draw("caffe2-config-dimensions.png", prog="twopi")
def get_caffe2_workflows():
def get_workflow_jobs():
configs = instantiate_configs()
@ -158,11 +152,11 @@ def get_caffe2_workflows():
x = []
for conf_options in filtered_configs:
requires = ["setup"]
phases = ["build"]
if not conf_options.build_only:
phases = dimensions.PHASES
if conf_options.phase == "test":
requires.append(conf_options.construct_phase_name("build"))
x.append({conf_options.get_name(): {"requires": requires}})
for phase in phases:
x.append(conf_options.gen_workflow_job(phase))
return x

View File

@ -5,8 +5,9 @@ PHASES = ["build", "test"]
CUDA_VERSIONS = [
None, # cpu build
"90",
"92",
"100",
"101",
]
STANDARD_PYTHON_VERSIONS = [

View File

@ -1,31 +1,38 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
CONFIG_TREE_DATA = [
("trusty", [
("xenial", [
(None, [
X("2.7.9"),
XImportant("2.7.9"),
X("2.7"),
("3.5", [("important", [X(True)])]),
XImportant("3.5"), # Not run on all PRs, but should be included on [test all]
X("nightly"),
]),
("gcc", [
("4.8", [X("3.6")]),
("5.4", [
X("3.6"),
("5.4", [ # All this subtree rebases to master and then build
XImportant("3.6"),
("3.6", [
("xla", [X(True)]),
("namedtensor", [X(True)]),
("namedtensor", [XImportant(True)]),
]),
]),
("7", [X("3.6")]),
]),
]),
("xenial", [
("clang", [
("5", [X("3.6")]),
("5", [
XImportant("3.6"), # This is actually the ASAN build
("3.6", [
("namedtensor", [XImportant(True)]), # ASAN
]),
]),
("7", [
("3.6", [
("xla", [XImportant(True)]),
]),
]),
]),
("cuda", [
("9", [
@ -36,14 +43,25 @@ CONFIG_TREE_DATA = [
# and
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
("2.7", [("important", [X(True)])]),
X("3.6"),
X("2.7"),
XImportant("3.6"),
("2.7", [
("namedtensor", [XImportant(True)]),
]),
]),
("9.2", [X("3.6")]),
("10", [X("3.6")]),
("10.1", [X("3.6")]),
]),
("android", [
("r19c", [X("3.6")]),
("r19c", [
("3.6", [
("android_abi", [XImportant("x86_32")]),
("android_abi", [X("x86_64")]),
("android_abi", [X("arm-v7a")]),
("android_abi", [X("arm-v8a")]),
])
]),
]),
]),
]
@ -87,34 +105,11 @@ class DistroConfigNode(TreeConfigNode):
distro = self.find_prop("distro_name")
next_nodes = {
"trusty": TrustyCompilerConfigNode,
"xenial": XenialCompilerConfigNode,
}
return next_nodes[distro]
class TrustyCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"
def init2(self, node_name):
self.props["compiler_name"] = node_name
def child_constructor(self):
return TrustyCompilerVersionConfigNode if self.props["compiler_name"] else PyVerConfigNode
class TrustyCompilerVersionConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return PyVerConfigNode
class PyVerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["pyver"] = node_name
@ -136,6 +131,7 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
"xla": XlaConfigNode,
"namedtensor": NamedTensorConfigNode,
"important": ImportantConfigNode,
"android_abi": AndroidAbiConfigNode,
}
return next_nodes[experimental_feature]
@ -147,6 +143,9 @@ class XlaConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_xla"] = node_name
def child_constructor(self):
return ImportantConfigNode
class NamedTensorConfigNode(TreeConfigNode):
def modify_label(self, label):
@ -155,6 +154,16 @@ class NamedTensorConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_namedtensor"] = node_name
def child_constructor(self):
return ImportantConfigNode
class AndroidAbiConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["android_abi"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ImportantConfigNode(TreeConfigNode):
def modify_label(self, label):
@ -163,15 +172,22 @@ class ImportantConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_important"] = node_name
def get_children(self):
return []
class XenialCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"
def init2(self, node_name):
self.props["compiler_name"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return XenialCompilerVersionConfigNode
return XenialCompilerVersionConfigNode if self.props["compiler_name"] else PyVerConfigNode
class XenialCompilerVersionConfigNode(TreeConfigNode):

View File

@ -6,51 +6,44 @@ from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
from dataclasses import dataclass, field
from typing import List, Optional
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
DOCKER_IMAGE_VERSION = 300
# ARE YOU EDITING THIS NUMBER? MAKE SURE YOU READ THE GUIDANCE AT THE
# TOP OF .circleci/config.yml
DOCKER_IMAGE_VERSION = 347
class Conf(object):
def __init__(self,
distro,
parms,
pyver=None,
cuda_version=None,
is_xla=False,
restrict_phases=None,
gpu_resource=None,
dependent_tests=None,
parent_build=None,
is_namedtensor=False,
is_important=False):
self.distro = distro
self.pyver = pyver
self.parms = parms
self.cuda_version = cuda_version
# TODO expand this to cover all the USE_* that we want to test for
# tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
self.is_xla = is_xla
self.is_namedtensor = is_namedtensor
self.is_important = is_important
self.restrict_phases = restrict_phases
self.gpu_resource = gpu_resource
self.dependent_tests = dependent_tests or []
self.parent_build = parent_build
@dataclass
class Conf:
distro: str
parms: List[str]
parms_list_ignored_for_docker_image: Optional[List[str]] = None
pyver: Optional[str] = None
cuda_version: Optional[str] = None
# TODO expand this to cover all the USE_* that we want to test for
# tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
is_xla: bool = False
restrict_phases: Optional[List[str]] = None
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional['Conf'] = None
is_namedtensor: bool = False
is_important: bool = False
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
def get_parms(self, for_docker):
leading = []
if self.is_important and not for_docker:
leading.append("AAA")
# We just don't run non-important jobs on pull requests;
# previously we also named them in a way to make it obvious
# if self.is_important and not for_docker:
# leading.append("AAA")
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
@ -60,7 +53,10 @@ class Conf(object):
cuda_parms = []
if self.cuda_version:
cuda_parms.extend(["cuda" + self.cuda_version, "cudnn7"])
return leading + ["linux", self.distro] + cuda_parms + self.parms
result = leading + ["linux", self.distro] + cuda_parms + self.parms
if (not for_docker and self.parms_list_ignored_for_docker_image is not None):
result = result + self.parms_list_ignored_for_docker_image
return result
def gen_docker_image_path(self):
@ -78,44 +74,27 @@ class Conf(object):
def get_dependents(self):
return self.dependent_tests or []
def gen_yaml_tree(self, build_or_test):
build_job_name_pieces = self.get_build_job_name_pieces(build_or_test)
def gen_workflow_params(self, phase):
parameters = OrderedDict()
build_job_name_pieces = self.get_build_job_name_pieces(phase)
build_env_name = "-".join(map(str, build_job_name_pieces))
env_dict = OrderedDict([
("BUILD_ENVIRONMENT", build_env_name),
("DOCKER_IMAGE", self.gen_docker_image_path()),
])
if self.pyver:
env_dict["PYTHON_VERSION"] = miniutils.quote(self.pyver)
if build_or_test == "test" and self.gpu_resource:
env_dict["USE_CUDA_DOCKER_RUNTIME"] = miniutils.quote("1")
d = {
"environment": env_dict,
"<<": "*" + "_".join(["pytorch", "linux", build_or_test, "defaults"]),
}
if build_or_test == "test":
parameters["build_environment"] = miniutils.quote(build_env_name)
parameters["docker_image"] = self.gen_docker_image_path()
if phase == "test" and self.gpu_resource:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if phase == "test":
resource_class = "large"
if self.gpu_resource:
resource_class = "gpu." + self.gpu_resource
parameters["resource_class"] = resource_class
return parameters
if self.gpu_resource == "large":
env_dict["MULTI_GPU"] = miniutils.quote("1")
d["resource_class"] = resource_class
return d
def gen_workflow_yaml_item(self, phase):
def gen_workflow_job(self, phase):
# All jobs require the setup job
parameters = OrderedDict({"requires": ["setup"]})
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase)
job_def["requires"] = ["setup"]
if phase == "test":
@ -125,9 +104,19 @@ class Conf(object):
# pytorch build job (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259452641)
dependency_build = self.parent_build or self
parameters["requires"].append(dependency_build.gen_build_name("build"))
job_def["requires"].append(dependency_build.gen_build_name("build"))
job_name = "pytorch_linux_test"
else:
job_name = "pytorch_linux_build"
return {self.gen_build_name(phase): parameters}
if not self.is_important:
# If you update this, update
# caffe2_build_definitions.py too
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}
# TODO This is a hack to special case some configs just for the workflow list
@ -136,8 +125,7 @@ class HiddenConf(object):
self.name = name
self.parent_build = parent_build
def gen_workflow_yaml_item(self, phase):
def gen_workflow_job(self, phase):
return {self.gen_build_name(phase): {"requires": [self.parent_build.gen_build_name("build")]}}
def gen_build_name(self, _):
@ -166,11 +154,12 @@ def gen_dependent_configs(xenial_parent_config):
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=xenial_parent_config.is_important,
)
configs.append(c)
for x in ["pytorch_short_perf_test_gpu", "pytorch_doc_push"]:
for x in ["pytorch_short_perf_test_gpu", "pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
return configs
@ -196,16 +185,18 @@ def instantiate_configs():
for fc in found_configs:
distro_name = fc.find_prop("distro_name")
compiler_name = fc.find_prop("compiler_name")
compiler_version = fc.find_prop("compiler_version")
is_xla = fc.find_prop("is_xla") or False
parms_list_ignored_for_docker_image = []
python_version = None
if distro_name == "xenial":
if compiler_name == "cuda" or compiler_name == "android":
python_version = fc.find_prop("pyver")
parms_list = [fc.find_prop("abbreviated_pyver")]
else:
parms_list = ["py" + fc.find_prop("pyver")]
compiler_name = fc.find_prop("compiler_name")
cuda_version = None
if compiler_name == "cuda":
cuda_version = fc.find_prop("compiler_version")
@ -215,20 +206,24 @@ def instantiate_configs():
# TODO: do we need clang to compile host binaries like protoc?
parms_list.append("clang5")
parms_list.append("android-ndk-" + android_ndk_version)
android_abi = fc.find_prop("android_abi")
parms_list_ignored_for_docker_image.append(android_abi)
restrict_phases = ["build"]
elif compiler_name:
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
parms_list.append(gcc_version)
if compiler_name == "clang":
# TODO: This is a nasty special case
if compiler_name == "clang" and not is_xla:
parms_list.append("asan")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if cuda_version in ["9.2", "10"]:
if cuda_version in ["9.2", "10", "10.1"]:
# TODO The gcc version is orthogonal to CUDA version?
parms_list.append("gcc7")
is_xla = fc.find_prop("is_xla") or False
is_namedtensor = fc.find_prop("is_namedtensor") or False
is_important = fc.find_prop("is_important") or False
@ -239,6 +234,7 @@ def instantiate_configs():
c = Conf(
distro_name,
parms_list,
parms_list_ignored_for_docker_image,
python_version,
cuda_version,
is_xla,
@ -251,44 +247,26 @@ def instantiate_configs():
if cuda_version == "9" and python_version == "3.6":
c.dependent_tests = gen_dependent_configs(c)
if (compiler_name == "gcc"
and compiler_version == "5.4"
and not is_namedtensor):
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
is_xla=False,
restrict_phases=["test"],
is_namedtensor=False,
is_important=True,
parent_build=c,
)
c.dependent_tests.append(bc_breaking_check)
config_list.append(c)
return config_list
def add_build_env_defs(jobs_dict):
mydict = OrderedDict()
config_list = instantiate_configs()
for c in config_list:
phases = c.restrict_phases or dimensions.PHASES
for phase in phases:
# TODO why does this not have a test?
if phase == "test" and c.cuda_version == "10":
continue
d = c.gen_yaml_tree(phase)
mydict[c.gen_build_name(phase)] = d
if phase == "test":
for x in filter(lambda x: type(x) is not HiddenConf, c.get_dependents()):
d = x.gen_yaml_tree(phase)
mydict[x.gen_build_name(phase)] = d
# this is the circleci api version and probably never changes
jobs_dict["version"] = 2
jobs_dict["jobs"] = mydict
graph = visualization.generate_graph(get_root())
graph.draw("pytorch-config-dimensions.png", prog="twopi")
def get_workflow_list():
def get_workflow_jobs():
config_list = instantiate_configs()
@ -303,10 +281,10 @@ def get_workflow_list():
if phase == "test" and conf_options.cuda_version == "10":
continue
x.append(conf_options.gen_workflow_yaml_item(phase))
x.append(conf_options.gen_workflow_job(phase))
# TODO convert to recursion
for conf in conf_options.get_dependents():
x.append(conf.gen_workflow_yaml_item("test"))
x.append(conf.gen_workflow_job("test"))
return x

View File

@ -1,6 +1,10 @@
#!/usr/bin/env python3
from dataclasses import dataclass, field
from typing import Optional, Dict
def X(val):
"""
Compact way to write a leaf node
@ -8,23 +12,28 @@ def X(val):
return val, []
class Ver(object):
def XImportant(name):
"""Compact way to write an important (run on PRs) leaf node"""
return (name, [("important", [X(True)])])
@dataclass
class Ver:
"""
Represents a product with a version number
"""
def __init__(self, name, version=""):
self.name = name
self.version = version
name: str
version: str = ""
def __str__(self):
return self.name + self.version
class ConfigNode(object):
def __init__(self, parent, node_name):
self.parent = parent
self.node_name = node_name
self.props = {}
@dataclass
class ConfigNode:
parent: Optional['ConfigNode']
node_name: str
props: Dict[str, str] = field(default_factory=dict)
def get_label(self):
return self.node_name

View File

@ -9,23 +9,13 @@ INDENTATION_WIDTH = 2
def is_dict(data):
return type(data) is dict or type(data) is OrderedDict
return type(data) in [dict, OrderedDict]
def is_collection(data):
return is_dict(data) or type(data) is list
# TODO can eventually drop this custom sorting
def sortkey(x):
k = x[0]
return (
k == "<<",
k != "environment",
k,
)
def render(fh, data, depth, is_list_member=False):
"""
PyYaml does not allow precise control over the quoting
@ -39,7 +29,7 @@ def render(fh, data, depth, is_list_member=False):
tuples = list(data.items())
if type(data) is not OrderedDict:
tuples.sort(key=sortkey)
tuples.sort()
for i, (k, v) in enumerate(tuples):
@ -51,10 +41,6 @@ def render(fh, data, depth, is_list_member=False):
render(fh, v, depth + 1 + int(is_list_member))
# TODO Could eventually drop this cosmetic convention
if depth == 2:
fh.write("\n")
elif type(data) is list:
for v in data:
render(fh, v, depth, True)

File diff suppressed because it is too large

View File

@ -74,41 +74,33 @@ class Header(object):
# Order of this list matters to the generated config.yml.
YAML_SOURCES = [
File("header-section.yml"),
File("linux-build-defaults.yml"),
File("macos-build-defaults.yml"),
File("commands.yml"),
File("nightly-binary-build-defaults.yml"),
File("linux-binary-build-defaults.yml"),
File("macos-binary-build-defaults.yml"),
File("nightly-build-smoke-tests-defaults.yml"),
Header("Job specifications job specs"),
Treegen(pytorch_build_definitions.add_build_env_defs, 0),
Header("Build parameters"),
File("pytorch-build-params.yml"),
File("caffe2-build-params.yml"),
File("binary-build-params.yml"),
Header("Job specs"),
File("pytorch-job-specs.yml"),
File("caffe2-job-specs.yml"),
File("binary-job-specs.yml"),
File("job-specs-setup.yml"),
File("job-specs-custom.yml"),
Treegen(caffe2_build_definitions.add_caffe2_builds, 1),
File("binary_update_htmls.yml"),
Header("Binary build specs individual job specifications"),
Treegen(binary_build_definitions.add_binary_build_specs, 1),
Header(
"Binary build tests", [
"These are the smoke tests run right after the build, before the upload.",
"If these fail, the upload doesn't happen."
]
),
Treegen(binary_build_definitions.add_binary_build_tests, 1),
File("binary-build-tests.yml"),
Header("Binary build uploads"),
Treegen(binary_build_definitions.add_binary_build_uploads, 1),
Header("Smoke test specs individual job specifications"),
Treegen(binary_build_definitions.add_smoke_test_specs, 1),
File("workflows.yml"),
Listgen(pytorch_build_definitions.get_workflow_list, 3),
Listgen(pytorch_build_definitions.get_workflow_jobs, 3),
File("workflows-pytorch-macos-builds.yml"),
Listgen(caffe2_build_definitions.get_caffe2_workflows, 3),
File("workflows-pytorch-android-gradle-build.yml"),
File("workflows-pytorch-ios-builds.yml"),
Listgen(caffe2_build_definitions.get_workflow_jobs, 3),
File("workflows-binary-builds-smoke-subset.yml"),
Header("Daily smoke test trigger"),
Treegen(binary_build_definitions.add_binary_smoke_test_jobs, 1),
Header("Daily binary build trigger"),
Treegen(binary_build_definitions.add_binary_build_jobs, 1),
File("workflows-nightly-ios-binary-builds.yml"),
File("workflows-nightly-android-binary-builds.yml"),
Header("Nightly tests"),
Listgen(binary_build_definitions.get_nightly_tests, 3),
File("workflows-nightly-uploads-header.yml"),

View File

@ -0,0 +1,4 @@
All the scripts in this directory are callable from `~/workspace/.circleci/scripts/foo.sh`.
Don't try to call them as `.circleci/scripts/foo.sh`, that won't
(necessarily) work. See Note [Workspace for CircleCI scripts] in
job-specs-setup.yml for more details.

View File

@ -41,8 +41,6 @@ popd
# Clone the Builder master repo
git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
git fetch origin
git reset origin/master --hard
echo "Using builder from "
git --no-pager log --max-count 1
popd

View File

@ -0,0 +1,38 @@
#!/bin/bash
set -eux -o pipefail
echo ""
echo "PWD: ${PWD}"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl -o ~/Downloads/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/Downloads/conda.sh
/bin/bash ~/Downloads/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
export BUILD_PYTORCH_MOBILE=1
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary
cd ${WORKSPACE}
DEST_DIR=${WORKSPACE}/ios
mkdir -p ${DEST_DIR}
cp -R ${PROJ_ROOT}/build_ios/install ${DEST_DIR}
mv ${DEST_DIR}/install ${DEST_DIR}/${IOS_ARCH}

View File

@ -0,0 +1,44 @@
#!/bin/bash
set -eux -o pipefail
echo ""
echo "PWD: $(pwd)"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
ARTIFACTS_DIR=${WORKSPACE}/ios
ls ${ARTIFACTS_DIR}
ZIP_DIR=${WORKSPACE}/zip
mkdir -p ${ZIP_DIR}/install/lib
mkdir -p ${ZIP_DIR}/src
# copy header files
cp -R ${ARTIFACTS_DIR}/arm64/include ${ZIP_DIR}/install/
# build a FAT bianry
cd ${ZIP_DIR}/install/lib
target_libs=(libc10.a libclog.a libcpuinfo.a libeigen_blas.a libpytorch_qnnpack.a libtorch.a)
for lib in ${target_libs[*]}
do
libs=(${ARTIFACTS_DIR}/x86_64/lib/${lib} ${ARTIFACTS_DIR}/arm64/lib/${lib})
lipo -create "${libs[@]}" -o ${ZIP_DIR}/install/lib/${lib}
done
# for nnpack, we only support arm64 build
cp ${ARTIFACTS_DIR}/arm64/lib/libnnpack.a ./
lipo -i ${ZIP_DIR}/install/lib/*.a
# copy the umbrella header and license
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
ZIPFILE=libtorch_ios_nightly_build.zip
cd ${ZIP_DIR}
#for testing
touch version.txt
echo $(date +%s) > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
brew install awscli
set +x
export AWS_ACCESS_KEY_ID=${AWS_S3_ACCESS_KEY_FOR_PYTORCH_BINARY_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_S3_ACCESS_SECRET_FOR_PYTORCH_BINARY_UPLOAD}
set +x
# echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read

View File

@ -19,7 +19,7 @@ fi
# We want to call unbuffer, which calls tclsh which finds the expect
# package. The expect was installed by yum into /usr/bin so we want to
# find /usr/bin/tclsh, but this is shadowed by /opt/conda/bin/tclsh in
# the conda docker images.
# the conda docker images, so we prepend it to the path here.
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
mkdir /just_tclsh_bin
ln -s /usr/bin/tclsh /just_tclsh_bin/tclsh

View File

@ -11,7 +11,7 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
source activate testenv >/dev/null
elif [[ "$DESIRED_PYTHON" == 2.7mu ]]; then
export PATH="/opt/python/cp27-cp27mu/bin:\$PATH"
else
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
export PATH="/opt/python/cp\$python_nodot-cp\${python_nodot}m/bin:\$PATH"
fi
@ -25,6 +25,9 @@ fi
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "\$pkg" --offline
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
conda install -y cpuonly -c pytorch
fi
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu100
@ -35,10 +38,15 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
fi
retry conda install -yq -c pytorch "cudatoolkit=\${cu_ver}"
fi
else
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
pip install "\$pkg"
retry pip install -q future numpy protobuf six
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="\$(ls /final_pkgs/*-latest.zip)"
unzip "\$pkg" -d /tmp
cd /tmp/libtorch
fi
# Test the package
/builder/check_binary.sh

View File

@ -26,7 +26,7 @@ pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry timeout 30 /home/circleci/project/login_to_anaconda.sh
anaconda upload "$(ls)" -u pytorch --label main --no-progress --force
anaconda upload "$(ls)" -u pytorch-nightly --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"

View File

@ -26,7 +26,7 @@ pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry /Users/distiller/project/login_to_anaconda.sh
retry anaconda upload "$(ls)" -u pytorch --label main --no-progress --force
retry anaconda upload "$(ls)" -u pytorch-nightly --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"

View File

@ -29,29 +29,37 @@ if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
fi
# Pick docker image
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda80"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
export DOCKER_IMAGE=${DOCKER_IMAGE:-}
if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda100"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
fi
fi
# Upload to parallel folder for gcc abis
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
export PIP_UPLOAD_FOLDER='nightly/devtoolset7/'
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
echo "We don't handle conda builds with gcc ABI of 1, since we don't"
echo "want to add a new package name to the conda builds"
exit 1
fi
else
# Upload to parallel folder for devtoolsets
# All nightlies used to be devtoolset3, then devtoolset7 was added as a build
# option, so the upload was redirected to nightly/devtoolset7 to avoid
# conflicts with other binaries (there shouldn't be any conflicts). Now we are
# making devtoolset7 the default.
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' || "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* || "$(uname)" == 'Darwin' ]]; then
export PIP_UPLOAD_FOLDER='nightly/'
else
# On linux machines, this shouldn't actually be called anymore. This is just
# here for extra safety.
export PIP_UPLOAD_FOLDER='nightly/devtoolset3/'
fi
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
export PYTORCH_BUILD_VERSION="1.2.0.dev$DATE"
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu100" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="1.3.0.dev$DATE"
else
export PYTORCH_BUILD_VERSION="1.3.0.dev$DATE+$DESIRED_CUDA"
fi
export PYTORCH_BUILD_NUMBER=1
cat >>"$envfile" <<EOL
@ -67,15 +75,16 @@ export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.2.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.3.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
export TORCH_PACKAGE_NAME='torch-nightly'
# TODO: We don't need this anymore IIUC
export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export NO_FBGEMM=1
export USE_FBGEMM=1
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"

View File

@ -0,0 +1,59 @@
#!/usr/bin/env bash
set -eux -o pipefail
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
BUILD_ANDROID_INCLUDE_DIR_x86=~/workspace/build_android/install/include
BUILD_ANDROID_LIB_DIR_x86=~/workspace/build_android/install/lib
BUILD_ANDROID_INCLUDE_DIR_x86_64=~/workspace/build_android_install_x86_64/install/include
BUILD_ANDROID_LIB_DIR_x86_64=~/workspace/build_android_install_x86_64/install/lib
BUILD_ANDROID_INCLUDE_DIR_arm_v7a=~/workspace/build_android_install_arm_v7a/install/include
BUILD_ANDROID_LIB_DIR_arm_v7a=~/workspace/build_android_install_arm_v7a/install/lib
BUILD_ANDROID_INCLUDE_DIR_arm_v8a=~/workspace/build_android_install_arm_v8a/install/include
BUILD_ANDROID_LIB_DIR_arm_v8a=~/workspace/build_android_install_arm_v8a/install/lib
PYTORCH_ANDROID_SRC_MAIN_DIR=~/workspace/android/pytorch_android/src/main
JNI_INCLUDE_DIR=${PYTORCH_ANDROID_SRC_MAIN_DIR}/cpp/libtorch_include
mkdir -p $JNI_INCLUDE_DIR
JNI_LIBS_DIR=${PYTORCH_ANDROID_SRC_MAIN_DIR}/jniLibs
mkdir -p $JNI_LIBS_DIR
ln -s ${BUILD_ANDROID_INCLUDE_DIR_x86} ${JNI_INCLUDE_DIR}/x86
ln -s ${BUILD_ANDROID_LIB_DIR_x86} ${JNI_LIBS_DIR}/x86
if [[ "${BUILD_ENVIRONMENT}" != *-gradle-build-only-x86_32* ]]; then
ln -s ${BUILD_ANDROID_INCLUDE_DIR_x86_64} ${JNI_INCLUDE_DIR}/x86_64
ln -s ${BUILD_ANDROID_LIB_DIR_x86_64} ${JNI_LIBS_DIR}/x86_64
ln -s ${BUILD_ANDROID_INCLUDE_DIR_arm_v7a} ${JNI_INCLUDE_DIR}/armeabi-v7a
ln -s ${BUILD_ANDROID_LIB_DIR_arm_v7a} ${JNI_LIBS_DIR}/armeabi-v7a
ln -s ${BUILD_ANDROID_INCLUDE_DIR_arm_v8a} ${JNI_INCLUDE_DIR}/arm64-v8a
ln -s ${BUILD_ANDROID_LIB_DIR_arm_v8a} ${JNI_LIBS_DIR}/arm64-v8a
fi
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
$GRADLE_PATH -PABI_FILTERS=x86 -p ~/workspace/android/ assembleRelease
else
$GRADLE_PATH -p ~/workspace/android/ assembleRelease
fi
find . -type f -name *aar -print | xargs tar cfvz ~/workspace/android/artifacts.tgz

View File

@ -0,0 +1,127 @@
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$3" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
# ======================== Building PyTorch C++ API Docs ========================
echo "Building PyTorch C++ API docs..."
# Clone the cppdocs repo
rm -rf cppdocs
git clone https://github.com/pytorch/cppdocs
set -ex
sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time GEN_TO_SOURCE=1 python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
# Copy some required files
cp aten/src/ATen/common_with_cwrap.py tools/shared/cwrap_common.py
cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--nn-path aten/src/
# Build the docs
pushd docs/cpp
pip install breathe==4.11.1 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install exhale>=0.2.1
pip install sphinx==1.8.5
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j
popd
popd
pushd cppdocs
# Purge everything with some exceptions
mkdir /tmp/cppdocs-sync
mv _config.yml README.md /tmp/cppdocs-sync/
rm -rf *
# Copy over all the newly generated HTML
cp -r "${pt_checkout}"/docs/cpp/build/html/* .
# Copy back _config.yml
rm -rf _config.yml
mv /tmp/cppdocs-sync/* .
# Make a new commit
git add . || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Automatic sync on $(date)" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to https://github.com/pytorch/cppdocs"
set +x
/usr/bin/expect <<DONE
spawn git push -u origin master
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

View File

@ -0,0 +1,44 @@
#!/usr/bin/env bash
# DO NOT ADD 'set -x' not to reveal CircleCI secret context environment variables
set -eu -o pipefail
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
ls -la ~/workspace
GRADLE_PROPERTIES=~/workspace/android/gradle.properties
IS_SNAPSHOT="$(grep 'VERSION_NAME=[0-9\.]\+-SNAPSHOT' "$GRADLE_PROPERTIES")"
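# e.g. matches a gradle.properties line like: VERSION_NAME=1.4.0-SNAPSHOT (version number hypothetical)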
echo "IS_SNAPSHOT:$IS_SNAPSHOT"
if [ -z "$IS_SNAPSHOT" ]; then
echo "Error: version is not snapshot."
elif [ -z "$SONATYPE_NEXUS_USERNAME" ]; then
echo "Error: missing env variable SONATYPE_NEXUS_USERNAME."
elif [ -z "$SONATYPE_NEXUS_PASSWORD" ]; then
echo "Error: missing env variable SONATYPE_NEXUS_PASSWORD."
elif [ -z "$ANDROID_SIGN_KEY" ]; then
echo "Error: missing env variable ANDROID_SIGN_KEY."
elif [ -z "$ANDROID_SIGN_PASS" ]; then
echo "Error: missing env variable ANDROID_SIGN_PASS."
else
GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" >> $GRADLE_PROPERTIES
echo "SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" >> $GRADLE_PROPERTIES
echo "signing.keyId=${ANDROID_SIGN_KEY}" >> $GRADLE_PROPERTIES
echo "signing.password=${ANDROID_SIGN_PASS}" >> $GRADLE_PROPERTIES
$GRADLE_PATH -p ~/workspace/android/ uploadArchives
fi

View File

@ -0,0 +1,118 @@
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: The branch to push to. Usually is "site"
branch="$3"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
fi
# Argument 4: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$4" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
git clone https://github.com/pytorch/pytorch.github.io -b $branch
pushd pytorch.github.io
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
if [ "$is_master_doc" = true ]; then
make html
else
make html-stable
fi
# Move them into the docs repo
popd
popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "$is_master_doc" = true ]; then
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
fi
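# Hypothetical example of the substitution above: a header fragment such as
#   master (1.3.0a0+ab12cd )
# becomes
#   <a href='http://pytorch.org/docs/versions.html'>1.3.0a0+ab12cd &#x25BC</a>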
git add "$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:$branch"
set +x
/usr/bin/expect <<DONE
spawn git push origin $branch
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

View File

@ -1,52 +1,18 @@
#!/usr/bin/env bash
set -ex -o pipefail
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [[ "${BUILD_ENVIRONMENT}" == *-slow-* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
# It's a PR; test for [slow ci] tag on the TOPMOST commit
topmost_commit=$(git log --format='%B' -n 1 HEAD)
if !(echo $topmost_commit | grep -q -e '\[slow ci\]' -e '\[ci slow\]' -e '\[test slow\]' -e '\[slow test\]'); then
circleci step halt
exit
fi
fi
fi
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
# It's a PR; test for [xla ci] tag on the TOPMOST commit
topmost_commit=$(git log --format='%B' -n 1 HEAD)
if !(echo $topmost_commit | grep -q -e '\[xla ci\]' -e '\[ci xla\]' -e '\[test xla\]' -e '\[xla test\]'); then
# NB: This doesn't halt everything, just this job. So
# the rest of the workflow will keep going and you need
# to make sure you halt there too. Blegh.
circleci step halt
exit
fi
fi
fi
if [[ "${BUILD_ENVIRONMENT}" == *namedtensor* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
# It's a PR; test for [namedtensor] tag on the TOPMOST commit
topmost_commit=$(git log --format='%B' -n 1 HEAD)
if !(echo $topmost_commit | grep -q -e '\[namedtensor\]' -e '\[ci namedtensor\]' -e '\[namedtensor ci\]'); then
# NB: This doesn't halt everything, just this job. So
# the rest of the workflow will keep going and you need
# to make sure you halt there too. Blegh.
circleci step halt
exit
fi
fi
fi
# Set up NVIDIA docker repo
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
# Remove unnecessary sources
sudo rm -f /etc/apt/sources.list.d/google-chrome.list
sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
@ -72,10 +38,14 @@ sudo apt-get -y install \
sudo pkill -SIGHUP dockerd
sudo pip -q install awscli==1.16.35
retry () {
$* || $* || $* || $* || $*
}
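# NB: retry runs the given command up to five times, stopping at the first
# success; arguments are re-split on each attempt since $* is unquoted.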
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-410.104.run"
DRIVER_FN="NVIDIA-Linux-x86_64-430.40.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
@ -84,7 +54,6 @@ fi
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH:-}" >> /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION:-}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
@ -97,12 +66,14 @@ if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
set +x
echo "declare -x XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}" >> /home/circleci/project/env
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
set -x
else
# This IAM user allows write access to S3 bucket for sccache
set +x
echo "declare -x XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}" >> /home/circleci/project/env
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
set -x

View File

@ -29,10 +29,22 @@ done
# See if we actually were successful
systemctl list-units --all | cat
# For good luck, try even harder to kill apt-get
sudo pkill apt-get || true
# For even better luck, purge unattended-upgrades
sudo apt-get purge -y unattended-upgrades
cat /etc/apt/sources.list
# Bail out early if we detect apt/dpkg is stuck
ps auxfww | (! grep '[a]pt')
ps auxfww | (! grep '[d]pkg')
# For the bestest luck, kill again now
sudo pkill apt || true
sudo pkill dpkg || true
# Try to detect if apt/dpkg is stuck
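# (the '[a]pt' bracket pattern keeps the grep process itself out of the match)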
if ps auxfww | grep '[a]pt'; then
echo "WARNING: There are leftover apt processes; subsequent apt update will likely fail"
fi
if ps auxfww | grep '[d]pkg'; then
echo "WARNING: There are leftover dpkg processes; subsequent apt update will likely fail"
fi

View File

@ -0,0 +1,132 @@
import argparse
import re
import sys
# Modify this variable if you want to change the set of default jobs
# which are run on all pull requests.
#
# WARNING: Actually, this is a lie; we're currently also controlling
# the set of jobs to run via the Workflows filters in CircleCI config.
default_set = set([
# PyTorch CPU
# Selected oldest Python 2 version to ensure Python 2 coverage
'pytorch-linux-xenial-py2.7.9',
# PyTorch CUDA
'pytorch-linux-xenial-cuda9-cudnn7-py3',
# PyTorch ASAN
'pytorch-linux-xenial-py3-clang5-asan',
# PyTorch DEBUG
'pytorch-linux-xenial-py3.6-gcc5.4',
# Caffe2 CPU
'caffe2-py2-mkl-ubuntu16.04',
# Caffe2 CUDA
'caffe2-py3.5-cuda10.1-cudnn7-ubuntu16.04',
# Caffe2 ONNX
'caffe2-onnx-py2-gcc5-ubuntu16.04',
'caffe2-onnx-py3.6-clang7-ubuntu16.04',
# Caffe2 Clang
'caffe2-py2-clang7-ubuntu16.04',
# Caffe2 CMake
'caffe2-cmake-cuda9.0-cudnn7-ubuntu16.04',
# Binaries
'manywheel 2.7mu cpu devtoolset7',
'libtorch 2.7m cpu devtoolset7',
'libtorch 2.7m cpu gcc5.4_cxx11-abi',
'libtorch-ios-10.2.1-nightly-x86_64-build',
'libtorch-ios-10.2.1-nightly-arm64-build',
'libtorch-ios-10.2.1-nightly-binary-build-upload',
# Caffe2 Android
'caffe2-py2-android-ubuntu16.04',
# Caffe2 OSX
'caffe2-py2-system-macos10.13',
# PyTorch OSX
'pytorch-macos-10.13-py3',
'pytorch-macos-10.13-cuda9.2-cudnn7-py3',
# PyTorch Android
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build',
# PyTorch Android gradle
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32',
# Pytorch iOS builds
'pytorch-ios-10.2.1-x86_64_build',
'pytorch-ios-10.2.1-arm64_build',
# Pytorch backward compatibility check
'pytorch-linux-backward-compatibility-check-test',
# XLA
'pytorch-xla-linux-xenial-py3.6-clang7',
# Named tensor
"pytorch-namedtensor-linux-xenial-py3.6-gcc5.4",
"pytorch-namedtensor-linux-xenial-py3-clang5-asan",
"pytorch-namedtensor-linux-xenial-cuda9-cudnn7-py2",
# Other checks
'pytorch-short-perf-test-gpu',
'pytorch-python-doc-push',
'pytorch-cpp-doc-push',
])
# Collection of jobs that are *temporarily* excluded from running on PRs.
# Use this if there is a long-running job breakage that we can't fix with a
# single revert.
skip_override = {
# example entry:
# 'pytorch-cpp-doc-push': "https://github.com/pytorch/pytorch/issues/<related issue>"
}
# Takes in commit message to analyze via stdin
#
# This script will query Git and attempt to determine if we should
# run the current CI job under question
#
# NB: Try to avoid hard-coding names here, so there are fewer places to
# update when jobs are updated/renamed
#
# Semantics in the presence of multiple tags:
# - Let D be the set of default builds
# - Let S be the set of explicitly specified builds
# - Let O be the set of temporarily skipped builds
# - Run S \/ (D - O)
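#
# Illustration with hypothetical names: if D = {linux-build, mac-build},
# S = {xla} and O = {mac-build}, then the jobs run are
# S \/ (D - O) = {xla, linux-build}.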
parser = argparse.ArgumentParser()
parser.add_argument('build_environment')
args = parser.parse_args()
commit_msg = sys.stdin.read()
# Matches anything that looks like [foo ci] or [ci foo] or [foo test]
# or [test foo]
RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')
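# For example: '[xla ci]' captures group(1) == 'xla'; '[ci xla]' captures
# group(2) == 'xla'; a bare '[ci]' (neither group) or a marker with both
# groups set is rejected as unrecognized below.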
markers = RE_MARKER.finditer(commit_msg)
for m in markers:
if m.group(1) and m.group(2):
print("Unrecognized marker: {}".format(m.group(0)))
continue
spec = m.group(1) or m.group(2)
if spec is None:
print("Unrecognized marker: {}".format(m.group(0)))
continue
if spec in args.build_environment or spec == 'all':
print("Accepting {} due to commit marker {}".format(args.build_environment, m.group(0)))
sys.exit(0)
skip_override_set = set(skip_override.keys())
should_run_set = default_set - skip_override_set
for spec in should_run_set:
if spec in args.build_environment:
print("Accepting {} as part of default set".format(args.build_environment))
sys.exit(0)
print("Rejecting {}".format(args.build_environment))
for spec, issue in skip_override.items():
if spec in args.build_environment:
print("This job is temporarily excluded from running on PRs. Reason: {}".format(issue))
break
sys.exit(1)

View File

@ -0,0 +1,29 @@
#!/usr/bin/env bash
set -exu -o pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
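# (absolute path of the directory containing this script, regardless of the caller's working directory)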
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT:-}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${BUILD_ENVIRONMENT:-}" ]; then
echo "Cannot run should_run_job.sh if BUILD_ENVIRONMENT is not defined!"
echo "CircleCI scripts are probably misconfigured."
exit 1
fi
if ! [ -e "$SCRIPT_DIR/COMMIT_MSG" ]; then
echo "Cannot run should_run_job.sh if you don't have COMMIT_MSG"
echo "written out. Are you perhaps running the wrong copy of this script?"
echo "You should be running the copy in ~/workspace; SCRIPT_DIR=$SCRIPT_DIR"
exit 1
fi
if [ -n "${CIRCLE_PULL_REQUEST:-}" ]; then
if [[ $CIRCLE_BRANCH != "ci-all/"* ]] && [[ $CIRCLE_BRANCH != "nightly" ]] && [[ $CIRCLE_BRANCH != "postnightly" ]] ; then
# Don't swallow "script doesn't exist" errors
[ -e "$SCRIPT_DIR/should_run_job.py" ]
if ! python "$SCRIPT_DIR/should_run_job.py" "${BUILD_ENVIRONMENT:-}" < "$SCRIPT_DIR/COMMIT_MSG" ; then
circleci step halt
exit
fi
fi
fi

View File

@ -0,0 +1,44 @@
#!/usr/bin/env python3
import urllib.request
import re
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
RE_VERSION = re.compile(r'allDeployedVersions = "([0-9,]+)"')
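# e.g. matches a line like: allDeployedVersions = "300,301" (version numbers hypothetical)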
URL_TEMPLATE = (
"https://raw.githubusercontent.com/pytorch/ossci-job-dsl/"
"master/src/main/groovy/ossci/{}/DockerVersion.groovy"
)
def check_version(job, expected_version):
url = URL_TEMPLATE.format(job)
with urllib.request.urlopen(url) as f:
contents = f.read().decode('utf-8')
m = RE_VERSION.search(contents)
if not m:
raise RuntimeError(
"Unbelievable! I could not find the variable allDeployedVersions in "
"{}; did the organization of ossci-job-dsl change?\n\nFull contents:\n{}"
.format(url, contents)
)
valid_versions = [int(v) for v in m.group(1).split(',')]
if expected_version not in valid_versions:
raise RuntimeError(
"We configured {} to use Docker version {}; but this "
"version is not deployed in {}. Non-deployed versions will be "
"garbage collected two weeks after they are created. DO NOT LAND "
"THIS TO MASTER without also updating ossci-job-dsl with this version."
"\n\nDeployed versions: {}"
.format(job, expected_version, url, m.group(1))
)
def validate_docker_version():
check_version('pytorch', pytorch_build_definitions.DOCKER_IMAGE_VERSION)
check_version('caffe2', caffe2_build_definitions.DOCKER_IMAGE_VERSION)
if __name__ == "__main__":
validate_docker_version()

View File

@ -0,0 +1,54 @@
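# The &name anchors below define parameter blocks that are merged into job
# definitions elsewhere in the config via `<<: *name`, so the common
# parameters are declared only once.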
binary_linux_build_params: &binary_linux_build_params
parameters:
build_environment:
type: string
default: ""
docker_image:
type: string
default: ""
libtorch_variant:
type: string
default: ""
resource_class:
type: string
default: "2xlarge+"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
LIBTORCH_VARIANT: << parameters.libtorch_variant >>
ANACONDA_USER: pytorch
resource_class: << parameters.resource_class >>
docker:
- image: << parameters.docker_image >>
binary_linux_test_upload_params: &binary_linux_test_upload_params
parameters:
build_environment:
type: string
default: ""
docker_image:
type: string
default: ""
libtorch_variant:
type: string
default: ""
resource_class:
type: string
default: "medium"
use_cuda_docker_runtime:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
LIBTORCH_VARIANT: << parameters.libtorch_variant >>
resource_class: << parameters.resource_class >>
binary_mac_params: &binary_mac_params
parameters:
build_environment:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>

View File

@ -0,0 +1,261 @@
binary_linux_build:
<<: *binary_linux_build_params
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Install unbuffer and ts
command: |
set -eux -o pipefail
source /env
OS_NAME=`awk -F= '/^NAME/{print $2}' /etc/os-release`
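# (/etc/os-release has lines like NAME="Ubuntu"; awk prints the value after '=')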
if [[ "$OS_NAME" == *"CentOS Linux"* ]]; then
retry yum -q -y install epel-release
retry yum -q -y install expect moreutils
elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then
retry apt-get update
retry apt-get -y install expect moreutils
conda install -y -c eumetsat expect
conda install -y cmake
fi
- run:
name: Update compiler to devtoolset7
command: |
set -eux -o pipefail
source /env
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
source "/builder/update_compiler.sh"
# Env variables are not persisted into the next step
echo "export PATH=$PATH" >> /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> /env
else
echo "Not updating compiler"
fi
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
- persist_to_workspace:
root: /
paths: final_pkgs
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
# executor (b/c we have to run the docker with --runtime=nvidia and we can't do
# that on the docker executor)
binary_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
# TODO: We shouldn't attach the workspace multiple times
- attach_workspace:
at: /home/circleci/project
- setup_linux_system_environment
- setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Prepare test code
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_linux_test.sh
- run:
<<: *binary_run_in_docker
binary_linux_upload:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- attach_workspace:
at: /home/circleci/project
- run:
<<: *binary_populate_env
- run:
<<: *binary_install_miniconda
- run:
name: Upload
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_linux_upload.sh
# Nightly build smoke test defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they can be fetched from the
# cloud and are runnable. Note that the pytorch repo is never cloned into these jobs
##############################################################################
smoke_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace:
at: /home/circleci/project
- setup_linux_system_environment
- setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
cat >/home/circleci/project/ci_test_script.sh \<<EOL
# The following code will be executed inside Docker container
set -eux -o pipefail
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
- run:
<<: *binary_run_in_docker
smoke_mac_test:
<<: *binary_linux_test_upload_params
macos:
xcode: "9.0"
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# TODO unbuffer and ts this, but it breaks cause miniconda overwrites
# tclsh. But unbuffer and ts aren't that important so they're just
# disabled for now
./builder/smoke_test.sh
binary_mac_build:
<<: *binary_mac_params
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "1h"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
binary_mac_upload: &binary_mac_upload
<<: *binary_mac_params
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
name: Upload
no_output_timeout: "10m"
command: |
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_upload.sh"
cat "$script"
source "$script"
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
steps:
- attach_workspace:
at: ~/workspace
- should_run_job
- checkout
- run_brew_for_ios_build
- run:
name: Build
context: org-member
no_output_timeout: "1h"
command: |
script="/Users/distiller/project/.circleci/scripts/binary_ios_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/workspace/
paths: ios
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
steps:
- attach_workspace:
at: ~/workspace
- should_run_job
- checkout
- run_brew_for_ios_build
- run:
name: Upload
no_output_timeout: "1h"
command: |
script="/Users/distiller/project/.circleci/scripts/binary_ios_upload.sh"
cat "$script"
source "$script"

View File

@ -1,4 +1,4 @@
# update_s3_htmls job
# These jobs create html files for every cpu/cu## folder in s3. The html
# files just store the names of all the files in that folder (which are
@ -12,8 +12,7 @@
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- setup_linux_system_environment
- run:
<<: *binary_checkout
# N.B. we do not run binary_populate_env. The only variable we need is
@ -67,8 +66,7 @@
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- setup_linux_system_environment
- run:
<<: *binary_checkout
- run:

View File

@ -0,0 +1,28 @@
caffe2_params: &caffe2_params
parameters:
build_environment:
type: string
default: ""
build_ios:
type: string
default: ""
docker_image:
type: string
default: ""
use_cuda_docker_runtime:
type: string
default: ""
build_only:
type: string
default: ""
resource_class:
type: string
default: "large"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
BUILD_IOS: << parameters.build_ios >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
DOCKER_IMAGE: << parameters.docker_image >>
BUILD_ONLY: << parameters.build_only >>
resource_class: << parameters.resource_class >>

View File

@ -0,0 +1,200 @@
caffe2_linux_build:
<<: *caffe2_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
cat >/home/circleci/project/ci_build_script.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
./.jenkins/caffe2/build.sh
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
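# The next two lines pipe a small bootstrap script into the running container
# via `docker exec`, then run it under `unbuffer ... | ts` so the output is
# line-buffered and timestamped.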
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
fi
caffe2_linux_test:
<<: *caffe2_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Test
no_output_timeout: "1h"
command: |
set -e
# TODO: merge this into Caffe2 test.sh
cat >/home/circleci/project/ci_test_script.sh \<<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
# Upgrade SSL module to avoid old SSL warnings
pip -q install --user --upgrade pyOpenSSL ndg-httpsclient pyasn1
pip -q install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
# Build
./.jenkins/caffe2/test.sh
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_test_script.sh
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_test_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
caffe2_macos_build:
<<: *caffe2_params
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run_brew_for_macos_build
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
pip -q install numpy
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
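# Generate clang/clang++ shims that forward every compiler invocation through sccache.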
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
export PATH="${SCCACHE_BIN}:$PATH"
fi
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
unbuffer scripts/build_ios.sh 2>&1 | ts
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
unbuffer scripts/build_anaconda.sh 2>&1 | ts
else
unbuffer scripts/build_local.sh 2>&1 | ts
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi

View File

@ -0,0 +1,90 @@
commands:
# NB: This command must be run as the first command in a job. It
# attaches the workspace at ~/workspace; this workspace is generated
# by the setup job. Note that ~/workspace is not the default working
# directory (that's ~/project).
should_run_job:
description: "Test if the job should run or not"
steps:
- attach_workspace:
name: Attaching workspace
at: ~/workspace
- run:
name: Should run job
no_output_timeout: "2m"
command: ~/workspace/.circleci/scripts/should_run_job.sh
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
# building/testing.
setup_linux_system_environment:
steps:
- run:
name: Set Up System Environment
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/setup_linux_system_environment.sh
setup_ci_environment:
steps:
- run:
name: Set Up CI Environment After attach_workspace
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/setup_ci_environment.sh
brew_update:
description: "Update Homebrew and install base formulae"
steps:
- run:
name: Update Homebrew
no_output_timeout: "10m"
command: |
set -ex
# Update repositories manually.
# Running `brew update` produces a comparison between the
# current checkout and the updated checkout, which takes a
# very long time because the existing checkout is 2y old.
for path in $(find /usr/local/Homebrew -type d -name .git)
do
cd $path/..
git fetch --depth=1 origin
git reset --hard origin/master
done
export HOMEBREW_NO_AUTO_UPDATE=1
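# (keeps the `brew install` calls below from triggering another slow auto-update)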
# Install expect and moreutils so that we can call `unbuffer` and `ts`.
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards.
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect
brew_install:
description: "Install Homebrew formulae"
parameters:
formulae:
type: string
default: ""
steps:
- run:
name: Install << parameters.formulae >>
no_output_timeout: "10m"
command: |
set -ex
export HOMEBREW_NO_AUTO_UPDATE=1
brew install << parameters.formulae >>
run_brew_for_macos_build:
steps:
- brew_update
- brew_install:
formulae: libomp
run_brew_for_ios_build:
steps:
- brew_update
- brew_install:
formulae: libtool

View File

@ -7,6 +7,11 @@
# and then update DOCKER_IMAGE_VERSION at the top of the following files:
# * cimodel/data/pytorch_build_definitions.py
# * cimodel/data/caffe2_build_definitions.py
# And the inline copies of the variable in
# * verbatim-sources/job-specs-custom.yml
# (grep for DOCKER_IMAGE)
version: 2.1
docker_config_defaults: &docker_config_defaults
user: jenkins
@ -14,148 +19,3 @@ docker_config_defaults: &docker_config_defaults
# This IAM user only allows read-write access to ECR
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4}
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
# building/testing.
setup_linux_system_environment: &setup_linux_system_environment
name: Set Up System Environment
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/setup_linux_system_environment.sh
install_doc_push_script: &install_doc_push_script
name: Install the doc push script
no_output_timeout: "2m"
command: |
cat >/home/circleci/project/doc_push_script.sh <<EOL
# =================== The following code **should** be executed inside Docker container ===================
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "doc_push_script.sh: Invoked with \$*"
git clone https://yf225:${GITHUB_PYTORCHBOT_TOKEN}@github.com/pytorch/pytorch.github.io -b site
pushd pytorch.github.io
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="\$1"
if [ -z "\$install_path" ]; then
echo "error: doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="\$2"
if [ -z "\$version" ]; then
echo "error: doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "\$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "\$3" != "" ]; then
dry_run=true
fi
echo "install_path: \$install_path version: \$version dry_run: \$dry_run"
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "\$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
if [ "\$is_master_doc" = true ]; then
make html
else
make html-stable
fi
# Move them into the docs repo
popd
popd
git rm -rf "\$install_path" || true
mv "\$pt_checkout/docs/build/html" "\$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "\$is_master_doc" = true ]; then
find "\$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "\$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\$version \&#x25BC</a>@g"
fi
git add "\$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git status
if [ "\$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:site"
git push origin site
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/doc_push_script.sh
# `setup_ci_environment` has to be run **after** the ``checkout`` step because
# it writes into the checkout directory and otherwise git will complain
# that
# Directory (/home/circleci/project) you are trying to checkout to is not empty and not git repository
setup_ci_environment: &setup_ci_environment
name: Set Up CI Environment After Checkout
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/setup_ci_environment.sh
# Installs expect and moreutils so that we can call `unbuffer` and `ts`.
# Also installs OpenMP
# !!!!NOTE!!!! this is copied into a binary_macos_brew_update job which is the
# same but does not install libomp. If you are changing this, consider if you
# need to change that step as well.
macos_brew_update: &macos_brew_update
name: Brew update and install moreutils, expect and libomp
no_output_timeout: "1h"
command: |
set -ex
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
brew update
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect
brew install libomp

View File

@ -1,20 +1,17 @@
pytorch_short_perf_test_gpu:
environment:
BUILD_ENVIRONMENT: pytorch-short-perf-test-gpu
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:300"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
PYTHON_VERSION: "3.6"
USE_CUDA_DOCKER_RUNTIME: "1"
resource_class: gpu.medium
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Perf Test
no_output_timeout: "1h"
@ -22,7 +19,7 @@
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp $id:/var/lib/jenkins/workspace/env /home/circleci/project/env
@ -36,48 +33,43 @@
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/short-perf-test-gpu.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
pytorch_doc_push:
pytorch_python_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:300"
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *install_doc_push_script
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -e
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp /home/circleci/project/doc_push_script.sh $id:/var/lib/jenkins/workspace/doc_push_script.sh
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open (#16502) for merging v1.0.1 -> master for this job.
# XXX: The following code is only run on the v1.0.1 branch, which might
# an eternal PR open for merging v1.3.0 -> master for this job.
# XXX: The following code is only run on the v1.3.0 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.0.1" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/stable 1.0.1") | docker exec -u jenkins -i "$id" bash) 2>&1'
elif [[ "${CIRCLE_BRANCH}" == "v1.3.0" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/stable 1.3.0 site-v1.3.0 dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -85,30 +77,75 @@
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_cpp_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open (#16502) for merging v1.0.1 -> master for this job.
# XXX: The following code is only run on the v1.0.1 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.0.1" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/stable 1.0.1") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_macos_10_13_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run:
<<: *macos_brew_update
- run_brew_for_macos_build
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
@ -118,54 +155,50 @@
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
mkdir -p /Users/distiller/pytorch-ci-env/workspace
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
cp -a /Users/distiller/project/. /Users/distiller/pytorch-ci-env/workspace
cp -a ~/project ~/workspace
- persist_to_workspace:
root: /Users/distiller/pytorch-ci-env
root: ~/workspace
paths:
- "*"
- miniconda3
- project
pytorch_macos_10_13_py3_test:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
macos:
xcode: "9.0"
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /Users/distiller/pytorch-ci-env
sudo chmod -R 777 /Users/distiller/pytorch-ci-env
- attach_workspace:
at: /Users/distiller/pytorch-ci-env
- run:
<<: *macos_brew_update
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
# This workspace also carries binaries from the build job
- should_run_job
- run_brew_for_macos_build
- run:
name: Test
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
cp -a /Users/distiller/pytorch-ci-env/workspace/. /Users/distiller/project
cp -a ~/workspace/project/. ~/project
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run:
<<: *macos_brew_update
- run_brew_for_macos_build
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
no_output_timeout: "1h"
command: |
set -e
@ -202,3 +235,212 @@
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
pytorch_android_gradle_build:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- should_run_job
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: pytorch android gradle build
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
docker_image_libtorch_android_arm_v7a=${docker_image_commit}-android-arm-v7a
docker_image_libtorch_android_arm_v8a=${docker_image_commit}-android-arm-v8a
echo "docker_image_commit: "${docker_image_commit}
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
echo "docker_image_libtorch_android_x86_64: "${docker_image_libtorch_android_x86_64}
echo "docker_image_libtorch_android_arm_v7a: "${docker_image_libtorch_android_arm_v7a}
echo "docker_image_libtorch_android_arm_v8a: "${docker_image_libtorch_android_arm_v8a}
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id_x86_32=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# arm-v7a
time docker pull ${docker_image_libtorch_android_arm_v7a} >/dev/null
export id_arm_v7a=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v7a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir ~/workspace/build_android_install_arm_v7a
docker cp $id_arm_v7a:/var/lib/jenkins/workspace/build_android/install ~/workspace/build_android_install_arm_v7a
# x86_64
time docker pull ${docker_image_libtorch_android_x86_64} >/dev/null
export id_x86_64=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_64" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir ~/workspace/build_android_install_x86_64
docker cp $id_x86_64:/var/lib/jenkins/workspace/build_android/install ~/workspace/build_android_install_x86_64
# arm-v8a
time docker pull ${docker_image_libtorch_android_arm_v8a} >/dev/null
export id_arm_v8a=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir ~/workspace/build_android_install_arm_v8a
docker cp $id_arm_v8a:/var/lib/jenkins/workspace/build_android/install ~/workspace/build_android_install_arm_v8a
docker cp ~/workspace/build_android_install_arm_v7a $id_x86_32:/var/lib/jenkins/workspace/build_android_install_arm_v7a
docker cp ~/workspace/build_android_install_x86_64 $id_x86_32:/var/lib/jenkins/workspace/build_android_install_x86_64
docker cp ~/workspace/build_android_install_arm_v8a $id_x86_32:/var/lib/jenkins/workspace/build_android_install_arm_v8a
# run gradle buildRelease
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_artifacts
docker cp $id_x86_32:/var/lib/jenkins/workspace/android/artifacts.tgz ~/workspace/build_android_artifacts/
output_image=$docker_image_libtorch_android_x86_32-gradle
docker commit "$id_x86_32" ${output_image}
time docker push ${output_image}
- store_artifacts:
path: ~/workspace/build_android_artifacts/artifacts.tgz
destination: artifacts.tgz
pytorch_android_publish_snapshot:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- should_run_job
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: pytorch android gradle build
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
echo "docker_image_commit: "${docker_image_commit}
echo "docker_image_libtorch_android_x86_32_gradle: "${docker_image_libtorch_android_x86_32_gradle}
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32_gradle} >/dev/null
export id_x86_32=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" && echo "export SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" && echo "export ANDROID_SIGN_KEY=${ANDROID_SIGN_KEY}" && echo "export ANDROID_SIGN_PASS=${ANDROID_SIGN_PASS}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/publish_android_snapshot.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
output_image=${docker_image_libtorch_android_x86_32_gradle}-publish-snapshot
docker commit "$id_x86_32" ${output_image}
time docker push ${output_image}
pytorch_android_gradle_build-x86_32:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- should_run_job
- run:
name: filter out not PR runs
no_output_timeout: "5m"
command: |
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${CIRCLE_PULL_REQUEST:-}" ]; then
circleci step halt
fi
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: pytorch android gradle build only x86_32 (for PR)
no_output_timeout: "1h"
command: |
set -e
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}-${CIRCLE_SHA1}-android-x86_32
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
# x86
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_x86_32_artifacts
docker cp $id:/var/lib/jenkins/workspace/android/artifacts.tgz ~/workspace/build_android_x86_32_artifacts/
output_image=${docker_image_libtorch_android_x86_32}-gradle
docker commit "$id" ${output_image}
time docker push ${output_image}
- store_artifacts:
path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
destination: artifacts.tgz
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run_brew_for_ios_build
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl -o ~/Downloads/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/Downloads/conda.sh
/bin/bash ~/Downloads/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
# export
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
export BUILD_PYTORCH_MOBILE=1
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts

@@ -1,4 +1,4 @@
setup:
docker:
- image: circleci/python:3.7.3
@@ -8,6 +8,26 @@
name: Ensure config is up to date
command: ./ensure-consistency.py
working_directory: .circleci
- run:
name: Save commit message
command: git log --format='%B' -n 1 HEAD > .circleci/scripts/COMMIT_MSG
# Note [Workspace for CircleCI scripts]
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# In the beginning, you wrote your CI scripts in a
# .circleci/config.yml file, and life was good. Your CI
# configurations flourished and multiplied.
#
# Then one day, CircleCI cometh down high and say, "Your YAML file
# is too biggeth, it stresses our servers so." And thus they
# asketh us to smite the scripts in the yml file.
#
# But you can't just put the scripts in the .circleci folder,
# because in some jobs, you don't ever actually checkout the
# source repository. Where you gonna get the scripts from?
#
# Here's how you do it: you persist .circleci/scripts into a
# workspace, attach the workspace in your subjobs, and run all
# your scripts from there.
- persist_to_workspace:
root: .
paths: .circleci/scripts
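Concretely, once a downstream job has attached the workspace at ~/workspace, its steps can call the persisted scripts even though the job never ran checkout; a minimal sketch, using a script path that appears in the jobs below:

# no checkout step needed in this job; the scripts arrived with the workspace
bash ~/workspace/.circleci/scripts/binary_run_in_docker.sh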

@@ -1,96 +0,0 @@
# binary linux build defaults
##############################################################################
binary_linux_build: &binary_linux_build
resource_class: 2xlarge+
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Install unbuffer and ts
command: |
set -eux -o pipefail
source /env
retry yum -q -y install epel-release
retry yum -q -y install expect moreutils
- run:
name: Upgrade gcc version (based on env var)
command: |
set -eux -o pipefail
source /env
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
source "/builder/upgrade_gcc_abi.sh"
# Env variables are not persisted into the next step
echo "export PATH=$PATH" >> /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> /env
# We need to set this variable manually because
# https://github.com/pytorch/pytorch/blob/master/torch/abi-check.cpp
# sets the ABI to 0 by default
echo "export _GLIBCXX_USE_CXX11_ABI=1" >> /env
else
echo "Not upgrading gcc version"
fi
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
- persist_to_workspace:
root: /
paths: final_pkgs
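The gcc-upgrade step above leans on a simple convention for passing state between steps: each CircleCI step starts a fresh shell, so anything to be shared is appended as an export line to /env, which every subsequent step sources. The contract in miniature (the variable name is hypothetical):

echo "export MY_TOOLCHAIN=devtoolset7" >> /env   # step N persists a value
# ...a later step starts a fresh shell...
source /env
echo "$MY_TOOLCHAIN"                             # prints devtoolset7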
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
# executor (b/c we have to run the docker with --runtime=nvidia and we can't do
# that on the docker executor)
binary_linux_test: &binary_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace:
at: /home/circleci/project
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Prepare test code
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_linux_test.sh
- run:
<<: *binary_run_in_docker
binary_linux_upload: &binary_linux_upload
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- attach_workspace:
at: /home/circleci/project
- run:
<<: *binary_populate_env
- run:
<<: *binary_install_miniconda
- run:
name: Upload
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_linux_upload.sh

@@ -1,218 +0,0 @@
##############################################################################
# Linux build defaults
##############################################################################
pytorch_linux_build_defaults: &pytorch_linux_build_defaults
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
NAMED_FLAG="export USE_NAMEDTENSOR=1"
fi
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [namedtensor build image]
# The namedtensor build uses the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between these two so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
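The '"$NAMED_FLAG"' fragment inside the COMMAND string above is plain shell quote concatenation, not special syntax: the single-quoted string is closed, $NAMED_FLAG is expanded on the host inside double quotes, and a new single-quoted segment begins. Reduced to its essence:

NAMED_FLAG="export USE_NAMEDTENSOR=1"
COMMAND='echo begin && echo '"$NAMED_FLAG"' && echo end'
echo "$COMMAND"   # prints: echo begin && echo export USE_NAMEDTENSOR=1 && echo end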
pytorch_linux_test_defaults: &pytorch_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "90m"
command: |
set -e
# See Note [namedtensor build image]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
if [ -n "${MULTI_GPU}" ]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
caffe2_linux_build_defaults: &caffe2_linux_build_defaults
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
cat >/home/circleci/project/ci_build_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
./.jenkins/caffe2/build.sh
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
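One subtlety in the ci_build_script.sh heredoc above: the EOL delimiter is unquoted, so "$BUILD_ENVIRONMENT" is expanded on the host while the file is generated, baking the host's value into the script that later runs in the container. Quoting the delimiter would keep the text literal instead; a side-by-side sketch:

cat > ./expanded.sh <<EOL
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"   # host value substituted at generation time
EOL
cat > ./literal.sh <<'EOL'
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"   # written verbatim; expands only when run
EOL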
caffe2_linux_test_defaults: &caffe2_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "1h"
command: |
set -e
# TODO: merge this into Caffe2 test.sh
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
# Upgrade SSL module to avoid old SSL warnings
pip -q install --user --upgrade pyOpenSSL ndg-httpsclient pyasn1
pip -q install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
# Test
./.jenkins/caffe2/test.sh
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_test_script.sh
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_test_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts

@@ -1,66 +0,0 @@
##############################################################################
# Macos binary build defaults
# The root of everything is /Users/distiller/pytorch-ci-env/workspace
##############################################################################
binary_mac_build: &binary_mac_build
macos:
xcode: "9.0"
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
binary_mac_upload: &binary_mac_upload
macos:
xcode: "9.0"
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
name: Upload
no_output_timeout: "10m"
command: |
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_upload.sh"
cat "$script"
source "$script"

@@ -1,85 +0,0 @@
##############################################################################
# Macos build defaults
##############################################################################
caffe2_macos_build_defaults: &caffe2_macos_build_defaults
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip -q install numpy
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
export PATH="${SCCACHE_BIN}:$PATH"
fi
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
unbuffer scripts/build_ios.sh 2>&1 | ts
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
unbuffer scripts/build_anaconda.sh 2>&1 | ts
else
unbuffer scripts/build_local.sh 2>&1 | ts
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi
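The clang and clang++ shims above are ordinary shell scripts that shadow the real compiler on PATH so every compile goes through sccache. A minimal version for a single compiler, with a hypothetical wrapper directory; note it uses "$@" rather than the original's \$*, which is slightly more robust for arguments containing spaces:

mkdir -p "$HOME/cc_wrappers"
printf '#!/bin/sh\nexec sccache %s "$@"\n' "$(which clang)" > "$HOME/cc_wrappers/clang"
chmod a+x "$HOME/cc_wrappers/clang"
export PATH="$HOME/cc_wrappers:$PATH"   # builds now reach clang through sccache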

@@ -48,21 +48,3 @@ binary_run_in_docker: &binary_run_in_docker
# This step only runs on circleci linux machine executors that themselves
# need to start docker images
command: ~/workspace/.circleci/scripts/binary_run_in_docker.sh
# This is copied almost verbatim from the macos_brew_update job
# In version 2.1 and above we could make this a command and pass a parameter to
# it, but in this version there is no way to pass a parameter to a step
binary_macos_brew_update: &binary_macos_brew_update
name: Brew update and install moreutils and expect
no_output_timeout: "1h"
command: |
set -eux -o pipefail
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
brew update
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect

@@ -1,64 +0,0 @@
# Nightly build smoke test defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they can be fetched from the cloud and
# are runnable. Note that the pytorch repo is never cloned into these jobs
##############################################################################
smoke_linux_test: &smoke_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace:
at: /home/circleci/project
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
cat >/home/circleci/project/ci_test_script.sh <<EOL
# The following code will be executed inside Docker container
set -eux -o pipefail
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
- run:
<<: *binary_run_in_docker
smoke_mac_test: &smoke_mac_test
macos:
xcode: "9.0"
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# TODO unbuffer and ts this, but it breaks because miniconda overwrites
# tclsh. But unbuffer and ts aren't that important so they're just
# disabled for now
./builder/smoke_test.sh

@@ -0,0 +1,39 @@
pytorch_params: &pytorch_params
parameters:
build_environment:
type: string
default: ""
docker_image:
type: string
default: ""
resource_class:
type: string
default: "large"
use_cuda_docker_runtime:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
resource_class: << parameters.resource_class >>
pytorch_ios_params: &pytorch_ios_params
parameters:
build_environment:
type: string
default: ""
ios_arch:
type: string
default: ""
ios_platform:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
IOS_PLATFORM: << parameters.ios_platform >>

@@ -0,0 +1,117 @@
jobs:
pytorch_linux_build:
<<: *pytorch_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
# TODO We may want to move the merge logic to a separate step after checkout
# Merge v1.3.0 into the branch only in the xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "v1.3.0" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge v1.3.0 branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
set -x
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/v1.3.0:refs/remotes/origin/v1.3.0
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/v1.3.0:refs/remotes/origin/v1.3.0 --depth=50 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/v1.3.0`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
git checkout -f ${GIT_COMMIT}
git reset --hard ${GIT_COMMIT}
git merge --no-edit --no-ff ${GIT_MERGE_TARGET}
set +x
else
echo "Do NOT merge v1.3.0 branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
fi
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
fi
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The namedtensor and xla builds use the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v8a"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_32"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_32
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
fi
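The build and test jobs hand off through the Docker registry: the build job snapshots its live container with docker commit and pushes the resulting tag, and the test job pulls that exact image, so the built tree never has to be reproduced. In miniature, with a hypothetical registry and tag:

export id=$(docker run -t -d -w /var/lib/jenkins base-image)
# ...the build runs inside $id...
docker commit "$id" registry.example.com/pytorch-ci:build-${CIRCLE_SHA1}   # freeze the built tree as an image
docker push registry.example.com/pytorch-ci:build-${CIRCLE_SHA1}
# later, in the test job, possibly on a different machine:
docker pull registry.example.com/pytorch-ci:build-${CIRCLE_SHA1}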
pytorch_linux_test:
<<: *pytorch_params
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Test
no_output_timeout: "90m"
command: |
set -e
# See Note [Special build images]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
export NAMED_FLAG="export BUILD_NAMEDTENSOR=1 && export TEST_NAMEDTENSOR=1"
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${NAMED_FLAG}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${NAMED_FLAG}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts

@@ -5,44 +5,97 @@
# pytorch-ci-hud to adjust the list of whitelisted builds
# at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_build:
- binary_linux_build:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
- binary_linux_manywheel_3.7m_cu100_devtoolset3_build:
docker_image: "soumith/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_build
build_environment: "manywheel 3.7m cu100 devtoolset7"
requires:
- setup
- binary_linux_conda_2.7_cpu_build:
docker_image: "soumith/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_conda_2_7_cpu_devtoolset7_build
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_build
docker_image: "soumith/conda-cuda"
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_build
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
build_environment: "libtorch 2.7m cpu devtoolset7"
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "soumith/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "yf225/pytorch-binary-docker-image-ubuntu16.04:latest"
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2.7m_cu90_devtoolset3_build
- binary_macos_wheel_3.6_cpu_build:
# - binary_linux_libtorch_2_7m_cu90_devtoolset7_static-without-deps_build
- binary_mac_build:
name: binary_macos_wheel_3_6_cpu_build
build_environment: "wheel 3.6 cpu"
requires:
- setup
- binary_macos_conda_2.7_cpu_build:
- binary_mac_build:
name: binary_macos_conda_2_7_cpu_build
build_environment: "conda 2.7 cpu"
requires:
- setup
- binary_macos_libtorch_2.7_cpu_build:
- binary_mac_build:
name: binary_macos_libtorch_2_7_cpu_build
build_environment: "libtorch 2.7 cpu"
requires:
- setup
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_test:
- binary_linux_test:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_test
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_build
- binary_linux_manywheel_3.7m_cu100_devtoolset3_test:
- binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
docker_image: "soumith/manylinux-cuda100"
- binary_linux_test:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_test
build_environment: "manywheel 3.7m cu100 devtoolset7"
requires:
- setup
- binary_linux_manywheel_3.7m_cu100_devtoolset3_build
- binary_linux_conda_2.7_cpu_test:
- binary_linux_manywheel_3_7m_cu100_devtoolset7_build
docker_image: "soumith/manylinux-cuda100"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- binary_linux_test:
name: binary_linux_conda_2_7_cpu_devtoolset7_test
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
- binary_linux_conda_2.7_cpu_build
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_test:
# requires:
# - setup
# - binary_linux_conda_3.6_cu90_build
- binary_linux_conda_2_7_cpu_devtoolset7_build
docker_image: "soumith/conda-cuda"
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_test:
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_test
build_environment: "libtorch 2.7m cpu devtoolset7"
requires:
- setup
- binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "soumith/manylinux-cuda100"
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
- binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "yf225/pytorch-binary-docker-image-ubuntu16.04:latest"

@@ -0,0 +1,56 @@
- pytorch_linux_build:
name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
filters:
branches:
only: nightly
- pytorch_linux_build:
name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
filters:
branches:
only: nightly
- pytorch_linux_build:
name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
filters:
branches:
only: nightly
- pytorch_linux_build:
name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
filters:
branches:
only: nightly
- pytorch_android_gradle_build:
name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build
requires:
- nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build
- nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build
- nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build
filters:
branches:
only: nightly
- pytorch_android_publish_snapshot:
name: nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_android_publish_snapshot
requires:
- nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build
context: org-member
filters:
branches:
only: nightly

@@ -0,0 +1,31 @@
# Pytorch iOS binary builds
- binary_ios_build:
name: pytorch_ios_10_2_1_nightly_x86_64_build
build_environment: "libtorch-ios-10.2.1-nightly-x86_64-build"
ios_platform: "SIMULATOR"
ios_arch: "x86_64"
requires:
- setup
filters:
branches:
only: nightly
- binary_ios_build:
name: pytorch_ios_10_2_1_nightly_arm64_build
build_environment: "libtorch-ios-10.2.1-nightly-arm64-build"
ios_arch: "arm64"
ios_platform: "OS"
requires:
- setup
filters:
branches:
only: nightly
- binary_ios_upload:
build_environment: "libtorch-ios-10.2.1-nightly-binary-build-upload"
context: org-member
requires:
- setup
- pytorch_ios_10_2_1_nightly_x86_64_build
- pytorch_ios_10_2_1_nightly_arm64_build
filters:
branches:
only: nightly

@@ -0,0 +1,12 @@
- pytorch_android_gradle_build-x86_32:
name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_android_gradle_build:
name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build

@@ -0,0 +1,13 @@
# Pytorch iOS PR builds
- pytorch_ios_build:
name: pytorch_ios_10_2_1_x86_64_build
build_environment: "pytorch-ios-10.2.1-x86_64_build"
ios_platform: "SIMULATOR"
requires:
- setup
- pytorch_ios_build:
name: pytorch_ios_10_2_1_arm64_build
build_environment: "pytorch-ios-10.2.1-arm64_build"
ios_arch: "arm64"
requires:
- setup

@@ -1,3 +1,4 @@
# Warning: indentation here matters!
# Pytorch MacOS builds
- pytorch_macos_10_13_py3_build:
@@ -10,4 +11,3 @@
- pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
requires:
- setup

@@ -6,25 +6,30 @@
# These jobs are all idempotent and very lightweight; they just upload html
# files that track what binaries are available and what their sizes are.
update_s3_htmls:
triggers:
- schedule:
cron: "0 9 * * *"
jobs:
- setup:
filters:
branches:
only:
- master
jobs:
- setup
only: postnightly
- update_s3_htmls_for_nightlies:
context: org-member
requires:
- setup
filters:
branches:
only: postnightly
- update_s3_htmls_for_nightlies_devtoolset7:
context: org-member
requires:
- setup
filters:
branches:
only: postnightly
- upload_binary_sizes:
context: org-member
requires:
- setup
filters:
branches:
only: postnightly

@@ -7,6 +7,5 @@
# PR jobs pr builds
workflows:
version: 2
build:
jobs:

@@ -5,6 +5,7 @@ Checks: '
,bugprone-*
,-bugprone-forward-declaration-namespace
,-bugprone-macro-parentheses
,-bugprone-lambda-function-name
,cppcoreguidelines-*
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory

@@ -5,6 +5,8 @@ max-line-length = 120
# E501 is not flexible enough, we're using B950 instead
ignore =
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
# EXE001 is skipped for now because some files use shebang to determine Python version.
EXE001,
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!

.github/pytorch-probot.yml (new file)

@@ -0,0 +1 @@
tracking_issue: 24422

.github/workflows/lint.yml (new file)

@@ -0,0 +1,48 @@
name: Lint
on:
push:
branches:
- master
pull_request:
jobs:
flake8-py3:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.7.4
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@master
- name: Checkout PR tip
run: |
set -eux
if [ -z "${GITHUB_HEAD_REF}" ]; then
# We are on master, just set the SHA from our current location
echo ::set-output name=commit_sha::${GITHUB_SHA}
else
# We are on a PR, so actions/checkout leaves us on the merge commit.
# Check out the actual tip of the branch.
PR_TIP=$(git rev-parse HEAD^2)
git checkout ${PR_TIP}
echo ::set-output name=commit_sha::${PR_TIP}
fi
id: get_pr_tip
- name: Run flake8
run: |
set -eux
pip install flake8
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
uses: pytorch/add-annotations-github-action@master
with:
check_name: 'flake8-py3'
linter_output_path: 'flake8-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
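The HEAD^2 lookup in the "Checkout PR tip" step works because actions/checkout lands on a synthetic merge commit whose first parent is the base branch and whose second parent is the PR head:

git rev-parse HEAD     # the synthetic merge commit created for the PR
git rev-parse HEAD^1   # first parent: the base branch being merged into
git rev-parse HEAD^2   # second parent: the actual tip of the PR branch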

.gitignore

@@ -8,6 +8,9 @@
## PyTorch
.coverage
.gradle
.hypothesis
.mypy_cache
*/*.pyc
*/*.so*
@@ -27,14 +30,14 @@ dist/
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
log
test/.coverage
test/.hypothesis/
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
test/data/*.pt
test/backward_compatibility/new_schemas.txt
dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
@@ -43,6 +46,8 @@ third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
torch/__init__.pyi
torch/nn/functional.pyi
torch/nn/modules/*.pyi
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
@@ -85,6 +90,7 @@ torch/version.py
# Root level file used in CI to specify certain env configs.
# E.g., see .circleci/config.yaml
env
.circleci/scripts/COMMIT_MSG
# IPython notebook checkpoints
.ipynb_checkpoints
@@ -220,11 +226,6 @@ caffe2.egg-info
# Files generated by CLion
cmake-build-debug
# Files generated by ctags
CTAGS
tags
TAGS
# BEGIN NOT-CLEAN-FILES (setup.py handles this marker. Do not change.)
#
# Below files are not deleted by "setup.py clean".
@@ -239,3 +240,12 @@ TAGS
# Files generated when a patch is rejected
*.orig
*.rej
# Files generated by ctags
CTAGS
GTAGS
GRTAGS
GSYMS
GPATH
tags
TAGS

.gitmodules

@@ -21,7 +21,7 @@
[submodule "third_party/protobuf"]
ignore = dirty
path = third_party/protobuf
url = https://github.com/google/protobuf.git
url = https://github.com/protocolbuffers/protobuf.git
[submodule "third_party/ios-cmake"]
ignore = dirty
path = third_party/ios-cmake
@@ -57,7 +57,7 @@
[submodule "third-party/cpuinfo"]
ignore = dirty
path = third_party/cpuinfo
url = https://github.com/Maratyszcza/cpuinfo.git
url = https://github.com/pytorch/cpuinfo.git
[submodule "third_party/python-enum"]
ignore = dirty
path = third_party/python-enum
@@ -81,7 +81,7 @@
[submodule "third_party/sleef"]
ignore = dirty
path = third_party/sleef
url = https://github.com/zdevito/sleef
url = https://github.com/shibatch/sleef
[submodule "third_party/ideep"]
ignore = dirty
path = third_party/ideep
@@ -110,3 +110,11 @@
ignore = dirty
path = third_party/foxi
url = https://github.com/houseroad/foxi.git
[submodule "third_party/tbb"]
path = third_party/tbb
url = https://github.com/01org/tbb
branch = tbb_2018
[submodule "android/libs/fbjni"]
ignore = dirty
path = android/libs/fbjni
url = https://github.com/IvanKobzarev/fbjni.git

@@ -17,30 +17,44 @@ fi
caffe2_pypath="$(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.dirname(os.path.realpath(caffe2.__file__)))')"
# Resnet50
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 2
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4
fi
# ResNext
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4
fi
# Shufflenet
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu --model shufflenet
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --model shufflenet
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2 --model shufflenet
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4 --model shufflenet
fi

@@ -156,7 +156,9 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
build_args+=("TORCH_CUDA_ARCH_LIST=Maxwell")
# Explicitly set path to NVCC such that the symlink to ccache or sccache is used
build_args+=("CUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
if [ -n "${CACHE_WRAPPER_DIR}" ]; then
build_args+=("CUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
fi
# Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit.
# Setting PATH to resolve to the right nvcc alone isn't enough.
@@ -255,7 +257,7 @@ else
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
if [[ -n "${SCCACHE}" && $BUILD_ENVIRONMENT != *rocm* ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
@@ -271,4 +273,9 @@ fi
# Install ONNX into a local directory
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
bash tools/amd_build/unwrap_clang.sh
fi
report_compile_cache_stats

@@ -90,9 +90,6 @@ rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Currently these tests are failing on ROCM platform:
# Unknown reasons, need to debug
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/piecewise_linear_transform_test.py")
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
rocm_ignore_test+=("--ignore $caffe2_pypath/python/data_parallel_model_test.py")
@@ -101,6 +98,12 @@ fi
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
# locale setting is required by click package with py3
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
fi
pip install --user pytest-sugar
"$PYTHON" \
-m pytest \
@@ -121,5 +124,15 @@ pip install --user pytest-sugar
#####################
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install -q --user git+https://github.com/pytorch/vision.git
pip install -q --user ninja
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
# The default pip version is too old (9.0.2) to support the `manylinux2010` tag.
# Upgrading avoids the pip error: Couldn't find a version that satisfies the requirement
sudo pip install --upgrade pip
pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==0.5.0.dev905
fi
"$ROOT_DIR/scripts/onnx/test.sh"
fi

@@ -33,7 +33,7 @@ export ASAN_OPTIONS=detect_leaks=0:symbolize=1
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
USE_ASAN=1 NO_CUDA=1 USE_MKLDNN=0 \
USE_ASAN=1 USE_CUDA=0 USE_MKLDNN=0 \
python setup.py install
assert_git_not_dirty

@@ -32,7 +32,7 @@ if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT"
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-py3-clang5-asan* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
@@ -46,7 +46,7 @@ echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -q -r requirements.txt || true
pip_install -r requirements.txt || true
# TODO: Don't install this here
if ! which conda; then
@@ -54,7 +54,7 @@ if ! which conda; then
# intel cpu and later run tests on machines with amd cpu.
# Also leave out two builds to make sure non-mkldnn builds still work.
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda9-cudnn7-py3-* ]]; then
pip install -q mkl mkl-devel
pip_install mkl mkl-devel
export USE_MKLDNN=1
else
export USE_MKLDNN=0
@@ -65,9 +65,16 @@ fi
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
build_args=()
build_args+=("-DBUILD_CAFFE2_MOBILE=OFF")
build_args+=("-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')")
build_args+=("-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')")
if [[ "${BUILD_ENVIRONMENT}" == *-arm-v7a* ]]; then
build_args+=("-DANDROID_ABI=armeabi-v7a")
elif [[ "${BUILD_ENVIRONMENT}" == *-arm-v8a* ]]; then
build_args+=("-DANDROID_ABI=arm64-v8a")
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_32* ]]; then
build_args+=("-DANDROID_ABI=x86")
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_64* ]]; then
build_args+=("-DANDROID_ABI=x86_64")
fi
export BUILD_PYTORCH_MOBILE=1
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
@@ -90,7 +97,7 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++; do
for compiler in cc c++ gcc g++ clang clang++; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
@@ -108,6 +115,10 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# OPENCV is needed to enable ImageInput operator in caffe2 resnet5_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
bash tools/amd_build/unwrap_clang.sh
exit 0
fi
@@ -126,7 +137,7 @@ if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc5.4* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
@@ -136,6 +147,11 @@ if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
./xla/scripts/apply_patches.sh
fi
if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then
export CC=clang
export CXX=clang++
fi
# check that setup.py would fail with bad arguments
echo "The next three invocations are expected to fail with invalid command error messages."
@@ -146,19 +162,24 @@ echo "The next three invocations are expected to fail with invalid command error
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
WERROR=1 python setup.py install
else
python setup.py install
fi
if which sccache > /dev/null; then
echo 'PyTorch Build Statistics'
sccache --show-stats
fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -q -r requirements.txt || true
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
@@ -201,10 +222,10 @@ fi
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.
pip install -q lark-parser
pip_install lark-parser
# Bazel doesn't work with sccache gcc. https://github.com/bazelbuild/bazel/issues/3642
sudo add-apt-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main"
sudo add-apt-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main"
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo apt-get -qq update
@@ -235,7 +256,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
exit 1
fi
bazels3cache --bucket=ossci-compiler-cache-circleci-xla --maxEntrySizeBytes=0
bazels3cache --bucket=${XLA_CLANG_CACHE_S3_BUCKET_NAME} --maxEntrySizeBytes=0
pushd xla
export CC=clang-7 CXX=clang++-7
# Use cloud cache to build when available.

@@ -17,9 +17,21 @@ function cleanup {
set -ex
# Save the SCRIPT_DIR absolute path in case later we chdir (as occurs in the gpu perf test)
SCRIPT_DIR="$( cd "$(dirname "${BASH_SOURCE[0]}")" ; pwd -P )"
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
# Figure out which Python to use for ROCm
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
# non-interactive bash shells do not expand aliases by default
shopt -s expand_aliases
export PYTORCH_TEST_WITH_ROCM=1
alias python="$PYTHON"
fi
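In the ROCm block above, BASH_REMATCH holds the capture groups from the [[ ... =~ ... ]] regex match; the same idiom reduced to a toy case with a hypothetical sample string:

if [[ "pytorch-rocm-py3.6" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
  echo "python version: ${BASH_REMATCH[1]}"   # prints: python version: 3.6
fi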
# This token is used by a parser on Jenkins logs for determining
# if a failure is a legitimate problem, or a problem with the build
# system; to find out more, grep for this string in ossci-job-dsl.
@@ -89,7 +101,7 @@ if which sccache > /dev/null; then
sccache --zero-stats
function sccache_epilogue() {
echo '=================== sccache compilation log ==================='
python "$(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py" ~/sccache_error.log
python "$SCRIPT_DIR/print_sccache_log.py" ~/sccache_error.log 2>/dev/null
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
@@ -144,6 +156,16 @@ if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
fi
fi
function pip_install() {
# retry 3 times
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@"
}
function pip_uninstall() {
# uninstall 2 times
pip uninstall -y "$@" || pip uninstall -y "$@"
}
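pip_install and pip_uninstall above hard-code their retry counts by repeating the command; a generic wrapper in the same spirit (the name retry_n is hypothetical and not part of these scripts):

retry_n() {
  # run a command up to three times, stopping at the first success
  local attempt
  for attempt in 1 2 3; do
    "$@" && return 0
  done
  return 1
}
retry_n pip install --progress-bar off ninja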
function get_exit_code() {
set +e
"$@"

@@ -1,27 +1,11 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
export PATH="/usr/local/bin:$PATH"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${PYTORCH_ENV_DIR}/miniconda3.sh
bash ${PYTORCH_ENV_DIR}/miniconda3.sh -b -p ${PYTORCH_ENV_DIR}/miniconda3
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
git submodule sync --recursive
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
export CMAKE_PREFIX_PATH=${WORKSPACE_DIR}/miniconda3/
# Build PyTorch
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
@@ -30,7 +14,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
export USE_CUDA=1
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
@@ -43,35 +27,29 @@ else
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${PYTORCH_ENV_DIR}/clang++"
chmod a+x "${PYTORCH_ENV_DIR}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${WORKSPACE_DIR}/clang++"
chmod a+x "${WORKSPACE_DIR}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${PYTORCH_ENV_DIR}/clang"
chmod a+x "${PYTORCH_ENV_DIR}/clang"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${WORKSPACE_DIR}/clang"
chmod a+x "${WORKSPACE_DIR}/clang"
if [[ "${BUILD_ENVIRONMENT}" == *cuda* ]]; then
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${PYTORCH_ENV_DIR}/nvcc"
chmod a+x "${PYTORCH_ENV_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${PYTORCH_ENV_DIR}/nvcc"
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${WORKSPACE_DIR}/nvcc"
chmod a+x "${WORKSPACE_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${WORKSPACE_DIR}/nvcc"
fi
export PATH="${PYTORCH_ENV_DIR}:$PATH"
export PATH="${WORKSPACE_DIR}:$PATH"
fi
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
python setup.py install
MAX_JOBS=2 USE_DISTRIBUTED=1 python setup.py install
assert_git_not_dirty
# Upload torch binaries when the build job is finished
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
7z a ${IMAGE_COMMIT_TAG}.7z ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
fi

View File

@ -0,0 +1,48 @@
#!/bin/bash
# Common prelude for macos-build.sh and macos-test.sh
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
export WORKSPACE_DIR="${HOME}/workspace"
mkdir -p ${WORKSPACE_DIR}
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p ${WORKSPACE_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source ${WORKSPACE_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
# The torch.hub tests make requests to GitHub.
#
# The certifi package from conda-forge is new enough to make the
# following error disappear (included for future reference):
#
# > ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED]
# > certificate verify failed: unable to get local issuer certificate
# > (_ssl.c:1056)
#
conda install -y -c conda-forge certifi
# Needed by torchvision, which is imported from TestHub in test_utils.py.
conda install -y pillow
# Building with USE_DISTRIBUTED=1 requires libuv (for Gloo).
conda install -y libuv pkg-config
# Image commit tag is used to persist the build from the build job
# and to retrieve the build from the test job.
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
# These are required for both the build job and the test job.
# In the latter, to test cpp extensions.
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang

View File

@ -1,23 +1,9 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${PYTORCH_ENV_DIR}/miniconda3.sh
bash ${PYTORCH_ENV_DIR}/miniconda3.sh -b -p ${PYTORCH_ENV_DIR}/miniconda3
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja six
conda install -y six
pip install -q hypothesis "librosa>=0.6.2" psutil
# faulthandler became built-in in Python 3.3
@ -26,12 +12,12 @@ if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1"
fi
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
rm -rf ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
git submodule sync --recursive
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
export CMAKE_PREFIX_PATH=${WORKSPACE_DIR}/miniconda3/
# Test PyTorch
if [ -z "${IN_CIRCLECI}" ]; then
@ -43,19 +29,12 @@ if [ -z "${IN_CIRCLECI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
# Download torch binaries in the test jobs
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
rm -rf ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
7z x ${IMAGE_COMMIT_TAG}.7z -o"${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages"
fi
# Test that OpenMP is enabled
@ -67,6 +46,10 @@ fi
popd
test_python_all() {
# The CircleCI worker hostname doesn't resolve to an address.
# This environment variable makes ProcessGroupGloo default to
# using the address associated with the loopback interface.
export GLOO_SOCKET_IFNAME=lo0
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
assert_git_not_dirty
@ -116,6 +99,7 @@ test_custom_script_ops() {
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python test_custom_classes.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt

View File

@ -27,5 +27,9 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$PWD/../cpp-build"/caffe2/build/bin/test_api
time python test/run_test.py --verbose -i distributed
time python test/run_test.py --verbose -i c10d
time python test/run_test.py --verbose -i c10d_spawn
assert_git_not_dirty

View File

@ -35,30 +35,30 @@ fi
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
pip install -q ninja --user
pip_install --user ninja
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip install -q hypothesis --user
pip_install --user hypothesis
# TODO: move this to Docker
PYTHON_VERSION=$(python -c 'import platform; print(platform.python_version())'|cut -c1)
echo $PYTHON_VERSION
if [[ $PYTHON_VERSION == "2" ]]; then
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py2-none-any.whl --user
else
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl --user
fi
# if [[ $PYTHON_VERSION == "2" ]]; then
# pip_install --user https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py2-none-any.whl
# else
# pip_install --user https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# fi
pip_install --user tb-nightly
# mypy will fail to install on Python <3.4. In that case,
# we just won't run these tests.
pip install mypy --user || true
pip_install --user mypy || true
fi
# faulthandler became built-in in Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler --user
pip_install --user faulthandler
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
@ -92,10 +92,6 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
fi
if [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
@ -108,7 +104,7 @@ test_python_nn() {
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose
time python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
assert_git_not_dirty
}
@ -138,22 +134,7 @@ test_aten() {
}
test_torchvision() {
rm -rf ninja
echo "Installing torchvision at branch master"
rm -rf vision
# TODO: This git clone is bad, it means pushes to torchvision can break
# PyTorch CI
git clone https://github.com/pytorch/vision --quiet
pushd vision
# python setup.py install with a tqdm dependency is broken in the
# Travis Python nightly (but not in latest Python nightlies, so
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install -q --user .
popd
rm -rf vision
pip_install --user git+https://github.com/pytorch/vision.git@2b73a4846773a670632b29fb2fc2ac57df7bce5d
}
test_libtorch() {
@ -162,13 +143,13 @@ test_libtorch() {
python test/cpp/jit/tests_setup.py setup
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/bin/test_jit
"$CPP_BUILD"/caffe2/build/bin/test_jit
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
"$CPP_BUILD"/caffe2/build/bin/test_jit "[cpu]"
fi
python test/cpp/jit/tests_setup.py shutdown
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/build/bin/test_api
assert_git_not_dirty
fi
}
@ -181,6 +162,7 @@ test_custom_script_ops() {
cp -a "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python test_custom_classes.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
@ -193,22 +175,47 @@ test_xla() {
export XLA_USE_XRT=1 XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"
pushd xla
python test/test_operations.py
echo "Running Python Tests"
./test/run_tests.sh
echo "Running MNIST Test"
python test/test_train_mnist.py --tidy
echo "Running C++ Tests"
pushd test/cpp
CC=clang-7 CXX=clang++-7 ./run_tests.sh
popd
assert_git_not_dirty
}
# Do NOT run this test before any other tests, like test_python_nn, etc.,
# because this function uninstalls the torch built from the branch and installs
# the nightly version.
test_backward_compatibility() {
set -x
pushd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
pip_uninstall torch
pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
python check_backward_compatibility.py --new-schemas new_schemas.txt
popd
set +x
assert_git_not_dirty
}
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then
test_backward_compatibility
# Do NOT add tests after bc check tests, see its comment.
elif [[ "${BUILD_ENVIRONMENT}" == *xla* || "${JOB_BASE_NAME}" == *xla* ]]; then
test_torchvision
test_xla
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 || "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 || "${JOB_BASE_NAME}" == *-test2 ]]; then
test_python_all_except_nn
test_aten
test_libtorch

View File

@ -6,6 +6,12 @@ if "%DEBUG%" == "1" (
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%
:: This inflates our log size slightly, but it is REALLY useful to be
:: able to see what our cl.exe commands are (since you can actually
:: just copy-paste them into a local Windows setup to rebuild a
:: single file.)
set CMAKE_VERBOSE_MAKEFILE=1
set INSTALLER_DIR=%SCRIPT_HELPERS_DIR%\installation-helpers
@ -69,16 +75,26 @@ set CXX=sccache cl
set CMAKE_GENERATOR=Ninja
:: The following code will try to build PyTorch twice if USE_CUDA is neither 0
:: nor 1. It is intended so that both builds can be folded into 1 CI run.
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
:: Must save and restore the original value of USE_CUDA, otherwise the
:: `if not "%USE_CUDA%"=="0"` line can be messed up.
set OLD_USE_CUDA=%USE_CUDA%
set USE_CUDA=0
python setup.py install
set USE_CUDA=%OLD_USE_CUDA%
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
:: sccache will fail for CUDA builds if all cores are used for compiling
if not defined MAX_JOBS set /A MAX_JOBS=%NUMBER_OF_PROCESSORS%-1
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
@ -93,13 +109,12 @@ if not "%USE_CUDA%"=="0" (
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc
if "%REBUILD%"=="" set NO_CUDA=0
if "%REBUILD%"=="" set USE_CUDA=1
python setup.py install --cmake && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
) else (
mv %CD%\build\bin\test_api.exe %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch\lib
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\caffe2 && python %SCRIPT_HELPERS_DIR%\upload_image.py %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z
)
)

View File

@ -1,3 +1,5 @@
#!/usr/bin/env python
import os
import sys
import boto3

View File

@ -1,8 +1,8 @@
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output %TMP_DIR_WIN%\mkl.7z
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2019.4.245.7z --output %TMP_DIR_WIN%\mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z %TMP_DIR_WIN%\mkl.7z --quiet
aws s3 cp s3://ossci-windows/mkl_2019.4.245.7z %TMP_DIR_WIN%\mkl.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\mkl.7z -o%TMP_DIR_WIN%\mkl
)

View File

@ -19,9 +19,10 @@ if NOT "%BUILD_ENVIRONMENT%"=="" (
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if NOT "%BUILD_ENVIRONMENT%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba
:: Numba is pinned to 0.44.0 to avoid https://github.com/numba/numba/issues/4352
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba==0.44.0
)
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil pillow
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3

View File

@ -1,5 +1,6 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
git submodule update --init --recursive third_party/pybind11
cd test\custom_operator
:: Build the custom operator library.
@ -23,8 +24,18 @@ popd
:: Run tests Python-side and export a script module.
python test_custom_ops.py -v
if ERRORLEVEL 1 exit /b 1
:: TODO: fix and re-enable this test
:: See https://github.com/pytorch/pytorch/issues/25155
:: python test_custom_classes.py -v
:: if ERRORLEVEL 1 exit /b 1
python model.py --export-script-module="build/model.pt"
if ERRORLEVEL 1 exit /b 1
:: Run tests C++-side and load the exported script module.
cd build
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_custom_ops.exe model.pt
if ERRORLEVEL 1 exit /b 1

View File

@ -1,9 +1,27 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
dir
dir %TMP_DIR_WIN%\build
dir %TMP_DIR_WIN%\build\torch
dir %TMP_DIR_WIN%\build\torch\lib
cd %TMP_DIR_WIN%\build\torch\lib
cd %TMP_DIR_WIN%\build\torch\bin
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*"
if errorlevel 1 exit /b 1
cd %TMP_DIR_WIN%\build\torch\test
for /r "." %%a in (*.exe) do (
call :libtorch_check "%%~na" "%%~fa"
)
goto :eof
:libtorch_check
:: See https://github.com/pytorch/pytorch/issues/25161
if "%~1" == "c10_metaprogramming_test" goto :eof
if "%~1" == "module_test" goto :eof
:: See https://github.com/pytorch/pytorch/issues/25312
if "%~1" == "converter_nomigraph_test" goto :eof
echo Running "%~2"
call "%~2"
if errorlevel 1 exit /b 1
goto :eof

View File

@ -1,2 +1,3 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude nn --verbose && cd ..
if ERRORLEVEL 1 exit /b 1

View File

@ -1,3 +1,5 @@
#!/usr/bin/env python
import os
import sys
import boto3

View File

@ -1,58 +0,0 @@
# https://travis-ci.org/pytorch/pytorch
language: python
dist: trusty
git:
submodules: false
# This reportedly works around an issue downloading packages from pypi on
# travis. Consider removing this after the underlying issue is fixed.
# https://github.com/travis-ci/travis-ci/issues/2389
sudo: false
matrix:
fast_finish: true
include:
- name: "Ensure consistent CircleCI YAML"
python: "3.6"
dist: xenial
script: cd .circleci && ./ensure-consistency.py
- name: "Shellcheck Jenkins scripts"
dist: xenial
install: sudo apt-get install -y shellcheck
script: .jenkins/run-shellcheck.sh
- name: "Ensure no tabs"
python: "2.7"
script:
- (! git grep -I -l $'\t' -- . ':(exclude)*.svg' ':(exclude)**Makefile' ':(exclude)**/contrib/**' ':(exclude)third_party' ':(exclude).gitattributes' ':(exclude).gitmodules' || (echo "The above files have tabs; please convert them to spaces"; false))
- name: "Python 2.7 Lint"
python: "2.7"
install: pip install flake8
script: flake8
- name: "Python 3.7 Lint"
python: "3.7"
dist: xenial # required for Python 3.7 (travis-ci/travis-ci#9069)
sudo: required # required for Python 3.7 (travis-ci/travis-ci#9069)
install:
- pip install flake8 flake8-mypy flake8-comprehensions flake8-pyi mccabe pycodestyle pyflakes
# Apparently Facebook runs master of this one
# https://github.com/PyCQA/flake8-bugbear/issues/53
- pip install git+https://github.com/PyCQA/flake8-bugbear.git@d9444713a51a9fb6ee8cd2d88fca85e9ff0c2d58
script: flake8
- name: "MyPy typecheck"
python: "3.6"
install: pip install mypy mypy-extensions
script: mypy @mypy-files.txt
- name: "CPP doc check"
python: "3.6"
install:
- sudo apt-get install -y doxygen
- pip install -r requirements.txt
script: cd docs/cpp/source && ./check-doxygen.sh
- name: "clang tidy"
python: "3.6"
script: tools/run-clang-tidy-in-ci.sh
branches:
only:
- master
- /gh\/.*\/base/

View File

@ -5,11 +5,25 @@ cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
# Use compiler ID "AppleClang" instead of "Clang" for XCode.
# Not setting this sometimes makes XCode C compiler gets detected as "Clang",
# even when the C++ one is detected as "AppleClang".
cmake_policy(SET CMP0010 NEW)
cmake_policy(SET CMP0025 NEW)
# Suppress warning flags in default MSVC configuration. It's not
# mandatory that we do this (and we don't if cmake is old), but it's
# nice when it's possible, and it's possible on our Windows configs.
if(NOT CMAKE_VERSION VERSION_LESS 3.15.0)
cmake_policy(SET CMP0092 NEW)
endif()
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
if (${CMAKE_SYSTEM_NAME} STREQUAL "Linux")
set(LINUX TRUE)
else()
set(LINUX FALSE)
endif()
set(CMAKE_INSTALL_MESSAGE NEVER)
set(CMAKE_CXX_STANDARD 11)
@ -19,6 +33,7 @@ endif()
if (DEFINED GLIBCXX_USE_CXX11_ABI)
if (${GLIBCXX_USE_CXX11_ABI} EQUAL 1)
set(CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=1")
endif()
endif()
@ -62,12 +77,28 @@ if(APPLE)
set(CMAKE_MACOSX_RPATH ON)
endif()
if (${CMAKE_HOST_SYSTEM_PROCESSOR} MATCHES "(x86_64|i[3-6]+86)")
set(CPU_INTEL ON)
else ()
set(CPU_INTEL OFF)
endif ()
# For non-supported platforms, turn USE_DISTRIBUTED off by default.
# It is not tested and likely won't work without additional changes.
if(NOT LINUX)
set(USE_DISTRIBUTED OFF CACHE STRING "Use distributed")
# On macOS, if USE_DISTRIBUTED is enabled (specified by the user),
# then make Gloo build with the libuv transport.
if(APPLE AND USE_DISTRIBUTED)
set(USE_LIBUV ON CACHE STRING "")
endif()
endif()
# ---[ Options.
# Note to developers: if you add an option below, make sure you also add it to
# cmake/Summary.cmake so that the summary prints out the option values.
include(CMakeDependentOption)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
option(BUILD_ATEN_ONLY "Build only a subset focused on ATen only" OFF)
option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
@ -75,6 +106,8 @@ option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
option(BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" ON)
option(BUILD_NAMEDTENSOR "Experimental: compile with namedtensor support" OFF)
option(USE_STATIC_DISPATCH "Use static dispatch for ATen operators" OFF)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
"BUILD_SHARED_LIBS AND BUILD_CUSTOM_PROTOBUF" OFF)
@ -93,8 +126,10 @@ option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" OFF)
option(NAMEDTENSOR_ENABLED "Experimental: compile with namedtensor support" OFF)
cmake_dependent_option(
USE_STATIC_CUDNN "Use cuDNN static libraries" OFF
"USE_CUDNN" OFF)
option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" ON)
option(USE_FFMPEG "Use ffmpeg" OFF)
option(USE_GFLAGS "Use GFLAGS" OFF)
option(USE_GLOG "Use GLOG" OFF)
@ -103,11 +138,20 @@ option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(USE_LMDB "Use LMDB" OFF)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
cmake_dependent_option(
USE_NCCL "Use NCCL" ON
"USE_CUDA;UNIX;NOT APPLE" OFF)
cmake_dependent_option(
USE_STATIC_NCCL "Use static NCCL" OFF
"USE_NCCL" OFF)
cmake_dependent_option(
USE_SYSTEM_NCCL "Use system-wide NCCL" OFF
"USE_NCCL" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
cmake_dependent_option(
USE_NUMA "Use NUMA. Only available on Linux." ON
"LINUX" OFF)
cmake_dependent_option(
USE_NVRTC "Use NVRTC. Only available if USE_CUDA is on." OFF
"USE_CUDA" OFF)
@ -118,6 +162,7 @@ option(USE_OPENCV "Use OpenCV" OFF)
option(USE_OPENMP "Use OpenMP for parallel code" ON)
option(USE_PROF "Use profiling" OFF)
option(USE_QNNPACK "Use QNNPACK (quantized 8-bit operators)" ON)
option(USE_PYTORCH_QNNPACK "Use ATen/QNNPACK (quantized 8-bit operators)" ON)
option(USE_REDIS "Use Redis" OFF)
option(USE_ROCKSDB "Use RocksDB" OFF)
option(USE_SNPE "Use Qualcomm's SNPE library" OFF)
@ -126,7 +171,13 @@ option(USE_SYSTEM_EIGEN_INSTALL
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
option(USE_MKLDNN "Use MKLDNN" OFF)
cmake_dependent_option(
USE_MKLDNN "Use MKLDNN. Only available on x86 and x86_64." ON
"CPU_INTEL" OFF)
set(MKLDNN_ENABLE_CONCURRENT_EXEC ${USE_MKLDNN})
cmake_dependent_option(
USE_MKLDNN_CBLAS "Use CBLAS in MKLDNN" OFF
"USE_MKLDNN" OFF)
option(USE_DISTRIBUTED "Use distributed" ON)
cmake_dependent_option(
USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
@ -134,43 +185,94 @@ cmake_dependent_option(
cmake_dependent_option(
USE_GLOO "Use Gloo. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed. Only available if USE_GLOO is on." OFF
"USE_GLOO" OFF)
option(USE_TBB "Use TBB" OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" OFF)
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" ON)
# /Z7 override option
# When generating debug symbols, CMake defaults to the /Zi flag.
# However, /Zi is not compatible with sccache, so we rewrite it to /Z7.
# But some users don't use sccache; this override is for them.
option(MSVC_Z7_OVERRIDE "Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)" ON)
cmake_dependent_option(
MSVC_Z7_OVERRIDE "Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)" ON
"MSVC" OFF)
SET(ONNX_NAMESPACE "onnx_c2" CACHE STRING "onnx namespace")
set(ONNX_NAMESPACE "onnx_torch" CACHE STRING "A namespace for ONNX; needed to build with other frameworks that share ONNX.")
# This is a fix for a rare build issue on Ubuntu:
# symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk
# https://software.intel.com/en-us/articles/symbol-lookup-error-when-linking-intel-mkl-with-gcc-on-ubuntu
if(LINUX)
set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--no-as-needed")
endif()
# For MSVC,
# 1. Replace /Zi and /ZI with /Z7
# 2. Switch off incremental linking in debug builds
if (MSVC)
if(MSVC_Z7_OVERRIDE)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
foreach(flag_var
CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
# Replace /Zi and /ZI with /Z7
if(MSVC_Z7_OVERRIDE)
if(${flag_var} MATCHES "/Z[iI]")
string(REGEX REPLACE "/Z[iI]" "/Z7" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/Z[iI]")
endforeach(flag_var)
endif(MSVC_Z7_OVERRIDE)
endif(MSVC_Z7_OVERRIDE)
# Turn off warnings on Windows. In an ideal world we'd be warning
# clean on Windows too, but this is too much work for our
# non-Windows developers.
#
# NB: Technically, this is not necessary if CMP0092 was applied
# properly, but only cmake >= 3.15 has this policy, so we nail
# it one more time just to be safe.
#
# NB2: This is NOT enough to prevent warnings from nvcc on MSVC. At the
# moment only CMP0092 is enough to prevent those warnings too.
string(REPLACE "/W3" "" ${flag_var} "${${flag_var}}")
# Suppress the "EHsc is overridden by EHa" command-line warning
string(REPLACE "/EHsc" "" ${flag_var} "${${flag_var}}")
# Turn off warnings (the Windows build is currently extremely warning-
# unclean, and the warnings aren't telling us anything useful.)
#
# Turn on EHa; I'm not altogether clear why we use the asynchronous
# exception handling model, but someone added it at some point, so
# keep using it.
string(APPEND ${flag_var} " /w /EHa")
if (${CAFFE2_USE_MSVC_STATIC_RUNTIME})
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
else()
if(${flag_var} MATCHES "/MT")
string(REGEX REPLACE "/MT" "/MD" ${flag_var} "${${flag_var}}")
endif()
endif()
# /bigobj increases number of sections in .obj file, which is needed to link
# against libraries in Python 2.7 under Windows
set(${flag_var} "${${flag_var}} /MP /bigobj")
endforeach(flag_var)
foreach(flag_var
CMAKE_SHARED_LINKER_FLAGS_DEBUG CMAKE_STATIC_LINKER_FLAGS_DEBUG
CMAKE_EXE_LINKER_FLAGS_DEBUG CMAKE_MODULE_LINKER_FLAGS_DEBUG)
# Switch off incremental linking in debug builds
if(${flag_var} MATCHES "/INCREMENTAL" AND NOT ${flag_var} MATCHES "/INCREMENTAL:NO")
string(REGEX REPLACE "/INCREMENTAL" "/INCREMENTAL:NO" ${flag_var} "${${flag_var}}")
endif()
endforeach(flag_var)
# Turning off USE_DISTRIBUTED by default
set(USE_DISTRIBUTED OFF)
foreach(flag_var
CMAKE_SHARED_LINKER_FLAGS CMAKE_STATIC_LINKER_FLAGS
CMAKE_EXE_LINKER_FLAGS CMAKE_MODULE_LINKER_FLAGS)
string(APPEND ${flag_var} " /ignore:4049 /ignore:4217")
endforeach(flag_var)
# Try harder
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler /w -w")
endif(MSVC)
# Set INTERN_BUILD_MOBILE for all mobile builds. Components that are not
@ -179,6 +281,13 @@ if (ANDROID OR IOS)
set(INTERN_BUILD_MOBILE ON)
endif()
# Setting `PYTORCH_BUILD_MOBILE` environment variable can force it to do mobile
# build with host toolchain.
if (DEFINED ENV{PYTORCH_BUILD_MOBILE})
set(INTERN_BUILD_MOBILE ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DC10_MOBILE")
endif()
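# A hedged usage sketch of this escape hatch (any non-empty value satisfies the
# DEFINED check above; the invocation is illustrative):
#   PYTORCH_BUILD_MOBILE=1 python setup.py build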
# INTERN_BUILD_ATEN_OPS is used to control whether to build ATen/TH operators.
# It's disabled for caffe2 mobile library.
if (INTERN_BUILD_MOBILE AND BUILD_CAFFE2_MOBILE)
@ -192,27 +301,21 @@ endif()
# When it's disabled it builds the libtorch mobile library, which contains ATen/TH ops and native support for
# TorchScript models, but doesn't contain the not-yet-unified caffe2 ops;
if (INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE)
if (NOT BUILD_SHARED_LIBS)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DNO_EXPORT")
endif()
set(BUILD_PYTHON OFF)
set(BUILD_TORCH ON)
set(BUILD_CAFFE2_OPS OFF)
set(USE_DISTRIBUTED OFF)
set(FEATURE_TORCH_MOBILE ON)
endif()
if (BUILD_ATEN_ONLY)
set(BUILD_CAFFE2_OPS OFF)
set(BUILD_PYTHON OFF)
set(USE_NUMA OFF)
set(USE_LEVELDB OFF)
set(USE_GFLAGS OFF)
set(USE_GLOG OFF)
set(USE_NCCL OFF)
set(USE_NNPACK OFF)
set(USE_NUMPY OFF)
set(USE_OPENCV OFF)
set(USE_MKLDNN OFF)
set(USE_DISTRIBUTED OFF)
set(USE_LMDB OFF)
set(NO_API ON)
set(USE_FBGEMM OFF)
set(USE_PYTORCH_QNNPACK ON)
set(USE_QNNPACK OFF)
set(USE_STATIC_DISPATCH ON)
set(INTERN_DISABLE_ONNX ON)
set(INTERN_DISABLE_AUTOGRAD ON)
set(INTERN_USE_EIGEN_BLAS ON)
endif()
# ---[ Utils
@ -221,8 +324,12 @@ include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Version numbers for generated libraries
set(TORCH_DEFAULT_VERSION "1.0.0")
set(TORCH_DEFAULT_VERSION "1.1.0")
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}" CACHE STRING "Torch build version")
if (DEFINED ENV{PYTORCH_BUILD_VERSION})
set(TORCH_BUILD_VERSION "$ENV{PYTORCH_BUILD_VERSION}"
CACHE STRING "Torch build version" FORCE)
endif()
if (NOT TORCH_BUILD_VERSION)
# An empty string was specified so force version to the default
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}"
@ -269,8 +376,16 @@ if(USE_FBGEMM)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_FBGEMM")
endif()
if(NAMEDTENSOR_ENABLED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DNAMEDTENSOR_ENABLED")
if(BUILD_NAMEDTENSOR)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBUILD_NAMEDTENSOR")
endif()
if(USE_QNNPACK)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_QNNPACK")
endif()
if(USE_PYTORCH_QNNPACK)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_PYTORCH_QNNPACK")
endif()
# ---[ Whitelist file if whitelist is specified
@ -353,25 +468,6 @@ if(NOT MSVC)
set (CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fno-omit-frame-pointer -O0")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-math-errno")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-trapping-math")
else()
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO)
if (${CAFFE2_USE_MSVC_STATIC_RUNTIME})
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
else()
if(${flag_var} MATCHES "/MT")
string(REGEX REPLACE "/MT" "/MD" ${flag_var} "${${flag_var}}")
endif()
endif()
# /bigobj increases number of sections in .obj file, which is needed to link
# against libraries in Python 2.7 under Windows
set(${flag_var} "${${flag_var}} /MP /bigobj")
endforeach(flag_var)
endif()
if (USE_ASAN)
@ -418,6 +514,7 @@ include_directories(BEFORE ${PROJECT_SOURCE_DIR})
include_directories(BEFORE ${PROJECT_BINARY_DIR})
include_directories(BEFORE ${PROJECT_SOURCE_DIR}/aten/src/)
include_directories(BEFORE ${PROJECT_BINARY_DIR}/aten/src/)
# ---[ Main build
add_subdirectory(c10)

View File

@ -4,8 +4,18 @@
/docs/cpp @goldsborough @ebetica @yf225
/torch/csrc/api/ @ebetica @goldsborough @yf225
/test/cpp/api/ @ebetica @goldsborough @yf225
/torch/lib/c10d/ @apaszke @pietern @mrshenli
/torch/csrc/distributed/ @apaszke @pietern @mrshenli
/torch/lib/c10d/ @pietern @mrshenli
/torch/csrc/distributed/ @pietern @mrshenli
/torch/distributed/ @apaszke @pietern @mrshenli
/test/test_c10d.py @apaszke @pietern @mrshenli
/torch/utils/cpp_extension.py @goldsborough @fmassa @apaszke @soumith @ezyang
/test/test_c10d.py @pietern @mrshenli
/torch/utils/cpp_extension.py @goldsborough @fmassa @soumith @ezyang
# Not there to strictly require the approval, but to be tagged as a reviewer
# on the PRs to push them into a high priority inbox.
/torch/csrc/api/data/ @apaszke
/torch/csrc/autograd/ @apaszke
/torch/csrc/jit/ @apaszke
/torch/nn/ @apaszke
/torch/autograd/ @apaszke
/torch/jit/ @apaszke
/torch/utils/data/ @apaszke

View File

@ -185,7 +185,7 @@ pytest test/test_nn.py -k Loss -v
The above is an example of testing a change to Loss functions: this command runs tests such as
`TestNN.test_BCELoss` and `TestNN.test_MSELoss` and can be useful to save keystrokes.
## Writing documentation
## Writing Documentation
PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
for formatting docstrings. Line length inside docstring blocks must be limited to 80 characters to
@ -204,7 +204,87 @@ We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp`.
## Managing multiple build trees
### Building Documentation
To build the documentation:
1. Build and install PyTorch
2. Install the prerequisites
```bash
cd docs
pip install -r requirements.txt
# `katex` must also be available in your PATH.
# If you are using Ubuntu or Debian, you can install it with:
# sudo apt install katex
```
3. Generate the documentation HTML files. The generated files will be in `docs/build/html`.
```bash
cd docs
make html
```
4. To view the HTML files, start an HTTP server. For example:
```bash
# Start a server from the current directory (Python 3 only)
cd docs/build/html
python -m http.server
```
If you are developing on a remote machine, you can set up an SSH tunnel so that
you can access the HTTP server running on the remote machine from your local machine. To map
remote port 8086 to local port 8086, use either of the following commands.
```bash
# For SSH
ssh my_machine -L 8086:my_machine:8086
# For Eternal Terminal
et my_machine -t="8086:8086"
```
Then navigate to `localhost:8086` in your web browser.
#### Tips
The `.rst` source files live in [docs/source](docs/source). Some of the `.rst`
files pull in docstrings from PyTorch Python code (for example, via
the `autofunction` or `autoclass` directives). To vastly shorten doc build times,
it is helpful to remove the files you are not working on, only keeping the base
`index.rst` file and the files you are editing. The Sphinx build will produce
missing file warnings but will still complete. For example, to work on `jit.rst`:
```bash
cd docs/source
ls | grep rst | grep -v index | grep -v jit | xargs rm
# Make your changes, build the docs, etc.
# Don't commit the deletions!
git add index.rst jit.rst
...
```
### Adding Documentation Tests
It is easy for code snippets in docstrings and `.rst` files to get out of date. The docs
build includes the [Sphinx Doctest Extension](https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html),
which can run code in documentation as a unit test. To use the extension, use
the `.. testcode::` directive in your `.rst` files and docstrings.
To manually run these tests, follow steps 1 and 2 above, then run:
```bash
cd docs
make doctest
```
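As a minimal sketch (assuming the standard `testcode`/`testoutput` pairing from
`sphinx.ext.doctest`; the snippet itself is illustrative), such a directive looks like:
```rst
.. testcode::

   import torch
   print(torch.ones(2, 2).sum().item())

.. testoutput::

   4.0
```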
## Managing Multiple Build Trees
One downside to using `python setup.py develop` is that your development
version of PyTorch will be installed globally on your account (e.g., if
@ -243,18 +323,27 @@ only interested in a specific component.
Caffe2 operators.
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `BUILD_TEST`, `USE_FBGEMM`, `USE_NNPACK` and `USE_QNNPACK`.
- `DEBUG=1` will enable debug builds (-g -O0)
- `REL_WITH_DEB_INFO=1` will enable debug symbols with optimizations (-g -O3)
- `NO_CUDA=1` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- `USE_DISTRIBUTED=0` will disable distributed (c10d, gloo, mpi, etc.) build.
- `USE_MKLDNN=0` will disable using MKL-DNN.
- `USE_CUDA=0` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- `BUILD_TEST=0` will disable building C++ test binaries.
- `USE_FBGEMM=0` will disable using FBGEMM (quantized 8-bit server operators).
- `USE_NNPACK=0` will disable compiling with NNPACK.
- `USE_QNNPACK=0` will disable QNNPACK build (quantized 8-bit operators).
For example:
```bash
NO_CUDA=1 DEBUG=1 python setup.py develop
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 python setup.py develop
```
Make sure you continue to pass these flags on subsequent builds.
For subsequent builds (i.e., when `build/CMakeCache.txt` exists), the build
options passed for the first time will persist; please run `ccmake build/`, run
`cmake-gui build/`, or directly edit `build/CMakeCache.txt` to adapt build
options.
### Code completion and IDE support
@ -349,6 +438,16 @@ ccache -F 0
# deploy (and add to ~/.bashrc for later)
export PATH="/usr/lib/ccache:$PATH"
```
#### Use a faster linker
If you are editing a single file and rebuilding in a tight loop, the time spent
linking will dominate. The system linker available in most Linux distributions
(GNU `ld`) is quite slow. Use a faster linker, like [lld](https://lld.llvm.org/).
The easiest way to use `lld` is to download the
[latest LLVM binaries](http://releases.llvm.org/download.html#8.0.0) and run:
```
ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld
```
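To check that the swap took effect (hedged; the exact banner varies by LLVM version):
```
ld --version   # should now report LLD rather than GNU ld
```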
## CUDA Development tips
@ -359,6 +458,39 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
slow down the build process for about 50% (compared to only `DEBUG=1`), so use wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike`gdb`,
`cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros).
3. CUDA supports many C++11 features, such as `std::numeric_limits`, `std::nextafter`,
`std::tuple`, etc., in device code. Many of these features are possible because of the
[--expt-relaxed-constexpr](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-functions)
nvcc flag. There is a known [issue](https://github.com/ROCm-Developer-Tools/HIP/issues/374)
that ROCm errors out on device code that uses such STL functions.
4. A good performance metric for a CUDA kernel is the
[Effective Memory Bandwidth](https://devblogs.nvidia.com/how-implement-performance-metrics-cuda-cc/).
It is useful to measure this metric whenever you are writing or optimizing a CUDA
kernel. The following script shows how to measure the effective bandwidth of the CUDA `uniform_`
kernel.
```python
import torch
import time

size = 128*512
nrep = 100
nbytes_read_write = 4  # this is the number of bytes read + written by a kernel; change this to fit your kernel

for i in range(10):
    a = torch.Tensor(size).cuda().uniform_()
    torch.cuda.synchronize()
    # dry run to alloc
    out = a.uniform_()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(nrep):
        out = a.uniform_()
    torch.cuda.synchronize()
    end = time.time()
    timec = (end - start) / nrep
    print("uniform, size, elements", size, "forward", timec,
          "bandwidth (GB/s)", size * nbytes_read_write * 1e-9 / timec)
    size *= 2
```
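As a rough rule of thumb, printed bandwidths that approach the device's peak memory
bandwidth indicate the kernel is memory-bound and close to optimal; numbers far below
it (especially at small sizes) usually mean launch overhead or poor access patterns dominate.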
Hope this helps, and thanks for considering contributing.
@ -472,6 +604,11 @@ static_assert(std::is_same(A*, decltype(A::singleton()))::value, "hmm");
This causes preprocessor tokens inside the literal, like an `#endif`, to be incorrectly
treated as preprocessor directives. See https://godbolt.org/z/eVTIJq as an example.
* Either MSVC or the Windows headers have a PURE macro defined and will replace
any occurrences of the PURE token in code with an empty string. This is why
we have AliasAnalysisKind::PURE_FUNCTION and not AliasAnalysisKind::PURE.
The same is likely true for other identifiers that we just didn't try to use yet.
### Running Clang-Tidy
[Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/index.html) is a C++
@ -505,7 +642,8 @@ which is in PyTorch's `requirements.txt`.
### Pre-commit Tidy/Linting Hook
We use clang-tidy and flake8 (installed with flake-mypy) to perform additional
We use clang-tidy and flake8 (installed with flake8-bugbear,
flake8-comprehensions, flake8-mypy, and flake8-pyi) to perform additional
formatting and semantic checking of code. We provide a pre-commit git hook for
performing these checks, before a commit is created:
@ -517,6 +655,100 @@ You'll need to install an appropriately configured flake8; see
[Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)
for documentation on how to do this.
### Building PyTorch with ASAN
[ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) is very
useful for debugging memory errors in C++. We run it in CI, but here's how to
get the same thing to run on your local machine.
First, install LLVM 8. The easiest way is to get [prebuilt
binaries](http://releases.llvm.org/download.html#8.0.0) and extract them to
a folder (later called `$LLVM_ROOT`).
Then set up the appropriate scripts. You can put this in your `.bashrc`:
```
LLVM_ROOT=<wherever your llvm install is>
PYTORCH_ROOT=<wherever your pytorch checkout is>
LIBASAN_RT="$LLVM_ROOT/lib/clang/8.0.0/lib/linux/libclang_rt.asan-x86_64.so"
build_with_asan()
{
LD_PRELOAD=${LIBASAN_RT} \
CC="$LLVM_ROOT/bin/clang" \
CXX="$LLVM_ROOT/bin/clang++" \
LDSHARED="clang --shared" \
LDFLAGS="-stdlib=libstdc++" \
CFLAGS="-fsanitize=address -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
NO_CUDA=1 USE_OPENMP=0 BUILD_CAFFE2_OPS=0 NO_DISTRIBUTED=1 DEBUG=1 \
python setup.py develop
}
run_with_asan()
{
LD_PRELOAD=${LIBASAN_RT} $@
}
# you can look at build-asan.sh to find the latest options the CI uses
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PYTORCH_ROOT/ubsan.supp
export ASAN_SYMBOLIZER_PATH=$LLVM_ROOT/bin/llvm-symbolizer
```
Then you can use the scripts like:
```
suo-devfair ~/pytorch build_with_asan
suo-devfair ~/pytorch run_with_asan python test/test_jit.py
```
#### Getting `ccache` to work
The scripts above specify the `clang` and `clang++` binaries directly, which
bypasses `ccache`. Here's how to get `ccache` to work:
1. Make sure the ccache symlinks for `clang` and `clang++` are set up (see
CONTRIBUTING.md)
2. Make sure `$LLVM_ROOT/bin` is available on your `$PATH`.
3. Change the `CC` and `CXX` variables in `build_with_asan()` to point
directly to `clang` and `clang++` (see the sketch below).
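Putting those steps together, a minimal sketch (assuming the ccache symlink
directory is `/usr/lib/ccache`, as in the ccache section of CONTRIBUTING.md):
```
# ccache's clang/clang++ symlinks come first on PATH, and ccache then finds
# the real LLVM binaries further along PATH.
export PATH="/usr/lib/ccache:$LLVM_ROOT/bin:$PATH"
# ...and inside build_with_asan(), use the bare names:
#   CC="clang" CXX="clang++" ...
```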
#### Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?
The “standard” workflow for ASAN assumes you have a standalone binary:
1. Recompile your binary with `-fsanitize=address`.
2. Run the binary, and ASAN will report whatever errors it finds.
Unfortunately, PyTorch is distributed as a shared library that is loaded by
a third-party executable (Python). It's too much of a hassle to recompile all
of Python every time we want to use ASAN. Luckily, the ASAN folks have a
workaround for cases like this:
1. Recompile your library with `-fsanitize=address -shared-libasan`. The
extra `-shared-libasan` tells the compiler to ask for the shared ASAN
runtime library.
2. Use `LD_PRELOAD` to tell the dynamic linker to load the ASAN runtime
library before anything else.
More information can be found
[here](https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso).
#### Why LD_PRELOAD in the build function?
We need `LD_PRELOAD` because there is a cmake check that ensures that a
simple program builds and runs. If we are building with ASAN as a shared
library, we need to `LD_PRELOAD` the runtime library; otherwise there will be
dynamic linker errors and the check will fail.
We don't actually need either of these if we fix the cmake checks.
#### Why no Leak detection?
Python leaks a lot of memory. Possibly we could configure a suppression file,
but we haven't gotten around to it.
## Caffe2 notes
In 2018, we merged Caffe2 into the PyTorch source repository. While the

View File

@ -159,7 +159,7 @@ If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 9 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v7 or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
If you want to disable CUDA support, export environment variable `USE_CUDA=0`.
Other potentially useful environment variables may be found in `setup.py`.
If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions [are available here](https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/)
@ -212,27 +212,69 @@ If the version of Visual Studio 2017 is higher than 15.4.5, installing of "VC++
NVTX is a part of the CUDA distribution, where it is called "Nsight Compute". To install it on top of an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox.
Be sure that CUDA with Nsight Compute is installed after Visual Studio 2017.
Currently, VS 2017, VS 2019, and Ninja are supported as CMake generators. If `ninja.exe` is detected in `PATH`, then Ninja will be used as the default generator; otherwise VS 2017 will be used.
<br/> If Ninja is selected as the generator, the latest MSVC newer than VS 2015 (14.0) will be selected as the underlying toolchain if you have Python > 3.5; otherwise VS 2015 will be selected, so you'll have to activate the environment. If you use CMake <= 3.14.2 and have VS 2019 installed, then even if you specify VS 2017 as the generator, VS 2019 will get selected as the generator.
CUDA and MSVC have strong version dependencies, so even if you use VS 2017 / 2019, you may get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain from the table below; then you can either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the CUDA host compiler (not recommended if there are big version differences).
| CUDA version | Newest supported VS version |
| ------------ | ------------------------------------------------------- |
| 9.0 / 9.1 | Visual Studio 2017 Update 4 (15.4) (`_MSC_VER` <= 1911) |
| 9.2 | Visual Studio 2017 Update 5 (15.5) (`_MSC_VER` <= 1912) |
| 10.0 | Visual Studio 2017 (15.X) (`_MSC_VER` < 1920) |
| 10.1 | Visual Studio 2019 (16.X) (`_MSC_VER` < 1930) |
```cmd
cmd
REM [Optional] The following two lines are needed for Python 2.7, but the support for it is very experimental.
:: [Optional] Only add the next two lines if you need Python 2.7. If you use Python 3, ignore these two lines.
set MSSdk=1
set FORCE_PY27_BUILD=1
set CMAKE_GENERATOR=Visual Studio 15 2017 Win64
set DISTUTILS_USE_SDK=1
:: [Optional] If you want to build with VS 2019 generator, please change the value in the next line to `Visual Studio 16 2019`.
:: Note: This value is useless if Ninja is detected. However, you can force that by using `set USE_NINJA=OFF`.
set CMAKE_GENERATOR=Visual Studio 15 2017
REM Run "Visual Studio 2017 Developer Command Prompt"
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,16^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=14.11
:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2017 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
:: It's an essential step if you use Python 3.5.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.11
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,16^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%
:: [Optional] If you want to override the cuda host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Tools\MSVC\14.11.25503\bin\HostX64\x64\cl.exe
python setup.py install
```
##### Adjust Build Options (Optional)
You can optionally adjust the configuration of cmake variables (without building first) by doing
the following. For example, adjusting the pre-detected directories for CuDNN or BLAS can be done
in this way.
On Linux
```bash
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build # or cmake-gui build
```
On macOS
```bash
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build # or cmake-gui build
```
### Docker Image
A Dockerfile is supplied to build images with CUDA support and cuDNN v7. You can pass the `-e PYTHON_VERSION=x.y` flag to specify which Python version is to be used by Miniconda, or leave it unset to use the default. Build from the pytorch repo directory, as docker needs to copy the git repo into the docker filesystem while building the image.
```
docker build -t pytorch -f docker/pytorch/Dockerfile .
docker build -t pytorch -f docker/pytorch/Dockerfile . # [optional] --build-arg WITH_TORCHVISION=0
```
You can also pull a pre-built docker image from Docker Hub and run it with nvidia-docker.
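A hedged sketch of that pull-and-run flow (the image tag and runtime flags are
illustrative assumptions):
```
docker pull pytorch/pytorch:latest
nvidia-docker run --rm -it --ipc=host pytorch/pytorch:latest
```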

android/.gitignore vendored Normal file
View File

@ -0,0 +1,16 @@
local.properties
**/*.iml
.gradle
gradlew*
gradle/wrapper
.idea/*
.externalNativeBuild
build
pytorch_android/src/main/cpp/libtorch_include/x86/**
pytorch_android/src/main/cpp/libtorch_include/x86_64/**
pytorch_android/src/main/cpp/libtorch_include/armeabi-v7a/**
pytorch_android/src/main/cpp/libtorch_include/arm64-v8a/**
pytorch_android/src/main/jniLibs/x86/**
pytorch_android/src/main/jniLibs/x86_64/**
pytorch_android/src/main/jniLibs/armeabi-v7a/**
pytorch_android/src/main/jniLibs/arm64-v8a/**

android/build.gradle Normal file
View File

@ -0,0 +1,41 @@
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:${GRADLE_BINTRAY_PLUGIN_VERSION}"
classpath "com.github.dcendents:android-maven-gradle-plugin:${ANDROID_MAVEN_GRADLE_PLUGIN_VERSION}"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
allprojects {
repositories {
google()
jcenter()
}
}
ext.isPublishing = { ['uploadArchives', 'bintrayUpload'].any { gradle.startParameter.taskNames.contains(it) } }
ext.deps = [
jsr305: 'com.google.code.findbugs:jsr305:3.0.1',
]

android/gradle.properties Normal file
View File

@ -0,0 +1,24 @@
ABI_FILTERS=armeabi-v7a,arm64-v8a,x86,x86_64
VERSION_NAME=1.3.0
GROUP=org.pytorch
MAVEN_GROUP=org.pytorch
POM_URL=https://github.com/pytorch/pytorch/tree/master/android
POM_SCM_URL=https://github.com/pytorch/pytorch.git
POM_SCM_CONNECTION=scm:git:https://github.com/pytorch/pytorch
POM_SCM_DEV_CONNECTION=scm:git:git@github.com:pytorch/pytorch.git
POM_LICENSE_NAME=BSD 3-Clause
POM_LICENSE_URL=https://github.com/pytorch/pytorch/blob/master/LICENSE
POM_ISSUES_URL=https://github.com/pytorch/pytorch/issues
POM_LICENSE_DIST=repo
POM_DEVELOPER_ID=pytorch
POM_DEVELOPER_NAME=pytorch
syncWithMavenCentral=true
GRADLE_BINTRAY_PLUGIN_VERSION=1.8.0
GRADLE_VERSIONS_PLUGIN_VERSION=0.15.0
ANDROID_MAVEN_GRADLE_PLUGIN_VERSION=2.1
# Gradle internals
org.gradle.internal.repository.max.retries=1
org.gradle.jvmargs=-XX:MaxMetaspaceSize=1024m

View File

@ -0,0 +1,32 @@
apply plugin: 'com.github.dcendents.android-maven'
version = VERSION_NAME
group = GROUP
project.archivesBaseName = POM_ARTIFACT_ID
install {
repositories.mavenInstaller {
pom.project {
name POM_NAME
artifactId POM_ARTIFACT_ID
packaging POM_PACKAGING
description POM_DESCRIPTION
url projectUrl
scm {
url scmUrl
connection scmConnection
developerConnection scmDeveloperConnection
}
licenses projectLicenses
developers {
developer {
id developerId
name developerName
}
}
}
}
}

View File

@ -0,0 +1,95 @@
import java.nio.file.Files
import java.nio.file.Paths
import java.io.FileOutputStream
import java.util.zip.ZipFile
// Android tasks for Javadoc and sources.jar generation
afterEvaluate { project ->
if (POM_PACKAGING == 'aar') {
task androidJavadoc(type: Javadoc, dependsOn: assembleDebug) {
source += files(android.sourceSets.main.java.srcDirs)
failOnError false
// This task will try to compile *everything* it finds in the above directory and
// will choke on text files it doesn't understand.
exclude '**/BUCK'
exclude '**/*.md'
}
task androidJavadocJar(type: Jar, dependsOn: androidJavadoc) {
classifier = 'javadoc'
from androidJavadoc.destinationDir
}
task androidSourcesJar(type: Jar) {
classifier = 'sources'
from android.sourceSets.main.java.srcDirs
}
android.libraryVariants.all { variant ->
def name = variant.name.capitalize()
task "jar${name}"(type: Jar, dependsOn: variant.javaCompileProvider) {
from variant.javaCompileProvider.get().destinationDir
}
androidJavadoc.doFirst {
classpath += files(android.bootClasspath)
classpath += files(variant.javaCompileProvider.get().classpath.files)
// This is generated by `assembleDebug` and holds the JARs generated by the APT.
classpath += fileTree(dir: "$buildDir/intermediates/bundles/debug/", include: '**/*.jar')
// Process AAR dependencies
def aarDependencies = classpath.filter { it.name.endsWith('.aar') }
classpath -= aarDependencies
aarDependencies.each { aar ->
// Extract classes.jar from the AAR dependency, and add it to the javadoc classpath
def outputPath = "$buildDir/tmp/aarJar/${aar.name.replace('.aar', '.jar')}"
classpath += files(outputPath)
// Use a task so the actual extraction only happens before the javadoc task is run
dependsOn task(name: "extract ${aar.name}").doLast {
extractEntry(aar, 'classes.jar', outputPath)
}
}
}
}
artifacts.add('archives', androidJavadocJar)
artifacts.add('archives', androidSourcesJar)
}
if (POM_PACKAGING == 'jar') {
task javadocJar(type: Jar, dependsOn: javadoc) {
classifier = 'javadoc'
from javadoc.destinationDir
}
task sourcesJar(type: Jar, dependsOn: classes) {
classifier = 'sources'
from sourceSets.main.allSource
}
artifacts.add('archives', javadocJar)
artifacts.add('archives', sourcesJar)
}
}
// Utility method to extract only one entry in a zip file
private def extractEntry(archive, entryPath, outputPath) {
if (!archive.exists()) {
throw new GradleException("archive $archive not found")
}
def zip = new ZipFile(archive)
zip.entries().each {
if (it.name == entryPath) {
def path = Paths.get(outputPath)
if (!Files.exists(path)) {
Files.createDirectories(path.getParent())
Files.copy(zip.getInputStream(it), path)
}
}
}
zip.close()
}

View File

@ -0,0 +1,64 @@
apply plugin: 'com.jfrog.bintray'
def getBintrayUsername() {
return project.hasProperty('bintrayUsername') ? property('bintrayUsername') : System.getenv('BINTRAY_USERNAME')
}
def getBintrayApiKey() {
return project.hasProperty('bintrayApiKey') ? property('bintrayApiKey') : System.getenv('BINTRAY_API_KEY')
}
def getBintrayGpgPassword() {
return project.hasProperty('bintrayGpgPassword') ? property('bintrayGpgPassword') : System.getenv('BINTRAY_GPG_PASSWORD')
}
def getMavenCentralUsername() {
return project.hasProperty('mavenCentralUsername') ? property('mavenCentralUsername') : System.getenv('MAVEN_CENTRAL_USERNAME')
}
def getMavenCentralPassword() {
return project.hasProperty('mavenCentralPassword') ? property('mavenCentralPassword') : System.getenv('MAVEN_CENTRAL_PASSWORD')
}
def shouldSyncWithMavenCentral() {
return project.hasProperty('syncWithMavenCentral') ? property('syncWithMavenCentral').toBoolean() : false
}
def dryRunOnly() {
return project.hasProperty('dryRun') ? property('dryRun').toBoolean() : false
}
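// Hedged invocation sketch: each getter above falls back from a Gradle project
// property to an environment variable, so credentials can be supplied either way
// (values illustrative):
//   ./gradlew bintrayUpload -PbintrayUsername=someuser -PbintrayApiKey=somekey -PdryRun=true
//   BINTRAY_USERNAME=someuser BINTRAY_API_KEY=somekey ./gradlew bintrayUpload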
bintray {
user = getBintrayUsername()
key = getBintrayApiKey()
override = false
configurations = ['archives']
pkg {
repo = bintrayRepo
userOrg = bintrayUserOrg
name = bintrayName
desc = bintrayDescription
websiteUrl = projectUrl
issueTrackerUrl = issuesUrl
vcsUrl = scmUrl
licenses = [ POM_LICENSE_NAME ]
dryRun = dryRunOnly()
override = false
publish = true
publicDownloadNumbers = true
version {
name = versionName
desc = bintrayDescription
gpg {
sign = true
passphrase = getBintrayGpgPassword()
}
mavenCentralSync {
sync = shouldSyncWithMavenCentral()
user = getMavenCentralUsername()
password = getMavenCentralPassword()
close = '1' // If set to 0, you have to manually click release
}
}
}
}
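Note that the credential getters above all follow the same lookup order: a Gradle project property wins, with an environment variable as the fallback. A minimal sketch of that resolution rule (names are illustrative, not part of the build):

import os

def resolve_credential(project_properties, prop_name, env_name):
    # Project property takes precedence; environment variable is the
    # fallback, matching getBintrayUsername() and friends above.
    if prop_name in project_properties:
        return project_properties[prop_name]
    return os.environ.get(env_name)

user = resolve_credential({}, 'bintrayUsername', 'BINTRAY_USERNAME')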


@@ -0,0 +1,81 @@
apply plugin: 'signing'
version = VERSION_NAME
group = MAVEN_GROUP
def isReleaseBuild() {
return !VERSION_NAME.contains('SNAPSHOT')
}
def getReleaseRepositoryUrl() {
return hasProperty('RELEASE_REPOSITORY_URL') ? RELEASE_REPOSITORY_URL
: "https://oss.sonatype.org/service/local/staging/deploy/maven2/"
}
def getSnapshotRepositoryUrl() {
return hasProperty('SNAPSHOT_REPOSITORY_URL') ? SNAPSHOT_REPOSITORY_URL
: "https://oss.sonatype.org/content/repositories/snapshots/"
}
def getRepositoryUsername() {
return hasProperty('SONATYPE_NEXUS_USERNAME') ? SONATYPE_NEXUS_USERNAME : ""
}
def getRepositoryPassword() {
return hasProperty('SONATYPE_NEXUS_PASSWORD') ? SONATYPE_NEXUS_PASSWORD : ""
}
afterEvaluate { project ->
uploadArchives {
repositories {
mavenDeployer {
beforeDeployment { MavenDeployment deployment -> signing.signPom(deployment) }
pom.groupId = MAVEN_GROUP
pom.artifactId = POM_ARTIFACT_ID
pom.version = VERSION_NAME
repository(url: getReleaseRepositoryUrl()) {
authentication(userName: getRepositoryUsername(), password: getRepositoryPassword())
}
snapshotRepository(url: getSnapshotRepositoryUrl()) {
authentication(userName: getRepositoryUsername(), password: getRepositoryPassword())
}
pom.project {
name POM_NAME
packaging POM_PACKAGING
description POM_DESCRIPTION
url POM_URL
scm {
url POM_SCM_URL
connection POM_SCM_CONNECTION
developerConnection POM_SCM_DEV_CONNECTION
}
licenses {
license {
name POM_LICENSE_NAME
url POM_LICENSE_URL
distribution POM_LICENSE_DIST
}
}
developers {
developer {
id POM_DEVELOPER_ID
name POM_DEVELOPER_NAME
}
}
}
}
}
}
signing {
required { isReleaseBuild() && gradle.taskGraph.hasTask('uploadArchives') }
sign configurations.archives
}
}
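isReleaseBuild() keys entirely off the version string, and uploadArchives picks the target repository from it. A small sketch of that selection rule (default URLs copied from the getters above):

RELEASE_URL = "https://oss.sonatype.org/service/local/staging/deploy/maven2/"
SNAPSHOT_URL = "https://oss.sonatype.org/content/repositories/snapshots/"

def repository_url(version_name):
    # SNAPSHOT versions deploy to the snapshot repo; anything else is
    # treated as a release and also triggers artifact signing.
    if "SNAPSHOT" in version_name:
        return SNAPSHOT_URL
    return RELEASE_URL

assert repository_url("1.4.0-SNAPSHOT") == SNAPSHOT_URL
assert repository_url("1.3.1") == RELEASE_URL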


@@ -0,0 +1,5 @@
apply from: rootProject.file('gradle/android_tasks.gradle')
apply from: rootProject.file('gradle/release_bintray.gradle')
apply from: rootProject.file('gradle/gradle_maven_push.gradle')


@@ -0,0 +1,32 @@
ext {
bintrayRepo = 'maven'
bintrayUserOrg = 'pytorch'
bintrayName = "${GROUP}:${POM_ARTIFACT_ID}"
bintrayDescription = POM_DESCRIPTION
projectUrl = POM_URL
issuesUrl = POM_ISSUES_URL
scmUrl = POM_SCM_URL
scmConnection = POM_SCM_CONNECTION
scmDeveloperConnection = POM_SCM_DEV_CONNECTION
publishedGroupId = GROUP
libraryName = 'pytorch_android'
artifact = 'pytorch_android'
developerId = POM_DEVELOPER_ID
developerName = POM_DEVELOPER_NAME
versionName = VERSION_NAME
projectLicenses = {
license = {
name = POM_LICENSE_NAME
url = POM_LICENSE_URL
distribution = POM_LICENSE_DIST
}
}
}
apply from: rootProject.file('gradle/android_maven_install.gradle')
apply from: rootProject.file('gradle/bintray.gradle')

android/libs/fbjni Submodule

Submodule android/libs/fbjni added at dc916917e1


@@ -0,0 +1,47 @@
apply plugin: 'com.android.library'
apply plugin: 'maven'
android {
compileSdkVersion rootProject.compileSdkVersion
buildToolsVersion rootProject.buildToolsVersion
defaultConfig {
minSdkVersion rootProject.minSdkVersion
targetSdkVersion rootProject.targetSdkVersion
sourceSets {
main {
manifest.srcFile '../fbjni/ApplicationManifest.xml'
java {
srcDir '../fbjni/java'
}
}
}
}
buildTypes {
debug {
minifyEnabled false
}
release {
minifyEnabled false
}
}
externalNativeBuild {
cmake {
path "../fbjni/CMakeLists.txt"
}
}
}
dependencies {
compileOnly 'com.google.code.findbugs:jsr305:3.0.1'
}
apply from: rootProject.file('gradle/release.gradle')
task sourcesJar(type: Jar) {
from android.sourceSets.main.java.srcDirs
classifier = 'sources'
}
artifacts.add('archives', sourcesJar)


@@ -0,0 +1,4 @@
POM_NAME=pytorch_android_fbjni
POM_DESCRIPTION=pytorch_android_fbjni
POM_ARTIFACT_ID=pytorch_android_fbjni
POM_PACKAGING=aar


@@ -0,0 +1,63 @@
cmake_minimum_required(VERSION 3.4.1)
project(pytorch CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_VERBOSE_MAKEFILE ON)
set(pytorch_android_DIR ${CMAKE_CURRENT_LIST_DIR}/src/main/cpp)
set(libtorch_include_DIR ${pytorch_android_DIR}/libtorch_include/${ANDROID_ABI})
message(STATUS "libtorch include dir:${libtorch_include_DIR}")
file(GLOB pytorch_android_SOURCES
${pytorch_android_DIR}/*.cpp
)
add_library(pytorch SHARED
${pytorch_android_SOURCES}
)
target_compile_options(pytorch PRIVATE
-fexceptions
)
target_include_directories(pytorch PUBLIC
${libtorch_include_DIR}
)
set(BUILD_DIR ${CMAKE_SOURCE_DIR}/build)
file(MAKE_DIRECTORY ${BUILD_DIR})
set(fbjni_DIR ${CMAKE_CURRENT_LIST_DIR}/../libs/fbjni/)
set(fbjni_BUILD_DIR ${BUILD_DIR}/fbjni/${ANDROID_ABI})
add_subdirectory(${fbjni_DIR} ${fbjni_BUILD_DIR})
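# Helper: expose a prebuilt static library under src/main/jniLibs/<ANDROID_ABI>
# as an imported CMake target so it can be named in target_link_libraries below.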
function(import_static_lib name)
add_library(${name} STATIC IMPORTED)
set_property(
TARGET ${name}
PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/src/main/jniLibs/${ANDROID_ABI}/${name}.a)
endfunction(import_static_lib)
import_static_lib(libtorch)
import_static_lib(libc10)
import_static_lib(libnnpack)
import_static_lib(libpytorch_qnnpack)
import_static_lib(libeigen_blas)
import_static_lib(libcpuinfo)
import_static_lib(libclog)
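# Note: libtorch is wrapped in --whole-archive below, presumably so the linker
# keeps its statically registered operators/kernels; the remaining static libs
# link normally and --gc-sections strips unused sections.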
target_link_libraries(pytorch
fbjni
-Wl,--gc-sections
-Wl,--whole-archive
libtorch
-Wl,--no-whole-archive
libc10
libnnpack
libpytorch_qnnpack
libeigen_blas
libcpuinfo
libclog
)


@@ -0,0 +1,75 @@
apply plugin: 'com.android.library'
apply plugin: 'maven'
android {
compileSdkVersion rootProject.compileSdkVersion
buildToolsVersion rootProject.buildToolsVersion
defaultConfig {
minSdkVersion rootProject.minSdkVersion
targetSdkVersion rootProject.targetSdkVersion
versionCode 0
versionName "0.1"
testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
ndk {
abiFilters ABI_FILTERS.split(",")
}
}
buildTypes {
debug {
minifyEnabled false
debuggable true
}
release {
minifyEnabled false
}
}
sourceSets {
main {
jniLibs.srcDirs = ['src/main/jniLibs']
}
}
externalNativeBuild {
cmake {
path "CMakeLists.txt"
}
}
packagingOptions {
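// Presumably: when publishing, libfbjni.so ships in the separate fbjni
// artifact and is excluded here to avoid duplication; local builds instead
// keep the first copy found.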
if (rootProject.isPublishing()) {
exclude '**/libfbjni.so'
} else {
pickFirst '**/libfbjni.so'
}
}
useLibrary 'android.test.runner'
useLibrary 'android.test.base'
useLibrary 'android.test.mock'
}
dependencies {
api project(':fbjni')
implementation 'com.android.support:appcompat-v7:28.0.0'
testImplementation 'junit:junit:' + rootProject.junitVersion
testImplementation 'androidx.test:core:' + rootProject.coreVersion
androidTestImplementation 'junit:junit:' + rootProject.junitVersion
androidTestImplementation 'androidx.test:core:' + rootProject.coreVersion
androidTestImplementation 'androidx.test.ext:junit:' + rootProject.extJUnitVersion
androidTestImplementation 'androidx.test:rules:' + rootProject.rulesVersion
androidTestImplementation 'androidx.test:runner:' + rootProject.runnerVersion
}
apply from: rootProject.file('gradle/release.gradle')
task sourcesJar(type: Jar) {
from android.sourceSets.main.java.srcDirs
classifier = 'sources'
}
artifacts.add('archives', sourcesJar)


@@ -0,0 +1,111 @@
import torch

OUTPUT_DIR = "src/androidTest/assets/"

def scriptAndSave(module, fileName):
    print('-' * 80)
    script_module = torch.jit.script(module)
    print(script_module.graph)
    outputFileName = OUTPUT_DIR + fileName
    script_module.save(outputFileName)
    print("Saved to " + outputFileName)
    print('=' * 80)

class Test(torch.jit.ScriptModule):
    def __init__(self):
        super(Test, self).__init__()

    @torch.jit.script_method
    def forward(self, input):
        return None

    @torch.jit.script_method
    def eqBool(self, input):
        # type: (bool) -> bool
        return input

    @torch.jit.script_method
    def eqInt(self, input):
        # type: (int) -> int
        return input

    @torch.jit.script_method
    def eqFloat(self, input):
        # type: (float) -> float
        return input

    @torch.jit.script_method
    def eqStr(self, input):
        # type: (str) -> str
        return input

    @torch.jit.script_method
    def eqTensor(self, input):
        # type: (Tensor) -> Tensor
        return input

    @torch.jit.script_method
    def eqDictStrKeyIntValue(self, input):
        # type: (Dict[str, int]) -> Dict[str, int]
        return input

    @torch.jit.script_method
    def eqDictIntKeyIntValue(self, input):
        # type: (Dict[int, int]) -> Dict[int, int]
        return input

    @torch.jit.script_method
    def eqDictFloatKeyIntValue(self, input):
        # type: (Dict[float, int]) -> Dict[float, int]
        return input

    @torch.jit.script_method
    def listIntSumReturnTuple(self, input):
        # type: (List[int]) -> Tuple[List[int], int]
        sum = 0
        for x in input:
            sum += x
        return (input, sum)

    @torch.jit.script_method
    def listBoolConjunction(self, input):
        # type: (List[bool]) -> bool
        res = True
        for x in input:
            res = res and x
        return res

    @torch.jit.script_method
    def listBoolDisjunction(self, input):
        # type: (List[bool]) -> bool
        res = False
        for x in input:
            res = res or x
        return res

    @torch.jit.script_method
    def tupleIntSumReturnTuple(self, input):
        # type: (Tuple[int, int, int]) -> Tuple[Tuple[int, int, int], int]
        sum = 0
        for x in input:
            sum += x
        return (input, sum)

    @torch.jit.script_method
    def optionalIntIsNone(self, input):
        # type: (Optional[int]) -> bool
        return input is None

    @torch.jit.script_method
    def intEq0None(self, input):
        # type: (int) -> Optional[int]
        if input == 0:
            return None
        return input

    @torch.jit.script_method
    def str3Concat(self, input):
        # type: (str) -> str
        return input + input + input

scriptAndSave(Test(), "test.pt")
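A quick round-trip check of the generated asset is to load it back and call a few of the typed methods. A minimal sketch (the relative path assumes the script was run from its own directory):

import torch

loaded = torch.jit.load("src/androidTest/assets/test.pt")
assert loaded.eqInt(42) == 42                           # identity for int
assert loaded.listIntSumReturnTuple([1, 2, 3])[1] == 6  # returns (input, sum)
assert loaded.optionalIntIsNone(None)                   # Optional[int] handling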

Some files were not shown because too many files have changed in this diff.